Abstract
The information-theoretic approach can shed light on the role of groups of correlated elements within a network. While there are already established methods for measuring new information, storage and transmission, the definition and application of methods for measuring information change remains an unresolved challenge. The change of information in a network is associated with redundancy and synergy between systems that share information about a target. Redundancy involves shared information about the target that can be retrieved from the individual source systems, while synergy involves information that can only be obtained by combining the systems. A more refined approach, called partial information decomposition (PID), separates the unique, redundant and synergistic contributions of the shared information. However, these contributions cannot be directly derived from the classical measures of information theory. In this work, we apply the PID approach to publicly available microarray gene expression data from two different experiments derived from patients affected by hepatocellular carcinoma (HCC) and autism spectrum disorder (ASD). By comparing sample and gene synergy clusters with classical correlation clusters, we uncover higher-order behaviours, such as differential genes and enriched functions closely linked to the disease phenotype, that emerge with this novel approach. These findings and further applications of this approach to gene expression data could shed light on the genetic aspects related to physiological aspects of complex diseases.
Citation: Lacalamita A, Monaco A, Serino G, Marinazzo D, Amoroso N, Bellantuono L, et al. (2025) Unveiling complex patterns: An information-theoretic approach to high-order behaviors in microarray data. PLoS One 20(11): e0336379. https://doi.org/10.1371/journal.pone.0336379
Editor: Y-h. Taguchi, Chuo University, JAPAN
Received: September 4, 2024; Accepted: October 26, 2025; Published: November 13, 2025
Copyright: © 2025 Lacalamita et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: NA, LB, ST, and RB have obtained funding for this work under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4-Call for tender no. 3138 of 16 December 2021 of the Italian Ministry of University and Research funded by the European Union–NextGenerationEU (award number/project code: CN00000013), and Concession Decree No. 1031 of 17 February 2022 adopted by the Italian Ministry of University and Research (CUP: D93C22000430001), Project title: “National Centre for HPC, Big Data and Quantum Computing”. AM, TM and RB have obtained funding for this work under the project “Genoma mEdiciNa pERsonalizzatA –GENERA”, local project code T3-AN-04 – CUP H93C22000500001, financed under the Health Development and Cohesion Plan 2014-2020, Trajectory 3 “Regenerative, predictive and personalized medicine” - Action line 3.1 “Creation of a precision medicine program for the mapping of the human genome on a national scale”, referred to in the Notice of the Ministry of Health published in the Official Journal no. 46 of 24 February 2021. AM, MLR, TM, EP and RB were supported by the Italian funding within the “Budget MIUR - Dipartimenti di Eccellenza 2023 - 2027” (Law 232, 11 December 2016) - Quantum Sensing and Modelling for One-Health (QuaSiModO), CUP:H97G23000100001. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Through the information-theoretic approach, we can gain insights into the role of correlated element groups in a network. There are already established methods for measuring new information, storage and transmission, but the definition and implementation of methods for measuring information change remains an open challenge. The change of information in a network is related to redundancy and synergy between systems that provide information about a target. Redundancy means that each system independently provides the same information about the target, while synergy means that some information can only be obtained by combining the systems [1]. A classical method called interaction information decomposition (IID) analyzes information modification through the balance between redundancy and synergy [2–4]. However, the IID approach has the limitation that it regards redundancy and synergy as mutually exclusive concepts, using a single value to quantify the modification [5,6].
A more refined approach called partial information decomposition (PID) [7] separates the unique, redundant, and synergistic contributions of shared information. However, PID requires new definitions for redundancy, synergy and unique information that cannot be directly derived from the classical measures of information theory.
Consequently, various researchers have proposed alternative measures to define the components of PID, resulting in a variety of definitions [7–11]. The lack of consensus on the desired properties of PID measures is the main reason for this proliferation.
Another challenge is the difficulty of reliably estimating the measures used in IID and PID decompositions. The estimation of probabilities using histogram-based methods is prone to errors [12–14]. Although there are techniques that can improve the estimation of information storage and transfer [15–17], their effectiveness for information modification has not yet been proven [18–21].
Several works have used PID as a novel tool to investigate biological and physiological aspects [22–27]; for example, Wibral et al. [28] investigated the developmental course of information modification in a culture of neurons in vitro. Their findings showed that information modification increased with maturation but decreased as redundant information became dominant between neurons. This suggests that the neural system initially developed complex processing abilities, but ultimately exhibited highly similar information processing between neurons, possibly due to the absence of external inputs. In conclusion, they emphasised the significant potential of PID and information modification analysis for a better understanding of neural systems. Ince et al. [29] introduced a new estimation method that combined copula-based statistical theory with a closed-form solution for the entropy of Gaussian variables. This method creates a comprehensive, efficient, flexible, and robust multivariate statistical framework. It provides effect sizes on a consistent scale, handles discrete and continuous variables (both unidimensional and multidimensional), and allows direct comparisons of behavioral and brain response representations across different recording modalities. They demonstrated the effectiveness of this estimation as a statistical test in neuroimaging, which accounts for both discrete stimulus categories and continuous stimulus features. Furthermore, Park et al. [30] analyzed the representational interactions between dynamic audio and visual speech signals and found that different brain regions exhibit different types of representational interactions. In particular, using a novel information-theoretic measure, they discovered that redundant encoding in the left posterior superior temporal gyrus/sulcus and synergistic encoding in the left motor cortex were highly correlated with speech comprehension performance.
Our work aims to apply partial information decomposition to biological data, in particular to gene expression data from microarray experiments that we analyzed in our previous works [31,32] using complex networks and machine learning. Based on previous results, we used PID to investigate different gene communities in more detail and to explore the presence of crucial hidden information that is ignored by classical approaches. To our knowledge, our study is the first application of this novel tool to microarray data. We studied two different use cases, hepatocellular carcinoma (HCC) and autism spectrum disorder (ASD), to assess the potential generality of this approach regardless of the chosen disease. In addition, we compared synergistic gene clusters obtained by PID with bivariate clusters obtained using mutual information. Finally, we applied biological tools such as differential gene expression (DGE) analysis and enrichment analysis to assess the differences between the clusters found. We would like to emphasize that these methods are not the final goal of this work, but are only used as comparative tools.
The remainder of this paper is organized as follows: in the Methods Section, we give an overview of the analyzed data and the applied methodologies. In the Results Section, we show the results obtained by applying PID methods to two different use cases, HCC and ASD. In the Discussion Section, we discuss our results, and in the Conclusion Section we summarize our findings.
Materials and methods
As described in the introduction section, we analyzed microarray gene expression data from two different experiments studied in two previous works [31,32] by means of complex network and machine learning methods. The analysis implemented in this paper starts from the results of these two previous studies. Our pipeline is shown in Fig 1. Panel A refers to the previous analysis, where we selected the most biologically informative communities. Panel B describes the new analysis, in which we investigated the existence of biologically important sub-communities of genes, building the adjacency matrix through two different metrics, mutual information and synergy, and using Partitioning Around Medoids (PAM) as the clustering algorithm.
Panel A revisits our earlier analysis, where we identified the community with the most biological significance. Panel B, however, presents a new analysis in which we determined the biological importance of the gene community through two different grouping metrics based on mutual information and synergy. For this analysis, we employed the Partitioning Around Medoids (PAM) algorithm as our clustering technique.
Data description
Use case 1: HCC.
The microarray datasets GSE102079 and GSE54236 were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/). The GSE102079 dataset [33,34] contains gene expression data extracted from liver tissue of 152 patients, specifically: 152 tumor and 91 adjacent liver tissues from HCC patients, and 14 adjacent liver tissues obtained from patients with metastasis of colorectal cancer who had not received chemotherapy. The GSE54236 dataset [35–38] comprises gene expression data from 156 samples, including 78 hepatocellular carcinoma (HCC) tumor tissues and 78 corresponding adjacent non-tumor tissues, analyzed using the GPL6480 Agilent-014850 Whole Human Genome Microarray. This dataset served as an independent test set. Raw data were normalized using robust multiarray analysis (RMA) [39]; this method applies a background correction to the original data, followed by a log2 transformation and finally a quantile normalization.
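The last two normalization steps mentioned above (log2 transformation and quantile normalization) can be illustrated with a minimal sketch. This is not the RMA implementation used in the study: background correction and probe summarization are omitted, the toy matrix is invented for illustration, and the function names `log2_transform` and `quantile_normalize` are hypothetical.

```python
import math

def log2_transform(matrix):
    """Apply log2 to every (strictly positive) intensity."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each sample's r-th smallest value is replaced by the mean of the r-th
    smallest values across all samples, so every sample ends up with the
    same empirical distribution. Ties are broken by order of appearance,
    which is a simplification.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the sorted values, rank by rank.
    reference = [sum(sc[r] for sc in sorted_cols) / n_samples
                 for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized

# Toy raw intensities (3 genes x 3 samples), invented for illustration.
raw = [[4.0, 8.0, 16.0],
       [2.0, 32.0, 8.0],
       [16.0, 4.0, 2.0]]
normalized = quantile_normalize(log2_transform(raw))
```

After this step every column of `normalized` contains the same set of values, only in a different gene order.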
Use case 2: ASD.
The dataset used is GSE28475, also downloaded from GEO; it consists of 104 samples divided into two classes: 33 ASD and 71 control samples [40,41]. We implemented a preprocessing procedure to minimize batch effects using the ComBat function from the package sva version 3.50 [42]. Then, data were log2 transformed and quantile normalized through the lumiN function from the package lumi version 2.54 [43].
Mutual information
Entropy, the fundamental concept in information theory, characterizes the uncertainty or variability of a random variable [44–46] and is thus a useful measure of signal complexity [47]. A closely related concept is mutual information (MI), which quantifies the statistical dependency between two random variables, X and Y, in terms of entropy differences and does not assume any particular distribution or type of interaction among the variables [48,49]. This freedom from restrictive assumptions makes MI capable of detecting nonlinear as well as nonmonotonic relationships. It can be written in three mathematically equivalent formulations based on entropy differences, each providing a different perspective on the information shared between the variables:

\begin{align}
I(X;Y) &= H(X) - H(X|Y) \tag{1} \\
I(X;Y) &= H(Y) - H(Y|X) \tag{2} \\
I(X;Y) &= H(X) + H(Y) - H(X,Y) \tag{3}
\end{align}

where $H(X|Y)$ and $H(Y|X)$ represent the conditional entropies and $H(X,Y)$ is the entropy of the joint distribution of X and Y. MI can be interpreted as a standard statistical hypothesis test of independence, in many ways analogous to a t-test or correlation test [50]. When regarded this way, absolute unbiasedness is less crucial; instead, the emphasis lies on evaluating statistical significance and on handling issues such as multiple comparisons [51]. MI offers the advantages of sensitivity, robustness, and additivity: it provides a general framework for analyzing discrete, continuous, and multi-dimensional variables using easily comparable measures of effect size on meaningful scales [52]. Despite all these beneficial properties, accurately estimating MI from limited experimental data is a difficult problem [53].
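As a small worked check of the equivalence between the entropy-based formulations of MI, the sketch below evaluates all three on a toy 2x2 joint distribution. The probability table is invented for illustration, and the helper names are hypothetical; the conditional entropies are computed directly from the conditional distributions rather than through the chain rule.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy joint distribution p(x, y): rows index X, columns index Y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

px = [sum(row) for row in joint]
py = [sum(joint[i][j] for i in range(2)) for j in range(2)]

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy([p for row in joint for p in row])

# Conditional entropies computed from the conditional distributions:
# H(X|Y) = sum_y p(y) * H(X | Y = y), and similarly for H(Y|X).
h_x_given_y = sum(py[j] * entropy([joint[i][j] / py[j] for i in range(2)])
                  for j in range(2))
h_y_given_x = sum(px[i] * entropy([joint[i][j] / px[i] for j in range(2)])
                  for i in range(2))

mi_a = h_x - h_x_given_y          # I(X;Y) = H(X) - H(X|Y)
mi_b = h_y - h_y_given_x          # I(X;Y) = H(Y) - H(Y|X)
mi_c = h_x + h_y - h_xy           # I(X;Y) = H(X) + H(Y) - H(X,Y)
```

For this table the three values agree up to floating-point error, at roughly 0.278 bits.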
One way to estimate mutual information is the Gaussian Copula Mutual Information (GCMI) [29]. It relies on a copula, a statistical construct that captures the dependency between two random variables independently of their marginal distributions [54] (https://github.com/robince/gcmi/tree/master/matlab).
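The idea behind a Gaussian-copula estimate can be sketched for two one-dimensional variables as follows. This is a simplified illustration, not the gcmi toolbox code: it assumes no ties, applies no bias correction, does not handle perfectly dependent inputs (correlation of exactly 1), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def copula_normalize(values):
    """Map values to standard-normal scores via their empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()
    # rank / (n + 1) keeps the empirical CDF strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def gaussian_copula_mi(x, y):
    """MI estimate in bits from the correlation of copula-normalized data."""
    rho = pearson(copula_normalize(x), copula_normalize(y))
    # Closed-form MI of a bivariate Gaussian with correlation rho.
    return -0.5 * math.log2(1.0 - rho ** 2)

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_dep = [xi + 0.5 * rng.gauss(0, 1) for xi in x]   # depends on x
y_ind = [rng.gauss(0, 1) for _ in range(200)]      # independent of x

mi_dep = gaussian_copula_mi(x, y_dep)
mi_ind = gaussian_copula_mi(x, y_ind)
# A monotone transform of x leaves the ranks, and hence the estimate, unchanged.
mi_dep_exp = gaussian_copula_mi([math.exp(v) for v in x], y_dep)
```

Because only ranks enter the estimate, it is insensitive to the marginal distributions, which is the property attributed to the copula above.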
Partial information decomposition
In a situation where there are multiple source elements synapsing onto a single target element, the way information about the target is distributed amongst different combinations of source elements can be quite complex. To begin to examine this, consider a very simple "complex system" composed of two parent elements, X1 and X2, both synapsing onto a target Y. As already mentioned, mutual information captures the total information provided by both parents about Y. However, this overall measure does not explain how the information is divided among X1 and X2 individually.
We therefore need to decompose the joint mutual information into its basic elements to gain detailed insight into how information is distributed. This enables us to identify and explain all the ways in which information might be shared, uniquely contributed, or redundantly represented among the source elements. This concept forms the basis of Partial Information Decomposition (PID) [7].
Fig 2 shows how the different parts of partial information (redundant, unique, and synergistic) are represented in a Venn diagram in terms of joint and marginal mutual information elements for two source variables (X1 and X2) and a target variable (Y) [7].
A notable difference between the marginal mutual information and the unique information is highlighted: the marginal mutual information overlaps, with each counting the redundant (shared) information towards its own marginal mutual information.
The discrepancy lies in the synergistic information, which cannot be attributed solely to either marginal mutual information.
The relationship between partial information terms and mutual information terms is delineated as follows:

$$
\begin{aligned}
I(Y;X_1,X_2) &= U(Y;X_1) + U(Y;X_2) + R(Y;X_1,X_2) + S(Y;X_1,X_2) \\
I(Y;X_1) &= U(Y;X_1) + R(Y;X_1,X_2) \\
I(Y;X_2) &= U(Y;X_2) + R(Y;X_1,X_2) \\
S(Y;X_1,X_2) &= I(Y;X_1,X_2) - I(Y;X_1) - I(Y;X_2) + R(Y;X_1,X_2)
\end{aligned}
\tag{4}
$$

In the last relation, it is necessary to reintroduce a redundancy term because it is "double counted" when subtracting off the marginal mutual informations.
The information can then be decomposed into specific components. $R(Y;X_1,X_2)$ represents the information about Y that is redundantly available in both X1 and X2 (it can be observed through either X1 or X2). $U(Y;X_1)$ denotes the information about Y that is uniquely provided by one of the sources, here X1, and is only observable through that specific source; the unique information $U(Y;X_2)$ is defined analogously. Finally, $S(Y;X_1,X_2)$ is the information about Y that emerges from the joint states of X1 and X2 and cannot be attributed to either source individually.
An issue with the Partial Information Decomposition (PID) is that its constituent measures cannot be obtained through classical information theory alone: an additional ingredient is needed to provide an unambiguous definition of $R(Y;X_1,X_2)$, $U(Y;X_1)$, $U(Y;X_2)$ and $S(Y;X_1,X_2)$. Different redundancy and synergy definitions have been suggested to complete the PID definition [8,9,11]; here we refer to the so-called minimum MI (MMI) PID [18], where redundancy is defined as the minimum of the information provided by each individual source to the target, $R(Y;X_1,X_2) = \min\{I(Y;X_1), I(Y;X_2)\}$. This choice ensures that redundancy remains independent of the correlation between the source processes. Furthermore, under a joint Gaussian distribution of observed processes, all previously proposed PID formulations reduce to the MMI PID.
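Once redundancy is fixed by the MMI rule, the remaining PID terms follow from the three mutual information values by simple arithmetic. The sketch below shows this bookkeeping; the function name `mmi_pid` and the numerical inputs are hypothetical and chosen only for illustration.

```python
def mmi_pid(mi_x1, mi_x2, mi_joint):
    """Two-source PID under the minimum-MI (MMI) redundancy rule.

    mi_x1    : I(Y; X1)
    mi_x2    : I(Y; X2)
    mi_joint : I(Y; X1, X2)
    """
    redundancy = min(mi_x1, mi_x2)
    unique_x1 = mi_x1 - redundancy
    unique_x2 = mi_x2 - redundancy
    # Redundancy is added back because it is subtracted twice
    # when both marginal MIs are removed from the joint MI.
    synergy = mi_joint - mi_x1 - mi_x2 + redundancy
    return {
        "redundancy": redundancy,
        "unique_x1": unique_x1,
        "unique_x2": unique_x2,
        "synergy": synergy,
    }

# Illustrative values in bits (not taken from the study).
pid = mmi_pid(mi_x1=0.30, mi_x2=0.20, mi_joint=0.65)
```

With these inputs the redundancy is 0.20, the unique information of X1 is 0.10, that of X2 is 0, and the synergy is 0.35; the four terms add up to the joint mutual information. Under the MMI rule at least one unique term is always zero.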
Partitioning around medoids
The Partitioning Around Medoids (PAM) algorithm is designed to identify a set of representative points, called medoids, located at the centre of clusters [55]. The medoids are collected in a set S of selected elements. Let O represent the entire set of values; then $O \setminus S$ is the set of unselected values. The goal of the algorithm is to minimise the average dissimilarity between each point and its nearest medoid or, equivalently, the total dissimilarity between the objects and their nearest selected medoid. PAM uses a greedy search strategy, which is faster than an exhaustive search, even if it does not always find the optimal solution. The algorithm consists of two steps: the BUILD step, in which it greedily selects k medoids from the n data points so as to minimise the total cost of assigning each data point to its nearest medoid; and the SWAP step, in which the change in cost produced by swapping a medoid with a non-medoid point is evaluated.
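The BUILD and SWAP steps can be sketched on a precomputed dissimilarity matrix as follows. This is a compact illustration and not the implementation used in the study: the SWAP phase accepts the first improving swap it finds, whereas classical PAM picks the best swap at each iteration, and the function names are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    """Cluster n objects into k groups from an n x n dissimilarity matrix."""
    n = len(dist)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid point and keep
    # any swap that strictly lowers the cost, until no swap helps.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    # Assign every point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]])
              for i in range(n)]
    return medoids, labels, cost

# Toy example: six points on a line forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam(dist, k=2)
```

On this toy input the greedy BUILD step does not land on the best pair of medoids, and a single swap in the SWAP step moves one medoid to the centre of its group.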
Experimental procedure
Our experimental approach is based on the findings of two previous works [31,32]. In those analyses we studied two different diseases, Hepatocellular Carcinoma (HCC) and Autism Spectrum Disorder (ASD), with a common pipeline based on complex networks and machine learning techniques. Specifically, through a hierarchical community detection approach based on the Leiden algorithm [56], we identified stable communities within the dataset. By applying a machine learning approach combining Boruta [57], Random Forest [58] and 5-fold cross-validation, we identified the communities that best discriminated healthy from diseased subjects. Then, through eXplainable Artificial Intelligence techniques and functional enrichment, we biologically validated these communities.
In this work, we decided to further investigate two communities found in each previous analysis. We chose the community that was most interesting from a biological point of view and the community without biological meaning but with the highest classification performance in discriminating sick from healthy patients. After selecting the gene communities, we applied the partial information decomposition explained in the Partial information decomposition section in two different configurations: in the first case we considered the subjects as nodes, and in the second the genes as nodes of a network. With reference to the Venn diagram in Fig 2, the nodes become the variables of our measures of mutual information (X1, X2, Y). To evaluate the higher-order metrics, such as synergy, these measurements are made on each possible triplet and the corresponding pairs to fulfil the four equations in Eq 4. Specifically, PID components (synergy, redundancy, and unique information) were calculated for each possible triplet combination of nodes (samples or genes) within a community, thus capturing higher-order interactions that traditional pairwise methods do not account for. In this way, for example, we obtained a three-dimensional matrix for synergy, which was subsequently averaged over the target dimension to obtain a dissimilarity matrix. We then calculated the silhouette coefficient to determine the optimal number of clusters, which we finally obtained using the PAM algorithm. The silhouette coefficient was evaluated for clustering solutions ranging from 2 to 10 clusters, selecting the number that maximized the silhouette score. We repeated the same procedure with the bivariate mutual information matrix and evaluated the differences between the resulting groupings. For each subcommunity obtained by clustering, we assessed the accuracy on the independent dataset (where possible) and the enriched biological functions.
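The triplet computation described above can be sketched as follows, under several assumptions that are ours and not stated in the text: the node series are treated as jointly Gaussian (for example after copula normalization), MI values are computed in closed form from Pearson correlations, redundancy follows the MMI rule, and no bias correction is applied. The function names are hypothetical, and the sketch stops at the target-averaged synergy matrix, since the exact conversion to a dissimilarity is not detailed here.

```python
import math
import random
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def gaussian_mi(r_squared):
    """MI (nats) of Gaussian variables given a squared (multiple) correlation."""
    return -0.5 * math.log(1.0 - r_squared)

def mean_synergy_matrix(series):
    """Synergy of each source pair (i, j), averaged over all targets k."""
    n = len(series)
    corr = [[pearson(series[a], series[b]) for b in range(n)] for a in range(n)]
    synergy = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = 0.0
        for k in range(n):
            if k in (i, j):
                continue
            r_i, r_j, r_ij = corr[i][k], corr[j][k], corr[i][j]
            mi_i = gaussian_mi(r_i ** 2)
            mi_j = gaussian_mi(r_j ** 2)
            # Squared multiple correlation of target k on sources i and j.
            r2_joint = (r_i ** 2 + r_j ** 2 - 2 * r_i * r_j * r_ij) / (1 - r_ij ** 2)
            mi_joint = gaussian_mi(r2_joint)
            redundancy = min(mi_i, mi_j)          # MMI rule
            total += mi_joint - mi_i - mi_j + redundancy
        synergy[i][j] = synergy[j][i] = total / (n - 2)
    return synergy

# Toy data: 5 nodes with 50 observations each, invented for illustration.
rng = random.Random(1)
base = [rng.gauss(0, 1) for _ in range(50)]
series = [[b + rng.gauss(0, 1) for b in base] for _ in range(5)]
syn = mean_synergy_matrix(series)
```

The loop visits every unordered source pair and every remaining node as target, so its cost grows cubically with the number of nodes.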
Finally, to test the robustness of our analysis, we performed random sampling to understand whether accuracy and enrichment levels depend on Synergy or not. More specifically, we performed a bootstrap sample of genes for each considered community and then tested these random sub-communities as previously described to assess accuracy and statistically significant biological functions in independent test samples.
Biological tool
To assess the biological significance of our results, we used complementary analyses: Differential Gene Expression (DGE) analysis and Pathways Analysis. For DGE analysis, we used the LIMMA R package [59], which applies linear modeling combined with empirical Bayes methods to detect statistically significant differences in gene expression levels between the identified clusters. Specifically, LIMMA was used exclusively to evaluate whether subsampling subjects through PID-derived groups could reveal additional differentially expressed genes compared to the entire original community. For Pathways Analysis, we performed Over-Representation Analysis (ORA) to identify functionally enriched biological pathways within predefined gene subsets identified through PID clustering. ORA is particularly suitable for assessing enrichment in gene subsets, as it does not require gene-ranking scores and evaluates the statistical significance of overlaps between gene clusters and predefined biological pathways. In detail, for the HCC use case, we conducted ORA using the Molecular Signature Database (MSigDB) via the GSEA software, considering hallmark gene sets, canonical pathways, gene ontology categories, and chemical and genetic perturbations [60]. Functions with a false discovery rate (FDR) below 0.05 were considered significantly enriched. For the ASD use case, we performed ORA using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [61–63] using the clusterProfiler package R [64]. Functions with a Bonferroni corrected p-value lower than 0.05 were considered significantly enriched. The biological validation of PID-derived clusters was achieved through this combined approach of identifying differential gene expressions and subsequently assessing pathway enrichments, explicitly demonstrating the biological significance of higher-order interactions identified through PID. 
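Over-representation analysis is commonly formulated as a one-sided hypergeometric test on the overlap between a gene cluster and a pathway. The sketch below illustrates that test together with a Bonferroni adjustment; it is not the GSEA or clusterProfiler code, and the function names and gene counts are hypothetical.

```python
from math import comb

def ora_p_value(n_universe, n_pathway, n_cluster, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_universe : number of genes in the background
    n_pathway  : number of background genes annotated to the pathway
    n_cluster  : number of genes in the tested cluster
    n_overlap  : number of cluster genes annotated to the pathway
    """
    upper = min(n_pathway, n_cluster)
    total = comb(n_universe, n_cluster)
    tail = sum(
        comb(n_pathway, x) * comb(n_universe - n_pathway, n_cluster - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
p_small = ora_p_value(n_universe=10, n_pathway=4, n_cluster=3, n_overlap=2)

# Illustrative cluster tested against three pathways (invented counts).
p_raw = [
    ora_p_value(20000, 100, 20, 5),
    ora_p_value(20000, 300, 20, 1),
    ora_p_value(20000, 50, 20, 0),
]
p_adj = bonferroni(p_raw)
significant = [p < 0.05 for p in p_adj]
```

Exact integer arithmetic via `math.comb` avoids the rounding issues of factorial-based formulas for large gene universes.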
The selection of specific enrichment tools for each use case (MSigDB using GSEA software for HCC and KEGG via clusterProfiler for ASD) was consistent with the methodologies previously applied in our earlier studies [31,32], ensuring comparability between analyses.
Results
Use case 1: HCC
As mentioned above, our work starts from a previous analysis [31] in which we first implemented a community detection procedure using the Leiden algorithm to find stable gene communities within the gene co-expression network. Then, we applied an additional feature selection method based on the Boruta algorithm to each community found. In the present work, we focused on two different communities: community 29, the most biologically relevant, and community 32, with the highest HCC-control classification accuracy on the test sample.
Community 29.
Community 29 was composed of the gene expression of 51 genes belonging to 140 subjects. For this community, we performed two distinct analyses considering as nodes of a network first the samples and then the genes.
In the first analysis, with the 140 samples as the nodes, we implemented the Partial Information Decomposition based on all possible triplets, evaluating the higher-order metrics. Then we compared the synergy and mutual information sample clusters obtained. In order to find the optimal number of clusters, we evaluated the silhouette metric. Table 1 shows the label distribution within both synergy and mutual information clusters found by means of the PAM algorithm.
Subsequently, to evaluate the biological meaning of our findings, we implemented a Differential Gene Expression (DGE) analysis. First, we studied the Differential Genes (DGs) of the whole community, and we considered a gene to be differentially expressed if the absolute log fold change (logFC) is greater than 1.5 and the adjusted P value is significant at the 1% level. We adopted this logFC threshold in order to focus on genes with the largest and most biologically meaningful expression differences, providing a conservative filter that prioritizes robust disease-related signals over subtle but potentially less interpretable changes. Furthermore, we applied this analysis to the clusters that were adequate in terms of both class balance and cardinality. Table 2 displays the number of DGs for all possible groupings, indicating the number of DGs gained (+) or lost (-) compared to the whole community (the complete lists of genes are shown in the S1 Appendix of the Supporting Information).
As we can see, the synergy seems to maximize the variability of the samples: the SI clusters present a mean number of DGs greater than both the whole community and the two MI clusters. In particular, in the SI clusters we found 2 genes linked to HCC biological functions: TCF21 [65] and RBP1 [66]. Both genes are connected with other genes notably deregulated in cancer, such as ANGPTL1, ADAMTSL2, PELI2 and EPCAM. To further validate these findings, we derived empirical p-values by comparing the observed gene-level statistics within synergy clusters against null distributions from 1,000 matched random subject subsets (see Supplementary S1 Appendix). Among the candidate DEGs for both Synergy and MI, only TCF21 [65] remained significant under this more stringent test.
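The empirical p-value procedure described above can be sketched as follows. Several choices here are assumptions made for illustration and are not specified in the text: the gene-level statistic is an absolute difference of group means, "matched" is taken to mean the same number of cases and controls as the observed cluster, the p-value uses the common (1 + exceedances) / (1 + draws) estimator, and the data are simulated.

```python
import random

def abs_mean_diff(values, labels, subset):
    """Absolute difference between case and control means within a subset."""
    cases = [values[i] for i in subset if labels[i] == 1]
    controls = [values[i] for i in subset if labels[i] == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def matched_random_subset(labels, subset, rng):
    """Random subjects with the same case/control counts as `subset`."""
    n_cases = sum(labels[i] for i in subset)
    n_controls = len(subset) - n_cases
    case_idx = [i for i, lab in enumerate(labels) if lab == 1]
    control_idx = [i for i, lab in enumerate(labels) if lab == 0]
    return rng.sample(case_idx, n_cases) + rng.sample(control_idx, n_controls)

def empirical_p_value(observed, null_values):
    """Fraction of null statistics at least as large as the observed one."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Simulated expression of one gene in 30 cases (label 1) and 30 controls.
rng = random.Random(42)
labels = [1] * 30 + [0] * 30
values = [rng.gauss(1.0 if lab == 1 else 0.0, 1.0) for lab in labels]

# Hypothetical cluster of subjects: 15 cases and 15 controls.
cluster = list(range(0, 15)) + list(range(30, 45))
observed = abs_mean_diff(values, labels, cluster)

null = [
    abs_mean_diff(values, labels, matched_random_subset(labels, cluster, rng))
    for _ in range(1000)
]
p_emp = empirical_p_value(observed, null)
```

Adding one to both numerator and denominator keeps the estimate away from zero, so the smallest attainable p-value with 1,000 draws is 1/1001.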
In the second part of our analysis, we considered the 51 genes as nodes of a network and we implemented the same pipeline in order to study the behaviors of the synergy gene communities with respect to the mutual information ones. The results of our analysis are shown in Table 3.
The lists of the new statistically significant enriched functions, gained with respect to the entire community for MI and synergy clusters, are shown in S1 and S2 Tables of the Supporting Information, respectively. For the robustness test we bootstrapped the original community 10 times. None of the bootstrap resamplings contained the enriched biological functions found with synergy. We reported these functions in S1 Appendix.
Finally, we built a classifier based on Random Forest, as in the original work [31], to test on an independent dataset the gene communities found with the synergy and mutual information approaches. The classification performance between HCC and control tissues is shown in Table 3: in terms of accuracy, synergy cluster 2, shown in Fig 3, outperformed both MI clusters and the whole community with a reduction of approximately 60% in the number of genes.
Community 32.
Community 32 was composed of 41 genes belonging to 140 individuals. As for the previous community, we applied PID using both a complex network framework with subjects as nodes and a complex network framework with genes as nodes. The results of PAM clustering application are shown in Table 4.
In addition, we applied the DGE analysis to the three synergy clusters and to the first MI cluster, which appears to be much more balanced in terms of class than the second. Table 5 shows the number of DGs for the whole gene community and for each cluster found. Synergy clusters 2 and 3 have one DG (NAGS) more than the whole community, as does MI cluster 1 (CYP7A1). These two genes do not appear to be associated with the HCC phenotype and did not remain significant when compared against null distributions (the complete lists of genes with their corresponding empirical p-values are shown in the S1 Appendix of the Supporting Information).
Using genes as nodes in the complex network model and applying the PAM algorithm, we found 2 synergy clusters, shown in Fig 4, with a similar cardinality and 2 MI clusters that were much more unbalanced (see Table 6). As we can see in S3 Table of the Supporting Information, the SI clusters have new statistically significant functions associated with the HCC phenotype. Specifically, some of these biological functions were linked to hepatoblastoma and to the onset of liver cancer. We did not find any novel biological processes for the MI clusters compared to the original community. As with community 29, we implemented a random forest model to assess the discriminatory power of these clusters in classifying HCC and healthy controls on the independent dataset. As shown in Table 6, classification accuracies of 78.26% and 77.02% are achieved for the two synergy groups, compared to 82.61% for the whole community: a slightly lower performance, but with half the features used.
Use case 2: ASD
As for the previous use case, we started from the results of a previous work in which, through a combined complex network and machine learning approach, we selected some interesting gene communities with respect to the autism phenotype [32]. We decided to further investigate two communities: community 50, which is statistically and biologically relevant, and community 78, which presented the highest performance in discriminating between autism and control subjects.
Community 50.
Community 50 comprised the expression of 44 genes belonging to 104 subjects. As for the HCC use case, we applied two different analyses in which we built two complex networks, considering as nodes first the samples and then the genes. First, we developed a complex network framework with 104 nodes (the number of patients) and implemented Partial Information Decomposition based on all possible triplets, assessing the higher-order metrics. Then we compared the synergy and mutual information sample clusters. We chose the best number of clusters by evaluating the silhouette coefficient.
Table 7 shows the label distribution of both synergy and mutual information clusters found.
We implemented differential gene expression analysis for the biological evaluation of the detected clusters. Concerning the DGE analysis, no statistically significant differences were found with the linear model from the limma package [59]; therefore, we implemented a Kruskal-Wallis test [67] in order to compare the ASD gene expression distributions with the control ones and to detect possible significant differences. We reported in Table 8 the number of genes with a p-value significant at the 1% level (the complete lists of genes are shown in the S2 Appendix of the Supporting Information).
In the synergy clusters we found 8 new DGs with respect to the whole community, whereas the MI clusters presented 4 new DGs:
- Unique Synergy 8 DGs: SLC26A11, PPP1R9B [68], GPR150, PBX1 [69], RXRG [70], LOC286526, ZNF706, RUFY3;
- Unique MI 4 DGs: MGC40222, WAC [71], PRKCBP1, LOC286526.
Also in this case, to further validate these findings, we derived empirical p-values by comparing the observed gene-level statistics within synergy clusters against null distributions from 1,000 matched random subject subsets (see Supplementary S2 Appendix). Among the candidate DEGs, for both Synergy and MI, PPP1R9B [68], LOC286526, ZNF706 and FOXE3 remained significant under this more stringent test.
Using genes as nodes and applying the silhouette criterion and the PAM algorithm, we found 2 synergy clusters and 2 MI clusters, as shown in Table 9. In synergy cluster 2, shown in Fig 5, we detected four different significantly enriched functions, as shown in S4 Table of the Supporting Information; no significantly enriched functions were found in MI clustering. For the robustness test we bootstrapped the original community 10 times. None of the bootstrap resamplings contained enriched biological functions. After finding the biological content of synergy cluster 2, we could not test its classification performance on ASD and control subjects because the overlap between the test sample and the cluster consisted of only 3 genes.
Community 78.
In the network configuration with samples as nodes we found 2 synergy clusters and 2 MI clusters (see Table 10). The number of differentially expressed genes is summarized in Table 11. We excluded MI cluster 2 because it presented a very unbalanced configuration between ASD and control subjects (the complete lists of genes are given in the S2 Appendix of the Supporting Information).
Compared to the whole community, the synergy and MI approaches highlighted 2 and 4 new DGs, respectively.
- Unique Synergy 2 DGs: CORO7, EVX2;
- Unique MI 4 DGs: OR52N2, MOGAT2, GP1BA, IL15RA.
As far as we know, these genes do not have a direct connection with the ASD phenotype, and only two genes (OR52N2 and MOGAT2) remain significant when compared against null distributions (see S2 Appendix of the Supporting Information). With genes as nodes, after applying the Silhouette and PAM algorithms, we found 4 synergy clusters and 2 MI clusters, as shown in Table 12. In synergy cluster 3, shown in Fig 6, we detected five statistically significantly enriched functions, reported in S5 Table of the Supporting Information; we found nothing significant in the two MI clusters. For the robustness test we bootstrapped the original community 10 times; none of the bootstrap resamplings contained enriched biological functions. After identifying the biological content of synergy cluster 2, we could not test its classification performance for ASD versus control subjects because the overlap between the test sample and the cluster consisted of only 4 genes.
Discussion
This work studied biological microarray data through partial information decomposition (PID). We analyzed information modification in two different use cases: hepatocellular carcinoma (HCC) and autism spectrum disorder (ASD). We started from the most interesting results of two of our previous works to find gene communities with a biological meaning linked to the phenotype of two complex diseases. We used a clustering approach based on the Partitioning Around Medoids (PAM) algorithm, in which the adjacency matrix was computed through two different metrics: mutual information (MI) and synergy. In both cases, we applied two different network configurations: samples as nodes and genes as nodes. Our results highlight that, in a fully data-driven process without any assumptions on data composition or characterization, grouping samples or genes with a multivariate metric (synergy) rather than a bivariate one (mutual information) brought out additional biological patterns. We emphasize that PID is not intended as a feature selection method; our focus was instead on revealing higher-order interactions that might uncover hidden biological insights. Classification performance was assessed only as a validation step to confirm the informational value of PID-derived substructures.
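The difference between a bivariate metric and a multivariate one can be illustrated on a toy discrete system. The sketch below assumes the minimum-mutual-information (MMI) definition of redundancy, one of several proposed PID definitions; the estimator actually applied to continuous expression data may differ, and all function names are illustrative.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """I(A;B) in bits from a list of equally weighted (a, b) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    return sum(c / n * log2(c * n / (marg_a[a] * marg_b[b]))
               for (a, b), c in joint.items())


def pid_mmi(samples):
    """PID atoms for two sources and one target; samples are (x1, x2, y) tuples."""
    i1 = mutual_information([(x1, y) for x1, _, y in samples])
    i2 = mutual_information([(x2, y) for _, x2, y in samples])
    i12 = mutual_information([((x1, x2), y) for x1, x2, y in samples])
    redundancy = min(i1, i2)  # MMI assumption
    return {
        "redundancy": redundancy,
        "unique_1": i1 - redundancy,
        "unique_2": i2 - redundancy,
        "synergy": i12 - i1 - i2 + redundancy,
    }


# XOR target: neither source alone is informative, both together are.
xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# Copy target: both sources carry the same bit about the target.
copy = [(0, 0, 0), (1, 1, 1)]

print(pid_mmi(xor))
print(pid_mmi(copy))
```

For the XOR system the pairwise mutual information between each source and the target is zero, so a purely bivariate analysis sees no dependence, while the decomposition assigns the full bit to synergy. For the copy system the bit is entirely redundant.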
Analyzing gene community 29 of the HCC use case, we found, among the differentially expressed genes of the synergy cluster, two genes connected to tumour processes: TCF21, a member of the class II bHLH transcription factor superfamily that has been shown to undergo abnormal methylation and is often inactivated in human cancers [65], and RBP1, which seems to be involved in different types of cancer including HCC [66]. These genes were also connected with other genes notably deregulated in cancer, such as ANGPTL1, ADAMTSL2, PELI2 and EPCAM. Moreover, enrichment analysis of the genes in the synergy clusters of community 29 indicated that these genes are silenced in adult cancers through DNA methylation. Several studies have reported that this mechanism contributes to the onset and progression of cancer. In fact, DNA methylation is an epigenetic modification that significantly influences cancer initiation and progression by silencing tumor-suppressor genes through promoter hypermethylation and activating oncogenes through promoter hypomethylation [72,73]. In HCC as well, numerous studies have demonstrated an altered methylation status of genes in the liver tissues of patients [74,75]. Furthermore, functions related to β-catenin signaling are enriched in the synergy cluster; this is one of the pathways frequently activated in HCC patients [76–78], and it is also related to the differential gene RBP1, gained thanks to synergy grouping in the samples-as-nodes analysis [66].
Enrichment analysis of genes in the synergy clusters of community 32 revealed specific biological functions linked to the onset of pediatric and adult liver cancer. In fact, the first significantly altered hallmark was hepatoblastoma [79], the most common pediatric liver cancer, followed by other gene signatures altered in hepatocellular carcinoma (HCC), such as SU_LIVER, LEE_LIVER_CANCER_E2F1_DN, LEE_LIVER_CANCER_MYC_E2F1_DN [80,81] and LEE_LIVER_CANCER_ACOX1_DN [82,83]. Notably, in our previous work, the enrichment analysis of the whole gene community detected no biological function associated with HCC despite its good statistical performance.
For both communities 32 and 29, no synergy or MI cluster outperforms the entire community in classification on the test dataset. Interestingly, synergy cluster 2 of community 32 matches the performance of the whole community in classifying HCC versus control subjects with about 60% fewer genes, which shows how the synergy-based approach can preserve the discriminative power of the community while eliminating uninformative genes.
For the ASD use case, in particular in the synergy clusters of community 50, we detected as a DG the gene RXRG, which is linked to proper molecular patterns in the prefrontal and motor areas and is also associated with the development of the prefrontal cortex-medial dorsal thalamus connection. RXRG is believed to be altered in ASD [69]. Another differentially expressed gene in the synergy clusters is PBX1. In ASD probands, a de novo likely gene disruptive (LGD) variant and two de novo missense variants in the PBX1 gene were identified [84,85].
Additionally, two de novo loss-of-function variants and two rare, potentially damaging missense variants in the PBX1 gene were reported in ASD probands from the Autism Sequencing Consortium and the SPARK cohort [86,87].
As for the analysis with genes as nodes, for both original communities (namely communities 50 and 78) we found a synergy cluster enriched with a function statistically associated with the ASD phenotype. Conversely, none of the MI clusters showed any statistically significant biological function. For community 50, 4 new functions emerged from synergy cluster 2, which are listed in S4 Table. Two of these functions are related to the regulation of cortisol, which is involved in any kind of physical and mental stress. Its correlation with autism is well documented [88–93]. In particular, cortisol levels in the hypothalamic–pituitary–adrenal (HPA) axis were found to be higher in subjects with ASD than in healthy subjects [94,95]. The third function found, in order of statistical significance, concerns transcriptional dysregulation in cancer patients. Premature mortality in ASD seems to be due to various factors such as epilepsy, diabetes, cancer [96,97] and gastrointestinal diseases [98,99]. In particular, recent studies have found an overlap between biomarkers for ASD and various kinds of cancer, such as brain, thyroid and kidney [100–102]. The last significant function found concerns the mitogen-activated protein kinase (MAPK) signalling pathway, where deviations from regular control have been detected in several neurodevelopmental disorders including ASD [103–106].
For community 78, which as a whole had no biological significance, we discovered a synergy cluster (namely synergy cluster 3) with 5 enriched functions using the combination of PID and PAM, as listed in S5 Table. All these functions belong to a macrodomain related to gastrointestinal disorders (GID) [107–111]. Children with ASD very often develop critical conditions in the stomach that alter the intestinal epithelium and the composition of the gut microbiome (GM) [112–114]. There is growing evidence that the gut maintains bidirectional communication with the brain, creating the so-called microbiome-gut-brain axis, which is crucial for maintaining stable brain and gut function. Several studies have shown a link between GM problems and ASD; in particular, fats appear to play a special role in the diet of a child with ASD, as they can influence the GM community and the inflammatory response [115,116]. Another enriched function concerns the absorption of minerals: these are essential for the normal structure and key functions of the central nervous system and promote cell differentiation, development and migration [117,118]. Many studies have shown links between vitamins, neurodevelopment and cognitive function, highlighting deficits in children with autism [119–121]. In addition, several studies have shown that differential metabolites in plasma can be used to distinguish cases of ASD from control subjects [122,123]; these studies found that a particular class of metabolites, namely glycerolipids, is enriched in individuals with autism [124]. Moreover, new evidence suggests that reduced mitophagy may improve Alzheimer’s-related pathological features and behaviors. It is present both in postmortem brain samples from Alzheimer’s patients and in animal models of the disease [125,126]. The involvement of neuronal mitophagy in the development of ASD has recently become a focal point of research [127,128].
Lastly, different studies have revealed that people with ASD who follow specific protein-restricted diets, such as wheat and dairy-free diets, have significantly lower intestinal permeability compared to those on unrestricted diets [129–134].
The information-theoretic approach, and partial information decomposition in particular, appears able, when applied to microarray data, to help elucidate the mechanisms of complex diseases and to find potential targets for future therapies. The advantage of PID in this study lies in its ability to reveal internal structures within previously identified gene communities, which classical gene enrichment methods alone did not detect. Clearly, more in-depth clinical validation is necessary to clarify how these genes are implicated in the biological processes linked to HCC and ASD.
However, our work has some limitations. First, PID is computationally demanding, since the measures have to be computed for each triplet combination and the computation times are high, which limits the size of the network that can be analyzed. Furthermore, the analysis can also be performed at higher orders, such as quadruplets, but this further increases the required computational power. Finally, application to other diseases may improve the validity of this analysis.
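The growth in cost can be made concrete by counting combinations. The short sketch below counts unordered triplets and quadruplets of nodes for a few hypothetical network sizes; it says nothing about the cost of a single PID evaluation, and if each node of a triplet is taken in turn as the target the triplet count is multiplied by three.

```python
from math import comb


def n_combinations(n_nodes, order=3):
    """Number of unordered node subsets of the given order."""
    return comb(n_nodes, order)


# Hypothetical network sizes, chosen only to show the growth rate.
for n in (50, 100, 500, 1000):
    print(n, n_combinations(n, 3), n_combinations(n, 4))
```

The triplet count grows roughly as n³/6 and the quadruplet count as n⁴/24, so doubling the network size multiplies the work by about 8 and 16, respectively.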
Conclusion
In this article, we improved an information-theoretic method based on measuring the change of information in a network through synergy and redundancy to analyze gene communities associated with HCC and ASD. Building on the findings of 2 previous papers, we combined partial information decomposition (PID) and partitioning around medoids on microarray gene expression profiles from 2 different experiments. As far as we know, this is the first application of the PID method to microarray data. We found higher-order behaviours, expressed as differential genes and enriched functions related to the progression of HCC and ASD. Our results demonstrate the power of these techniques both in the study of gene expression data and in the discovery of potential therapeutic targets for the treatment of complex diseases, although this requires further investigation.
Supporting information
S1 Appendix. HCC: Additional information on the HCC analysis results.
https://doi.org/10.1371/journal.pone.0336379.s001
(PDF)
S2 Appendix. ASD: Additional information on the ASD analysis results.
https://doi.org/10.1371/journal.pone.0336379.s002
(PDF)
S1 Table. HCC: Comm 29 enrichment analysis on MI clusters.
https://doi.org/10.1371/journal.pone.0336379.s003
(PDF)
S2 Table. HCC: Comm 29 enrichment analysis on synergy clusters.
https://doi.org/10.1371/journal.pone.0336379.s004
(PDF)
S3 Table. HCC: Comm 32 enrichment analysis on synergy clusters.
https://doi.org/10.1371/journal.pone.0336379.s005
(PDF)
S4 Table. ASD: Comm 50 enrichment analysis on synergy clusters.
https://doi.org/10.1371/journal.pone.0336379.s006
(PDF)
S5 Table. ASD: Comm 78 enrichment analysis on synergy clusters.
https://doi.org/10.1371/journal.pone.0336379.s007
(PDF)
S1 Fig. HCC: Comm 29 accuracy boxplot of bootstrap samples.
https://doi.org/10.1371/journal.pone.0336379.s008
(PNG)
S2 Fig. HCC: Comm 32 accuracy boxplot of bootstrap samples.
https://doi.org/10.1371/journal.pone.0336379.s009
(PNG)
References
- 1. Schneidman E, Bialek W, Berry MJ 2nd. Synergy, redundancy, and independence in population codes. J Neurosci. 2003;23(37):11539–53. pmid:14684857
- 2. Stramaglia S, Wu G-R, Pellicoro M, Marinazzo D. Expanding the transfer entropy to identify information circuits in complex systems. Phys Rev E Stat Nonlin Soft Matter Phys. 2012;86(6 Pt 2):066211. pmid:23368028
- 3. Stramaglia S, Cortes JM, Marinazzo D. Synergy and redundancy in the Granger causal analysis of dynamical networks. New J Phys. 2014;16(10):105003.
- 4. Stramaglia S, Angelini L, Wu G, Cortes JM, Faes L, Marinazzo D. Synergetic and redundant information flow detected by unnormalized granger causality: Application to resting state fMRI. IEEE Trans Biomed Eng. 2016;63(12):2518–24. pmid:27875123
- 5. McGill W. Multivariate information transmission. Trans IRE Prof Group Inf Theory. 1954;4(4):93–111.
- 6. Bell AJ. The co-information lattice. In: Proceedings of the fifth international workshop on independent component analysis and blind signal separation: ICA; 2003.
- 7. Williams PL, Beer RD. Nonnegative decomposition of multivariate information; 2010. https://arxiv.org/abs/1004.2515
- 8. Harder M, Salge C, Polani D. Bivariate measure of redundant information. Phys Rev E Stat Nonlin Soft Matter Phys. 2013;87(1):012130. pmid:23410306
- 9. Griffith V, Chong E, James R, Ellison C, Crutchfield J. Intersection information based on common randomness. Entropy. 2014;16(4):1985–2000.
- 10. Quax R, Har-Shemesh O, Sloot P. Quantifying synergistic information using intermediate stochastic variables. Entropy. 2017;19(2):85.
- 11. Bertschinger N, Rauh J, Olbrich E, Jost J, Ay N. Quantifying unique information. Entropy. 2014;16(4):2161–83.
- 12. Panzeri S, Senatore R, Montemurro MA, Petersen RS. Correcting for the sampling bias problem in spike train information measures. J Neurophysiol. 2007;98(3):1064–72. pmid:17615128
- 13. Faes L, Porta A. Conditional entropy-based evaluation of information dynamics in physiological systems. Directed information measures in neuroscience. Springer; 2014. p. 61–86.
- 14. Kozachenko LF, Leonenko NN. Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii. 1987;23(2):9–16.
- 15. Vlachos I, Kugiumtzis D. Nonuniform state-space reconstruction and coupling detection. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;82(1 Pt 2):016207. pmid:20866707
- 16. Marinazzo D, Pellicoro M, Stramaglia S. Causal information approach to partial conditioning in multivariate data sets. Comput Math Methods Med. 2012;2012:303601. pmid:22675400
- 17. Faes L, Kugiumtzis D, Nollo G, Jurysta F, Marinazzo D. Estimating the decomposition of predictive information in multivariate systems. Phys Rev E Stat Nonlin Soft Matter Phys. 2015;91(3):032904. pmid:25871169
- 18. Barrett AB. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys Rev E Stat Nonlin Soft Matter Phys. 2015;91(5):052802. pmid:26066207
- 19. Faes L, Porta A, Nollo G, Javorka M. Information decomposition in multivariate systems: Definitions, implementation and application to cardiovascular networks. Entropy. 2016;19(1):5.
- 20. Porta A, Bari V, De Maria B, Takahashi ACM, Guzzetti S, Colombo R, et al. Quantifying net synergy/redundancy of spontaneous variability regulation via predictability and transfer entropy decomposition frameworks. IEEE Trans Biomed Eng. 2017;64(11):2628–38. pmid:28103546
- 21. Barrett AB, Barnett L, Seth AK. Multivariate Granger causality and generalized variance. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;81(4 Pt 1):041907. pmid:20481753
- 22. Finn C, Lizier JT. Quantifying information modification in cellular automata using pointwise partial information decomposition. In: Artificial life conference proceedings. MIT Press; 2018. p. 386–7.
- 23. Timme N, Alford W, Flecker B, Beggs JM. Synergy, redundancy, and multivariate information measures: An experimentalist’s perspective. J Comput Neurosci. 2014;36(2):119–40. pmid:23820856
- 24. Sherrill SP, Timme NM, Beggs JM, Newman EL. Partial information decomposition reveals that synergistic neural integration is greater downstream of recurrent information flow in organotypic cortical cultures. PLoS Comput Biol. 2021;17(7):e1009196. pmid:34252081
- 25. Pinto H, Pernice R, Eduarda Silva M, Javorka M, Faes L, Rocha AP. Multiscale partial information decomposition of dynamic processes with short and long-range correlations: Theory and application to cardiovascular control. Physiol Meas. 2022;43(8):10.1088/1361-6579/ac826c. pmid:35853449
- 26. Wibral M, Lizier JT, Priesemann V. Bits from brains for biologically inspired computing. Front Robot AI. 2015;2.
- 27. Wibral M, Priesemann V, Kay JW, Lizier JT, Phillips WA. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 2017;112:25–38. pmid:26475739
- 28. Wibral M, Finn C, Wollstadt P, Lizier J, Priesemann V. Quantifying information modification in developing neural networks via partial information decomposition. Entropy. 2017;19(9):494.
- 29. Ince RAA, Giordano BL, Kayser C, Rousselet GA, Gross J, Schyns PG. A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula. Hum Brain Mapp. 2017;38(3):1541–73. pmid:27860095
- 30. Park H, Ince RAA, Schyns PG, Thut G, Gross J. Representational interactions during audiovisual speech entrainment: Redundancy in left posterior superior temporal gyrus and synergy in left motor cortex. PLoS Biol. 2018;16(8):e2006558. pmid:30080855
- 31. Lacalamita A, Serino G, Pantaleo E, Monaco A, Amoroso N, Bellantuono L, et al. Artificial intelligence and complex network approaches reveal potential gene biomarkers for hepatocellular carcinoma. Int J Mol Sci. 2023;24(20):15286. pmid:37894965
- 32. Lacalamita A, Pantaleo E, Monaco A, Bellantuono L, Fania A, La Rocca M, et al. A joint complex network and machine learning approach for the identification of discriminative gene communities in autistic brain. Submitted to Plos One. 2024.
- 33. Chiyonobu N, Shimada S, Akiyama Y, Mogushi K, Itoh M, Akahoshi K, et al. Fatty acid binding protein 4 (FABP4) overexpression in intratumoral hepatic stellate cells within hepatocellular carcinoma with metabolic risk factors. Am J Pathol. 2018;188(5):1213–24. pmid:29454748
- 34. Hatano M, Akiyama Y, Shimada S, Yagi K, Akahoshi K, Itoh M, et al. Loss of KDM6B epigenetically confers resistance to lipotoxicity in nonalcoholic fatty liver disease-related HCC. Hepatol Commun. 2023;7(10):e0277. pmid:37782459
- 35. Villa E, Critelli R, Lei B, Marzocchi G, Cammà C, Giannelli G, et al. Neoangiogenesis-related genes are hallmarks of fast-growing hepatocellular carcinomas and worst survival. Results from a prospective study. Gut. 2016;65(5):861–9. pmid:25666192
- 36. Zubiete-Franco I, García-Rodríguez JL, Lopitz-Otsoa F, Serrano-Macia M, Simon J, Fernández-Tussy P, et al. SUMOylation regulates LKB1 localization and its oncogenic activity in liver cancer. EBioMedicine. 2019;40:406–21. pmid:30594553
- 37. Dituri F, Scialpi R, Schmidt TA, Frusciante M, Mancarella S, Lupo LG, et al. Proteoglycan-4 is correlated with longer survival in HCC patients and enhances sorafenib and regorafenib effectiveness via CD44 in vitro. Cell Death Dis. 2020;11(11):984. pmid:33199679
- 38. Critelli RM, Milosa F, Romanzi A, Lasagni S, Marcelli G, Di Marco L, et al. Upregulation of the oestrogen target gene SIX1 is associated with higher growth speed and decreased survival in HCV-positive women with hepatocellular carcinoma. Oncol Lett. 2022;24(5):395. pmid:36276500
- 39. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–64. pmid:12925520
- 40. Chow ML, Li H-R, Winn ME, April C, Barnes CC, Wynshaw-Boris A, et al. Genome-wide expression assay comparison across frozen and fixed postmortem brain tissue samples. BMC Genomics. 2011;12:449. pmid:21906392
- 41. Chow ML, Winn ME, Li H-R, April C, Wynshaw-Boris A, Fan J-B, et al. Preprocessing and quality control strategies for illumina DASL assay-based brain gene expression studies with semi-degraded samples. Front Genet. 2012;3:11. pmid:22375143
- 42. Leek JT, Johnson WE, Parker HS, Fertig EJ, Jaffe AE, Zhang Y. Sva: Surrogate variable analysis; 2022.
- 43. Du P, Kibbe WA, Lin SM. lumi: A pipeline for processing Illumina microarray. Bioinformatics. 2008;24(13):1547–8. pmid:18467348
- 44. Shannon CE. A mathematical theory of communication. Bell Syst Techn J. 1948;27(3):379–423.
- 45. Garner WR, McGill WJ. The relation between information and variance analyses. Psychometrika. 1956;21(3):219–28.
- 46. Aboy M, Hornero R, Abásolo D, Alvarez D. Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis. IEEE Trans Biomed Eng. 2006;53(11):2282–8. pmid:17073334
- 47. Inouye T, Shinosaki K, Sakamoto H, Toi S, Ukai S, Iyama A, et al. Quantification of EEG irregularity by use of the entropy of the power spectrum. Electroencephalogr Clin Neurophysiol. 1991;79(3):204–10. pmid:1714811
- 48. Cover TM, Thomas JA. Entropy, relative entropy and mutual information. Elem Inform Theory. 1991;2(1):12–3.
- 49. Latham PE, Roudi Y. Mutual information. Scholarpedia; 2009.
- 50. Sokal RR, Rohlf FJ. Biostatistics. New York: Francise & Co; 1987.
- 51. Kinney JB, Atwal GS. Equitability, mutual information, and the maximal information coefficient. Proc Natl Acad Sci U S A. 2014;111(9):3354–9. pmid:24550517
- 52. Nelken I, Chechik G. Information theory in auditory research. Hear Res. 2007;229(1–2):94–105. pmid:17300891
- 53. Steuer R, Kurths J, Daub CO, Weise J, Selbig J. The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics. 2002;18 Suppl 2:S231-40. pmid:12386007
- 54. Nelsen RB. An introduction to copulas. Springer; 2006.
- 55. Kaufman L. Partitioning around medoids (program pam). Finding groups in data. 1990;344:68–125.
- 56. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: Guaranteeing well-connected communities. Sci Rep. 2019;9(1):5233. pmid:30914743
- 57. Kursa MB, Rudnicki WR. Feature selection with the Boruta Package. J Stat Soft. 2010;36(11).
- 58. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.
- 59. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. pmid:25605792
- 60. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. pmid:16199517
- 61. Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. pmid:10592173
- 62. Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 2019;28(11):1947–51. pmid:31441146
- 63. Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023;51(D1):D587–92. pmid:36300620
- 64. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7. pmid:22455463
- 65. Dong Z-R, Ke A-W, Li T, Cai J-B, Yang Y-F, Zhou W, et al. CircMEMO1 modulates the promoter methylation and expression of TCF21 to regulate hepatocellular carcinoma progression and sorafenib treatment sensitivity. Mol Cancer. 2021;20(1):75. pmid:33985545
- 66. Liu X, Shan W, Li T, Gao X, Kong F, You H, et al. Cellular retinol binding protein-1 inhibits cancer stemness via upregulating WIF1 to suppress Wnt/β-catenin pathway in hepatocellular carcinoma. BMC Cancer. 2021;21(1):1224. pmid:34775955
- 67. Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Am Stat Assoc. 1952;47(260):583–621.
- 68. Murtaza N, Cheng AA, Brown CO, Meka DP, Hong S, Uy JA, et al. Neuron-specific protein network mapping of autism risk genes identifies shared biological mechanisms and disease-relevant pathologies. Cell Rep. 2022;41(8):111678. pmid:36417873
- 69. Alankarage D, Szot JO, Pachter N, Slavotinek A, Selleri L, Shieh JT, et al. Functional characterization of a novel PBX1 de novo missense variant identified in a patient with syndromic congenital heart disease. Hum Mol Genet. 2020;29(7):1068–82. pmid:31625560
- 70. Yang T, Chen L, Dai Y, Jia F, Hao Y, Li L, et al. Vitamin A status is more commonly associated with symptoms and neurodevelopment in boys with autism spectrum disorders—a multicenter study in China. Front Nutr. 2022;9:851980. pmid:35495950
- 71. Rudolph HC, Stafford AM, Hwang H-E, Kim C-H, Prokop JW, Vogt D. Structure-function of the human WAC protein in GABAergic neurons: Towards an understanding of autosomal dominant DeSanto-Shinawi syndrome. Biology (Basel). 2023;12(4):589. pmid:37106788
- 72. Lakshminarasimhan R, Liang G. The role of DNA methylation in cancer. DNA Methyltransferases-Role and Function; 2016. p. 151–72.
- 73. Hattori N, Ushijima T. Epigenetic impact of infection on carcinogenesis: Mechanisms and applications. Genome Med. 2016;8(1):10. pmid:26823082
- 74. Tischoff I, Tannapfe A. DNA methylation in hepatocellular carcinoma. World J Gastroenterol. 2008;14(11):1741–8. pmid:18350605
- 75. Hernandez-Meza G, von Felden J, Gonzalez-Kozlova EE, Garcia-Lezana T, Peix J, Portela A, et al. DNA methylation profiling of human hepatocarcinogenesis. Hepatology. 2021;74(1):183–99. pmid:33237575
- 76. Khalaf AM, Fuentes D, Morshid AI, Burke MR, Kaseb AO, Hassan M, et al. Role of Wnt/β-catenin signaling in hepatocellular carcinoma, pathogenesis, and clinical significance. J Hepatocell Carcinoma. 2018;5:61–73. pmid:29984212
- 77. Gajos-Michniewicz A, Czyz M. WNT/β-catenin signaling in hepatocellular carcinoma: The aberrant activation, pathogenic roles, and therapeutic opportunities. Genes Dis. 2023;11(2):727–46. pmid:37692481
- 78. Bakrania A, To J, Zheng G, Bhat M. Targeting Wnt-β-catenin signaling pathway for hepatocellular carcinoma nanomedicine. Gastro Hep Adv. 2023;2(7):948–63. pmid:39130774
- 79. De Ioris M, Brugieres L, Zimmermann A, Keeling J, Brock P, Maibach R, et al. Hepatoblastoma with a low serum alpha-fetoprotein level at diagnosis: The SIOPEL group experience. Eur J Cancer. 2008;44(4):545–50. pmid:18166449
- 80. Yu Y, Zhao D, Li K, Cai Y, Xu P, Li R, et al. E2F1 mediated DDX11 transcriptional activation promotes hepatocellular carcinoma progression through PI3K/AKT/mTOR pathway. Cell Death Dis. 2020;11(4):273. pmid:32332880
- 81. Conner EA, Lemmer ER, Sánchez A, Factor VM, Thorgeirsson SS. E2F1 blocks and c-Myc accelerates hepatic ploidy in transgenic mouse models. Biochem Biophys Res Commun. 2003;302(1):114–20. pmid:12593856
- 82. Chen X-F, Tian M-X, Sun R-Q, Zhang M-L, Zhou L-S, Jin L, et al. SIRT5 inhibits peroxisomal ACOX1 to prevent oxidative damage and is downregulated in liver cancer. EMBO Rep. 2018;19(5):e45124. pmid:29491006
- 83. Lu D, He A, Tan M, Mrad M, El Daibani A, Hu D, et al. Liver ACOX1 regulates levels of circulating lipids that promote metabolic health through adipose remodeling. Nat Commun. 2024;15(1):4214. pmid:38760332
- 84. De Rubeis S, He X, Goldberg AP, Poultney CS, Samocha K, Cicek AE, et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature. 2014;515(7526):209–15. pmid:25363760
- 85. Yuen RKC, Merico D, Bookman M, Howe JL, Thiruvahindrapuram B, Patel RV, et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat Neurosci. 2017;20(4):602–11. pmid:28263302
- 86. Zhou X, Feliciano P, Shu C, Wang T, Astrovskaya I, Hall JB, et al. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet. 2022;54(9):1305–19. pmid:35982159
- 87. Golovina E, Fadason T, Lints TJ, Walker C, Vickers MH, O’Sullivan JM. Understanding the impact of SNPs associated with autism spectrum disorder on biological pathways in the human fetal and adult cortex. Sci Rep. 2021;11(1):15867. pmid:34354167
- 88. Richdale AL, Prior MR. Urinary cortisol circadian rhythm in a group of high-functioning children with autism. J Autism Dev Disord. 1992;22(3):433–47. pmid:1400105
- 89. Yamazaki K, Saito Y, Okada F, Fujieda T, Yamashita I. An application of neuroendocrinological studies in autistic children and Heller’s syndrome. J Autism Child Schizophr. 1975;5(4):323–32. pmid:173703
- 90. Tordjman S, Anderson GM, McBride PA, Hertzig ME, Snow ME, Hall LM, et al. Plasma beta-endorphin, adrenocorticotropin hormone, and cortisol in autism. J Child Psychol Psychiatry. 1997;38(6):705–15. pmid:9315980
- 91. Aihara R, Hashimoto T. Neuroendocrinologic studies on autism. No To Hattatsu. 1989;21(2):154–62. pmid:2713159
- 92. Hill SD, Wagner EA, Shedlarski JG Jr, Sears SP. Diurnal cortisol and temperature variation of normal and autistic children. Dev Psychobiol. 1977;10(6):579–83. pmid:563824
- 93. Hoshino Y, Yokoyama F, Watanabe M, Murata S, Kaneko M, Kumashiro H. The diurnal variation and response to dexamethasone suppression test of saliva cortisol level in autistic children. Jpn J Psychiatry Neurol. 1987;41(2):227–35. pmid:3437610
- 94. Spratt EG, Nicholas JS, Brady KT, Carpenter LA, Hatcher CR, Meekins KA, et al. Enhanced cortisol response to stress in children in autism. J Autism Dev Disord. 2012;42(1):75–81. pmid:21424864
- 95. Taylor JL, Corbett BA. A review of rhythm and responsiveness of cortisol in individuals with autism spectrum disorders. Psychoneuroendocrinology. 2014;49:207–28. pmid:25108163
- 96. Chiang H-L, Liu C-J, Hu Y-W, Chen S-C, Hu L-Y, Shen C-C, et al. Risk of cancer in children, adolescents, and young adults with autistic disorder. J Pediatr. 2015;166(2):418-23.e1. pmid:25453246
- 97. Darbro BW, Singh R, Zimmerman MB, Mahajan VB, Bassuk AG. Autism linked to increased oncogene mutations but decreased cancer rate. PLoS One. 2016;11(3):e0149041. pmid:26934580
- 98. Hirvikoski T, Mittendorfer-Rutz E, Boman M, Larsson H, Lichtenstein P, Bölte S. Premature mortality in autism spectrum disorder. Br J Psychiatry. 2016;208(3):232–8. pmid:26541693
- 99. Gillberg C, Billstedt E, Sundh V, Gillberg IC. Mortality in autism: A prospective longitudinal community-based study. J Autism Dev Disord. 2010;40(3):352–7. pmid:19838782
- 100. Wen Y, Herbert MR. Connecting the dots: Overlaps between autism and cancer suggest possible common mechanisms regarding signaling pathways related to metabolic alterations. Med Hypotheses. 2017;103:118–23. pmid:28571796
- 101. Crawley JN, Heyer W-D, LaSalle JM. Autism and cancer share risk genes, pathways, and drug targets. Trends Genet. 2016;32(3):139–46. pmid:26830258
- 102. Forés-Martos J, Catalá-López F, Sánchez-Valle J, Ibáñez K, Tejero H, Palma-Gudiel H, et al. Transcriptomic metaanalyses of autistic brains reveals shared gene expression and biological pathway abnormalities with cancer. Mol Autism. 2019;10:17. pmid:31007884
- 103. Aluko OM, Lawal SA, Ijomone OM, Aschner M. Perturbed MAPK signaling in ASD: Impact of metal neurotoxicity. Curr Opin Toxicol. 2021;26:1–7. pmid:34263087
- 104. Vithayathil J, Pucilowska J, Landreth GE. ERK/MAPK signaling and autism spectrum disorders. Prog Brain Res. 2018;241:63–112. pmid:30447757
- 105. Ijomone OM, Olung NF, Akingbade GT, Okoh COA, Aschner M. Environmental influence on neurodevelopmental disorders: Potential association of heavy metal exposure and autism. J Trace Elem Med Biol. 2020;62:126638. pmid:32891009
- 106. Kim EK, Choi E-J. Pathological roles of MAPK signaling pathways in human diseases. Biochim Biophys Acta. 2010;1802(4):396–405. pmid:20079433
- 107. Martínez-González AE, Andreo-Martínez P. The role of gut microbiota in gastrointestinal symptoms of children with ASD. Medicina (Kaunas). 2019;55(8):408. pmid:31357482
- 108. Bauman ML. Medical comorbidities in autism: Challenges to diagnosis and treatment. Neurotherapeutics. 2010;7(3):320–7. pmid:20643385
- 109. Adams JB, Johansen LJ, Powell LD, Quig D, Rubin RA. Gastrointestinal flora and gastrointestinal status in children with autism–comparisons to typical children and correlation with autism severity. BMC Gastroenterol. 2011;11:22. pmid:21410934
- 110. Chaidez V, Hansen RL, Hertz-Picciotto I. Gastrointestinal problems in children with autism, developmental delays or typical development. J Autism Dev Disord. 2014;44(5):1117–27. pmid:24193577
- 111. Holingue C, Newill C, Lee L-C, Pasricha PJ, Daniele Fallin M. Gastrointestinal symptoms in autism spectrum disorder: A review of the literature on ascertainment and prevalence. Autism Res. 2018;11(1):24–36. pmid:28856868
- 112. Oh D, Cheon K-A. Alteration of gut microbiota in autism spectrum disorder: An overview. Soa Chongsonyon Chongsin Uihak. 2020;31(3):131–45. pmid:32665757
- 113. Garcia-Gutierrez E, Narbad A, Rodríguez JM. Autism spectrum disorder associated with gut microbiota at immune, metabolomic, and neuroactive level. Front Neurosci. 2020;14:578666.
- 114. Ye F, Gao X, Wang Z, Cao S, Liang G, He D, et al. Comparison of gut microbiota in autism spectrum disorders and neurotypical boys in China: A case-control study. Synth Syst Biotechnol. 2021;6(2):120–6. pmid:34095558
- 115. Myles IA, Fontecilla NM, Janelsins BM, Vithayathil PJ, Segre JA, Datta SK. Parental dietary fat intake alters offspring microbiome and immunity. J Immunol. 2013;191(6):3200–9. pmid:23935191
- 116. Kittana M, Ahmadani A, Al Marzooq F, Attlee A. Dietary fat effect on the gut microbiome, and its role in the modulation of gastrointestinal disorders in children with autism spectrum disorder. Nutrients. 2021;13(11):3818. pmid:34836074
- 117. Curtis LT, Patel K. Nutritional and environmental approaches to preventing and treating autism and attention deficit hyperactivity disorder (ADHD): A review. J Altern Complement Med. 2008;14(1):79–85. pmid:18199019
- 118. Fujiwara T, Morisaki N, Honda Y, Sampei M, Tani Y. Chemicals, nutrition, and autism spectrum disorder: A mini-review. Front Neurosci. 2016;10:174.
- 119. Khan NA, Raine LB, Drollette ES, Scudder MR, Hillman CH. The relation of saturated fats and dietary cholesterol to childhood cognitive flexibility. Appetite. 2015;93:51–6. pmid:25865659
- 120. Khan N, Raine L, Drollette E, Scudder M, Kramer A, Hillman C. Dietary fiber is positively associated with cognitive control among prepubertal children. World Rev Nutr Dietetics. 2016;114:82–3.
- 121. Guo M, Li L, Zhang Q, Chen L, Dai Y, Liu L, et al. Vitamin and mineral status of children with autism spectrum disorder in Hainan Province of China: Associations with symptoms. Nutr Neurosci. 2020;23(10):803–10. pmid:30570388
- 122. Shen L, Zhang H, Lin J, Gao Y, Chen M, Khan NU, et al. A combined proteomics and metabolomics profiling to investigate the genetic heterogeneity of autistic children. Mol Neurobiol. 2022;59(6):3529–45. pmid:35348996
- 123. Ristori MV, Mortera SL, Marzano V, Guerrera S, Vernocchi P, Ianiro G, et al. Proteomics and metabolomics approaches towards a functional insight onto AUTISM spectrum disorders: Phenotype stratification and biomarker discovery. Int J Mol Sci. 2020;21(17):6274. pmid:32872562
- 124. Tang X, Feng C, Zhao Y, Zhang H, Gao Y, Cao X, et al. A study of genetic heterogeneity in autism spectrum disorders based on plasma proteomic and metabolomic analysis: Multiomics study of autism heterogeneity. MedComm (2020). 2023;4(5):e380. pmid:37752942
- 125. Shaltouki A, Hsieh C-H, Kim MJ, Wang X. Alpha-synuclein delays mitophagy and targeting Miro rescues neuron loss in Parkinson’s models. Acta Neuropathol. 2018;136(4):607–20. pmid:29923074
- 126. Kerr JS, Adriaanse BA, Greig NH, Mattson MP, Cader MZ, Bohr VA, et al. Mitophagy and Alzheimer’s disease: Cellular and molecular mechanisms. Trends Neurosci. 2017;40(3):151–66. pmid:28190529
- 127. Napoli E, Song G, Panoutsopoulos A, Riyadh MA, Kaushik G, Halmai J, et al. Beyond autophagy: A novel role for autism-linked Wdfy3 in brain mitophagy. Sci Rep. 2018;8(1):11348. pmid:30054502
- 128. Wang Y-M, Qiu M-Y, Liu Q, Tang H, Gu H-F. Critical role of dysfunctional mitochondria and defective mitophagy in autism spectrum disorders. Brain Res Bull. 2021;168:138–45. pmid:33400955
- 129. Jyonouchi H, Geng L, Ruby A, Reddy C, Zimmerman-Bier B. Evaluation of an association between gastrointestinal symptoms and cytokine production against common dietary proteins in children with autism spectrum disorders. J Pediatr. 2005;146(5):605–10. pmid:15870662
- 130. Vojdani A, Campbell AW, Anyanwu E, Kashanian A, Bock K, Vojdani E. Antibodies to neuron-specific antigens in children with autism: Possible cross-reaction with encephalitogenic proteins from milk, Chlamydia pneumoniae and Streptococcus group A. J Neuroimmunol. 2002;129(1–2):168–77. pmid:12161033
- 131. Vojdani A, Bazargan M, Vojdani E, Samadi J, Nourian AA, Eghbalieh N, et al. Heat shock protein and gliadin peptide promote development of peptidase antibodies in children with autism and patients with autoimmune disease. Clin Diagn Lab Immunol. 2004;11(3):515–24. pmid:15138176
- 132. Vojdani A, O’Bryan T, Green JA, McCandless J, Woeller KN, Vojdani E, et al. Immune response to dietary proteins, gliadin and cerebellar peptides in children with autism. Nutr Neurosci. 2004;7(3):151–61. pmid:15526989
- 133. Singh VK, Warren RP, Odell JD, Warren WL, Cole P. Antibodies to myelin basic protein in children with autistic behavior. Brain Behav Immun. 1993;7(1):97–103. pmid:7682457
- 134. Sanctuary MR, Kain JN, Angkustsiri K, German JB. Dietary considerations in autism spectrum disorders: The potential role of protein digestion and microbial putrefaction in the gut-brain axis. Front Nutr. 2018;5:40. pmid:29868601