
Conceived and designed the experiments: DJS. Wrote the paper: SA DJS. Developed the methods: SA RML KL DJS. Implemented the methods: SA. Processed various data sets: RTW CCP AL.

The authors have declared that no competing interests exist.

Meiotic mapping of quantitative trait loci regulating expression (eQTLs) has allowed the construction of gene networks. However, the limited mapping resolution of these studies has meant that genotype data are largely ignored, leading to undirected networks that fail to capture regulatory hierarchies. Here we use high resolution mapping of copy number eQTLs (ceQTLs) in a mouse-hamster radiation hybrid (RH) panel to construct directed genetic networks in the mammalian cell. The RH network covering 20,145 mouse genes had significant overlap with, and similar topological structures to, existing biological networks. Upregulated edges in the RH network had significantly more overlap than downregulated edges. This suggests that repressive relationships between genes are missed by existing approaches, perhaps because the corresponding proteins are not present in the cell at the same time and are therefore unlikely to interact. Gene essentiality was positively correlated with connectivity and betweenness centrality in the RH network, strengthening the centrality-lethality principle in mammals. Consistent with their regulatory role, transcription factors had significantly more outgoing edges (regulating) than incoming edges (regulated) in the RH network, a feature hidden by conventional undirected networks. Directed RH genetic networks thus showed concordance with pre-existing networks while also yielding information inaccessible to current undirected approaches.

An important problem in systems biology is to map gene networks, which help identify gene functions and discover critical disease pathways. Current methods for constructing gene networks have identified a number of biologically significant functional modules. However, these networks do not reveal directionality, that is, which gene regulates which, an important aspect of gene regulation. Radiation hybrid panels are a venerable method for high resolution genetic mapping. Recently we have used radiation hybrids to map loci based on their effects on gene expression. Because these regulatory loci are finely mapped, we can identify which gene turns on another gene, that is, directionality. In this paper, we constructed directed networks from radiation hybrid expression data. We found the radiation hybrid networks to be concordant with available datasets and also demonstrated that they can reveal information inaccessible to existing approaches. Importantly, directionality can help dissect cause and effect in genetic networks, aiding in understanding and ultimately rational intervention.

Interrogating genome-scale datasets is a necessary step to a systems biology of the mammalian cell

Networks constructed using these various approaches are correlated, with some exceptions. While a single dataset often has a large number of false positives and false negatives and reflects only one facet of gene function, accessing multiple independent datasets increases the reliability of gene functional annotation. Integrating diverse gene networks has been shown to be predictive of loss-of-function phenotypes in yeast

Recently transcriptional networks have been constructed using expression data from genetically polymorphic individuals

A disadvantage of most currently available networks is that it is difficult to infer functional relationships between interacting genes. Consequently, the edges between genes are undirected and have no regulatory hierarchy. This is also true of eQTL networks where, because of limited mapping power, genotype information has been generally ignored and coexpression networks have been constructed instead

Radiation hybrid panels have been used to construct high resolution maps of mammalian genomes

We recently used the T31 RH panel for high-resolution mapping of QTLs for gene expression

Using regression, we found 29,769

In this paper we evaluate gene networks constructed from ceQTL mapping. In contrast to undirected networks from meiotically mapped eQTLs and protein binding approaches, the high resolution mapping and dense genotyping of ceQTLs in the RH panel allowed the use of genotype information to construct directed networks. This directionality permits insights that cannot be obtained from undirected networks.

We previously analyzed a mouse-hamster radiation hybrid panel, T31

Transcript abundance and marker dosage were measured by mouse expression arrays and comparative genomic hybridization arrays (aCGH), respectively. A total of 20,145 transcript levels were assayed by the expression arrays and 232,626 markers by the aCGH. We mapped ceQTLs by regressing the expression array data on the aCGH data. Mouse and hamster genes were detected with comparable efficiency and behaved equivalently in terms of regulation

To construct the RH network, the copy number of each gene was estimated by linear interpolation using the two neighboring aCGH markers. This estimate is reasonable given the high density of aCGH markers.
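The interpolation step can be illustrated with a minimal sketch. The function name, the argument layout and the handling of genes lying outside the marker range (clamping to the nearest marker) are assumptions made for illustration and are not taken from the original analysis.

```python
from bisect import bisect_left


def interpolate_copy_number(gene_pos, marker_pos, marker_cn):
    """Estimate copy number at gene_pos from the two flanking markers.

    marker_pos: marker coordinates on one chromosome, sorted ascending.
    marker_cn:  measured copy number signal at each marker for one clone.
    Both names are hypothetical; the original pipeline is not shown here.
    """
    if not marker_pos or len(marker_pos) != len(marker_cn):
        raise ValueError("marker_pos and marker_cn must be non-empty and of equal length")

    i = bisect_left(marker_pos, gene_pos)
    # Assumption: a gene outside the marker range takes the nearest marker value.
    if i == 0:
        return marker_cn[0]
    if i == len(marker_pos):
        return marker_cn[-1]

    left, right = marker_pos[i - 1], marker_pos[i]
    frac = (gene_pos - left) / (right - left)
    return marker_cn[i - 1] + frac * (marker_cn[i] - marker_cn[i - 1])


# A gene midway between two markers receives the mean of their values.
print(interpolate_copy_number(150, [100, 200], [0.0, 1.0]))
```

With markers as dense as those on the aCGH array, the two flanking markers are close to the gene, so the interpolated value differs little from either measurement.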

Measured transcripts were denoted by

Previously, we used an F-statistic, which is monotonic in the absolute value

We constructed an adjacency matrix

Since nearly all genes show a copy number increase in a portion of the RH panel, the bulk of genes (94%) also showed a

We examined the similarity of our network to existing datasets including protein-protein interactions from HPRD (Human Protein Reference Database)

To compare the directed RH network and the other undirected networks, we ignored the edge directions in the RH network and calculated the resulting overlap. To test overlap significance, we used a one-sided Fisher's exact test based on a two by two contingency table, replaced with a one-sided chi-square test when the expected values in all table cells exceeded 50
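As a rough illustration of this overlap test, the sketch below collapses directed edges to undirected gene pairs, builds the two by two table and evaluates the one-sided Fisher's exact p-value from the hypergeometric tail. All function names are hypothetical, the universe of possible edges is assumed to be all unordered gene pairs, and the exact sum is practical only for small examples. The chi-square replacement used for large expected counts is not shown.

```python
from math import comb


def undirected(edges):
    """Drop edge direction and self-loops, returning a set of gene pairs."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def overlap_table(rh_edges, ref_edges, n_genes):
    """2x2 table: (in both, RH only, reference only, in neither)."""
    rh, ref = undirected(rh_edges), undirected(ref_edges)
    total_pairs = n_genes * (n_genes - 1) // 2  # assumed edge universe
    both = len(rh & ref)
    rh_only = len(rh) - both
    ref_only = len(ref) - both
    neither = total_pairs - both - rh_only - ref_only
    return both, rh_only, ref_only, neither


def fisher_one_sided(a, b, c, d):
    """P(overlap >= a) under the hypergeometric null with fixed margins."""
    n = a + b + c + d
    row = a + b  # number of RH edges
    col = a + c  # number of reference edges
    tail = sum(comb(col, k) * comb(n - col, row - k)
               for k in range(a, min(row, col) + 1))
    return tail / comb(n, row)


# Toy example with 4 genes (6 possible pairs).
table = overlap_table([(1, 2), (2, 1), (1, 3)], [(1, 2), (3, 4)], n_genes=4)
print(table, fisher_one_sided(*table))
```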

(A) HPRD protein-protein interaction network. (B) KEGG pathway network. (C) SymAtlas coexpression network. (D) GO annotations. (E) GO molecular function annotation. (F) GO cellular component annotation. (G) GO biological process annotation. (H) Averaged

The existing networks we used for comparison vary in size from 20,957 edges (HPRD network) to 18,754,380 (SymAtlas coexpression network) (see

The maximum overlap significance occurred at low correlation coefficient thresholds between 0 and 0.2 (

(A) Overlap between 10 randomly permuted RH networks and HPRD network. The RH networks were constructed from right-tailed thresholding and one-sided Fisher's exact and chi-square tests used to assess significance. (B) Averaged

Next we investigated how the number of RH clones affects the overlap. The sensitivity and resolution of the RH network should improve as the number of RH clones increases. To test this, we randomly selected a subset of the 99 RH clones (40, 60, 80 and 99 clones) and calculated the significance of overlap with the HPRD network using the one-sided Fisher's exact and chi-square tests (

We assume that for each undirected network there is a hidden directed random network, modeled as in

The results of the comparison of the directed RH network and the hidden directed random network are shown in

Same as

We examined whether upregulation in the RH data, represented by positive correlation coefficients,

Unweighted RH networks obtained from left-tailed thresholding, which emphasized downregulation, did not show any significant overlap (FDR-corrected

The overlap analysis based on edge-comparison may fail to capture some indirect interactions or other topologies. We therefore compared the topological properties of the RH and HPRD networks.

The degrees (number of edges for each node, or connectivity) of the weighted (unthresholded) RH and HPRD networks were significantly correlated (Spearman's correlation coefficient = 0.055,

Next, we compared the betweenness centralities of the RH and HPRD networks. The betweenness centrality measures the total number of nonredundant shortest paths going through each node, representing the severity of bottlenecks in the network

We calculated the diameters (average minimum distance between pairs of nodes) of the RH and HPRD networks. The giant connected component of the HPRD network, consisting of 5,433 nodes and 20,859 undirected edges (excluding self-loops), had a diameter of 4.13. For the RH network, we considered the 5,433 genes that were in the HPRD network and used a right-tailed threshold of 0.37544, yielding 20,859 undirected edges, so that its size (node and edge numbers) was comparable to that of the HPRD network. The diameter of the RH network was 4.11, close to the HPRD value of 4.13.
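The diameter as defined here (the average minimum distance between node pairs) can be computed with a breadth-first search from every node. The following is a minimal sketch under that definition. The function name is hypothetical, and unreachable pairs are simply skipped, which is equivalent to restricting the calculation to connected pairs.

```python
from collections import deque


def mean_shortest_path(nodes, edges):
    """Average shortest-path length over all connected node pairs."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source

    return total / pairs if pairs else float("nan")


# Path graph a-b-c: distances are 1, 1 and 2, so the mean is 4/3.
print(mean_shortest_path("abc", [("a", "b"), ("b", "c")]))
```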

We also compared the clustering coefficients of the RH and HPRD networks, a measure of local cliqueness

Previous studies in other networks showed that essentiality is positively correlated with connectivity and betweenness centrality

Essential genes had significantly more edges than non-essential genes for a range of right-tailed correlation coefficient thresholds from −0.12 to 0.16 (FDR = 0.01) using a one-sided Wilcoxon rank-sum test

(A) P-values for one-sided Wilcoxon rank-sum test assessing whether essential genes have significantly more edges than non-essential. (B) Fraction of essential genes and degree of weighted RH network. (C) P-values for one-sided Wilcoxon rank-sum test assessing whether essential genes have significantly larger betweenness centralities than non-essential. (D) Fraction of essential genes and betweenness centrality of RH network constructed with correlation coefficient threshold of 0.1 by right-tailed thresholding.
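For readers unfamiliar with the one-sided Wilcoxon rank-sum test used in these comparisons, a minimal sketch follows. It uses mid-ranks for ties and a normal approximation with tie correction but no continuity correction. These choices and the function names are assumptions and may differ from the implementation used in the original analysis.

```python
from math import erfc, sqrt


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_greater(x, y):
    """One-sided p-value that values in x tend to exceed values in y."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for x

    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0  # all values identical: no evidence either way
    z = (u - n1 * n2 / 2) / sqrt(var)
    return 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability


# Hypothetical degrees: the first group has uniformly higher degree.
print(rank_sum_greater([6, 7, 8, 9, 10], [1, 2, 3, 4, 5]))
```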

Similarly, essential genes had significantly larger betweenness centralities for a range of right-tailed correlation coefficient thresholds from −0.14 to 0.16 (FDR = 0.01) using a one-sided Wilcoxon rank-sum test (

It is natural to suppose that transcription factors would have more outgoing than incoming edges since transcription factors regulate other genes. This proposition cannot be tested in conventional undirected networks, but can be tested in the directed RH network. Using a one-sided paired signed rank test

(A) P-values for one-sided paired signed rank test assessing whether transcription factors have significantly more outgoing than incoming edges. (B) Overlap between transcription factors and genes having ≥1 outgoing or incoming edge. P-values from one-sided Fisher's exact and chi-square tests.
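Counting outgoing and incoming edges in a directed network can be sketched as below. The orientation of the adjacency matrix (row = regulating gene, column = regulated gene), the strict inequality used for right-tailed thresholding and the function name are assumptions made for illustration.

```python
def degree_counts(adj, threshold):
    """Return (out_degree, in_degree) lists for a thresholded directed network.

    adj[i][j] is the weight of the edge from gene i (regulator) to gene j
    (target); this orientation is an assumption. An edge is kept when its
    weight exceeds the threshold, and self-loops are ignored.
    """
    n = len(adj)
    out_deg = [sum(1 for j in range(n) if j != i and adj[i][j] > threshold)
               for i in range(n)]
    in_deg = [sum(1 for i in range(n) if i != j and adj[i][j] > threshold)
              for j in range(n)]
    return out_deg, in_deg


# Toy weighted network: gene 0 regulates genes 1 and 2.
adj = [
    [0.0, 0.5, 0.3],
    [0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0],
]
out_deg, in_deg = degree_counts(adj, threshold=0.2)
# Paired differences (outgoing minus incoming) feed a paired signed rank test.
diffs = [o - i for o, i in zip(out_deg, in_deg)]
print(out_deg, in_deg, diffs)
```

In an undirected network the two counts coincide for every gene, which is why this comparison requires edge directions.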

We used high resolution mapping of ceQTLs in an RH panel to create a directed genetic network. There was significant overlap with existing networks such as HPRD, KEGG, GO annotation and a SymAtlas coexpression network. The RH network also showed similar topological properties to the HPRD network in connectivity, betweenness centrality and diameter.

The RH network showed maximum significance of overlap with existing networks at relatively low positive correlation coefficient thresholds between 0 and 0.2. The low thresholds did not arise simply by chance, since randomly permuted RH networks did not show any significant overlap with existing networks. Nor did the low values seem to be caused by noise in the array measurements or by an insufficient number of RH clones, since the correlation coefficient thresholds giving maximum overlap significance remained nearly constant as the clone number varied, although the sensitivity of overlap increased with the number of clones. This may reflect the orthogonal nature of the RH network compared to existing networks, suggesting the RH approach will yield complementary information on mammalian genetic networks. Novel and replicated edges in the RH network may thus be balanced in the low correlation coefficient threshold range.

The overlap between the RH network and existing interaction networks was greater for edges representing upregulation than for those representing downregulation. This may be because, if one gene represses another, the corresponding proteins will not be present in the cell at the same time and are therefore unlikely to interact. It also implies that protein-protein interaction networks may fail to uncover valid edges between genes that have a repressive relationship.

Previous studies found significant associations of essentiality with connectivity and/or betweenness centrality in protein-protein interaction networks

We also showed that transcription factors were likely to have more outgoing than incoming edges. While this finding is not unexpected and helps validate the RH network, a recent study using naturally occurring polymorphisms in yeast suggested that transcription factors are no more likely to reside close to eQTLs than chance

We thresholded the adjacency matrix at different correlation coefficients to compare unweighted RH networks with existing unweighted datasets. However, we chose to leave the RH network weighted rather than finalizing an unweighted form at an optimal threshold. Such an operation is irreversible and would lose information on linkage strength and sign. In other studies, the sensitivity of a coexpression network was limited by thresholding

We constructed a directed gene network from radiation hybrids and found it concordant with existing networks. We also showed that RH networks have the potential to provide new insights reflecting orthogonal aspects of gene regulation. The RH networks will be refined as more panels, including those available for other species, are analyzed, resulting in improved power and sensitivity.

Details on the analysis of the T31 RH panel cells and the preprocessing of aCGH and expression array data can be found in

The directed RH network was constructed as described in

A protein-protein interaction network was constructed from HPRD (Human Protein Reference Database)

A network was constructed from the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database

A network was constructed from the GO (Gene Ontology) database

We constructed an mRNA coexpression network from the publicly available SymAtlas microarray database

The significance of overlap between the RH network obtained from thresholding and, for example, the HPRD network was tested as follows.

First, for a given threshold

We similarly calculated the significance of overlaps with the KEGG pathway network, the SymAtlas coexpression network and the GO annotations.

We randomly permuted the elements of the weighted and directed adjacency matrix

We randomly selected 40, 60 or 80 RH clones out of 99 and constructed an adjacency matrix (see

For each existing undirected dataset, for example, the HPRD network, we assume there is a hidden directed random network with adjacency matrix

The overlap between the unweighted (thresholded) directed RH network, represented by

Ignoring the constant scaling factor without loss of generality, we define an overlap as

The node degree of the undirected, weighted adjacency matrix

The betweenness centralities and clustering coefficients of the RH adjacency matrix

We obtained a list of 1,409 essential genes and 1,979 nonessential genes from the Mouse Genome Database

We obtained a list of 1,053 transcription factors by finding genes whose GO description includes the word “transcription.” The number of outgoing edges was calculated by

The network data are available at

Relationship between one-sided chi-square test and Bayesian log-likelihood score (LLS) method

(0.08 MB PDF)

Size of RH network constructed from right-tailed, left-tailed and both-tailed thresholding approaches.

(0.06 MB PDF)

Size of RH network. (A) Number of nodes with nonzero degree for RH network constructed from right-tailed thresholding. (B) Number of directed edges for RH network constructed from right-tailed thresholding. (C) Number of nodes with nonzero degree for RH network constructed from left-tailed thresholding. (D) Number of directed edges for RH network constructed from left-tailed thresholding. (E) Number of nodes with nonzero degree for RH network constructed from both-tailed thresholding. (F) Number of directed edges for RH network constructed from both-tailed thresholding.

(0.21 MB TIF)

Overlap between RH network constructed from right-tailed thresholding and existing datasets. Same as

(0.26 MB TIF)

Significance of overlap between RH network constructed from left-tailed thresholding and existing datasets. Same as

(0.25 MB TIF)

Significance of overlap between RH network constructed from both-tailed thresholding and existing datasets. Same as

(0.28 MB TIF)

Significance of overlap between RH network and existing datasets.

(0.78 MB XLS)