Gene Network Analysis of Candidate Loci for Human Anorectal Malformations

Anorectal malformations (ARMs) are birth defects that require surgery and carry significant chronic morbidity. Our earlier genome-wide copy number variation (CNV) study had provided a wealth of candidate loci. To find out whether these candidate loci are related to important developmental pathways, we have performed an extensive literature search coupled with the currently available bioinformatics tools. This has allowed us to assign both genic and non-genic CNVs to interrelated pathways known to govern the development of the anorectal region. We have linked 11 candidate genes to the WNT signalling pathway and 17 genes to the cytoskeletal network. Interestingly, candidate genes with similar functions are disrupted by the same type of CNV. The gene network we discovered provides evidence that rare mutations in different interrelated genes may lead to similar phenotypes, accounting for genetic heterogeneity in ARMs. Classification of patients according to the affected pathway and lesion type should eventually improve the diagnosis and the identification of common genes/molecules as therapeutic targets.


Introduction
Anorectal malformations (ARMs, congenital obstruction of the anal opening) are among the most common birth defects requiring surgical treatment (2-5/10,000 live-births) and carry significant chronic morbidity. There is a wide spectrum of defects that differ in terms of malformation severity (ranging from anal stenosis to persistent cloaca) and the presence of different associated congenital anomalies. As shown in the genome-wide copy number variation (CNV) study we published recently [1], rare CNVs (either genic or non-genic) private to individual patients are likely to underlie the phenotype. Notably, the CNVs identified differ among patients indicating that ARMs are genetically heterogeneous diseases. Indeed, given the complexity of molecular events that take place during embryonic stages, impairments in any gene members of the pathways controlling the development of the anorectal region could lead to ARMs (or even lethality), thus accounting for the genetic heterogeneity observed among patients. Relevant pathways reported so far in animal studies include WNT, BMP and SHH.
To find out whether the CNVs identified in our study could be assigned to the afore mentioned pathways, we performed an extensive literature search which, coupled with the currently available bioinformatics tools, have allowed us to assign both genic and non-genic CNVs to interrelated pathways known to govern the development of the anorectal region.

Results and Discussion
A total of 79 genes distributed among 67 genic-CNV regions (1 recurrent, 66 singletons) and 56 non-genic regions (1 recurrent, 55 singletons) were found uniquely disrupted in 88 out of 170 ARM patients analysed (Table S1 in File S1). These CNVs were not reported in the 11,943 healthy individuals from the Database of Genomic Variants (DGV), nor were they found in our 868 control individuals.
Non-genic regions were scrutinized for the presence of regulatory elements controlling the expression of gene members of the relevant pathways. To this end, genomic regions were submitted to RegulomeDB [2], which detects regulatory signatures within the regions, including transcription factor binding sites (TFBS) and expression quantitative trait loci (eQTL) that can be linked to the expression of their target genes. Importantly, RegulomeDB summarizes the data recently released by ENCODE (the Encyclopedia of DNA Elements) [3]. The 79 genes uniquely disrupted by CNVs in patients, together with the TFBS and eQTL target genes, are denoted as ''candidate genes'' here. Details on the functions and characteristics of these genes were obtained from DAVID bioinformatics resources [4], NCBI Gene (http://www. ncbi.nlm.nih.gov/gene) and literature search. For those patients with multiple candidate genes disrupted, the most biologically plausible genes were selected and examined. The relationship among the genes or regulatory regions identified, together with the phenotypic characteristics of the patients are depicted in Figures 1 and 2.
Eleven candidate genes (each altered in a different patient) were assigned/linked to the WNT signalling pathway and one (ESCIT) to the WNT related BMP signalling pathway (see Table 1 and supplementary material in File S1 for details). Among those, the three duplicated genes (DKK4, AMOTL1 and PROK1) known to be antagonists of the WNT/b-catenin signalling [5][6][7] are essential for the development of the anorectal region (Table S2 in File S1). Importantly, excess of DKK4 has been associated with the ARMs phenotype in mice [1]. Six genes related to b-catenin are affected by CNVs in six ARM patients respectively and are described as follows. Deletions of genes encoding b-catenin-binding proteins (SOX6, CDH18 and CTNND2) and regulatory regions affecting the expression of genes encoding other b-catenin-binding proteins (target genes: CTNNA1, SOX4) were detected. In addition, a gene affecting the expression of b-catenin negative regulator (EAF1) is duplicated. Genes encoding effectors (INTU and WDPCP) of the planar cell polarity pathway (PCP; non-canonical WNT signalling) were deleted as well. Apart from PCP effectors, a deletion was observed in DESI2, which acts in parallel with the non-canonical WNT signalling pathway regulating cell movement during embryonic development. Planar cell polarity and WNT/b-catenin signalling are inter-related and they interact with other important signalling pathways during embryonic development, including the BMP signalling pathway. ESCIT, a cofactor of Smad proteins which mediate BMP signalling, was duplicated in an ARM patient. Defects in the above pathways are known to recapitulate the ARM phenotype in mice [8][9][10].
We also detected the disruption of 17 genes that are functionally related to or located in cytoskeleton ( Table 2). Of note, the cytoskeletal network leading to ciliogenesis, intracellular trafficking and cell movements, is regulated by WNT signalling, and especially by PCP [11,12].
Among the 17 genes, 5 are related to actin filament regulation or binding (RCSD1, KLHL5, JAK2, AMOTL1and MRIP), 4 are involved in ciliogenesis or located in the cilium (INTU, WDPCP, LCA5 and TTLL9) and 2 are associated to microtubules (FAM110B and STIM1). Six other genes (TNNI3K, NLGN1, CTNND2, SGCD, FNBP1 and TBC1D4) are involved in regulation and/or organization of the cytoskeleton network. Regulation of the cytoskeleton would probably be important for the downstream process of WNT signalling, though the implication remains unknown.
Close examination of our previous data has allowed us to link 27 candidate genes to interrelated pathways of relevance to the development of the anorectal region. Interestingly, we noted that for WNT signalling, candidate genes with similar function were disrupted by the same type of CNVs in ARM patients: (1) WNT  signalling antagonists and b-catenin negative regulators were duplicated; (2) WNT signalling modulators (b-catenin binding proteins and PCP effectors) were all deleted. However, for those genes related to the cytoskeleton network, we did not observe any link between their function and type of lesion by which they were interrupted. Some features that hint at the pathogenicity, including the mode of inheritance and functional impact of the duplication cannot, however, be established as parental samples are not available and the location of the duplicated copy in the genome is unknown.
Validating all these CNVs in the 27 regions using another technology poses a challenge given limited DNA availability. However, the chance of these CNVs being false positives is low as the CNVs are called by at least two programs. The rarity of the CNVs in these 27 regions, together with the functional regions intersected suggest that indeed these CNVs play a role in ARMs.

Conclusions
The data presented here confirm that ARMs are genetically heterogeneous diseases. Also, the fact that genes with similar function within a network are disrupted by the same type of lesion may provide interesting insights into network interacting mechanisms. Ideally, classification of the patients according to type of genetic lesion and pathway affected should lead to improved clinical diagnosis, as well as to the identification of a putative common downstream gene or molecule which could be used as a target for future treatment development.

Subjects and ethics statement
The overall study was approved by the institutional review board of The University of Hong Kong together with the Hospital Authority (IRB: UW 07-321). Blood samples were drawn from all participants after obtaining written informed consent from patients or from parents, guardians or next of kin on behalf of minors.
ARM patients. 170 Chinese sporadic ARM patients (isolated or with additional associated anomalies) had prospectively been collected throughout Hong Kong and Mainland China. All patients included in this study went through renal ultrasound, lumbosacral radiography and ECHO cardiography. Patients were defined as syndromic if associated anomalies were observed in addition to ARMs.
Controls. We included 868 individuals who are phenotypically normal from other studies as controls: 111 controls from a hypertension study [13] and 757 individuals from an osteoporosis study [14].

Whole-genome CNV scan
The whole-genome scan was performed at deCODE Genetics (Reykjavik, Iceland) using Illumina Human 610-Quad BeadChips which assay 599,011 SNPs across the genome and 21,890 intensity-only monomorphic CNV probes. The B allele frequency (BAF) and log R ratio (LRR) were provided by deCODE for CNV calling. Details on CNV prediction and quality controls were described previously [1].

Disease gene mapping
Gene-based analysis. CNV calls in ARM patients were compared against our controls (N = 868) and the 11,943 healthy individuals from the DGV. We selected those genes that were uniquely disrupted by CNVs in ARM patients, but not in controls nor healthy individuals in DGV, for further analysis. Details on the functions and characteristics of these genes were obtained from DAVID (the database for annotation, visualization and integrated discovery) bioinformatics resources [4], NCBI Gene (http://www. ncbi.nlm.nih.gov/gene) and literature search.
Patient-unique non-genic CNV regions. For those nongenic CNV regions that were uniquely disrupted in ARM patients, we checked if they overlapped with any transcription factor binding sites or expression quantitative trait loci based on the database RegulomeDB [2], which summarized data from ENCODE. Functions of the TFBS nearby genes and eQTL target genes were also obtained from DAVID bioinformatics resources, NCBI Gene and literature search.
Expression in mouse embryo. To carefully scrutinize the genes related to ARMs, the list of ''candidate genes'' has been checked against Eurexpress (Mouse Gene Expression Atlas) for their expression of candidate genes in mouse embryo. Genes have been then categorized into four groups: (1) ''with regional signal'' if localized expression is observed in the urorectal organ, (2) ''without regional signal'' if the gene is expressed in the urorectal organ but is not specifically localized in a region, (3) ''no signal'' if it is not expressed at all in the urorectal organ, and (4) ''not available in database'' if there is no information on the expression of that gene. Those candidate loci that are not expressed in the mouse urorectal organ (i.e. category 3) have been excluded from the analysis.

Supporting Information
File S1 Supporting text and tables. Table S1: List of the 79 genes and 56 non-genic regions uniquely disrupted in ARMpatients but neither in our 868 controls nor in 11,943 healthy individuals from DGV.