An interaction-based model for neuropsychiatric features of copy-number variants

Variably expressive copy-number variants (CNVs) are characterized by extensive phenotypic heterogeneity of neuropsychiatric phenotypes. Approaches to identify single causative genes for these phenotypes within each CNV have not been successful. Here, we posit using multiple lines of evidence, including pathogenicity metrics, functional assays of model organisms, and gene expression data, that multiple genes within each CNV region are likely responsible for the observed phenotypes. We propose that candidate genes within each region likely interact with each other through shared pathways to modulate the individual gene phenotypes, emphasizing the genetic complexity of CNV-associated neuropsychiatric features.


A case for a multigenic model of CNV pathogenicity
Since the advent of large-scale sequencing studies, the number of genes associated with neurodevelopmental disorders such as autism, intellectual disability, and schizophrenia has increased dramatically. For example, nearly 200 genes have been identified with recurrent de novo mutations in both individuals with autism and intellectual disability [1][2][3][4][5][6][7][8]. In fact, complex human disease phenotypes can be influenced by variation in both a small number of core genes with large effect size and a large number of modifier genes with small effect size, accounting for the large number of candidate neurodevelopmental genes [9,10]. The application of a multigenic model for disease pathogenicity has not been fully expanded to cover copy-number variants (CNVs), or large duplications and deletions in the genome. The prevailing notion of single causative genes for CNV disorders is due to the paradigm of gene discoveries for CNVs associated with genetic syndromes in individuals with specific constellations of clinical features, such as Smith-Magenis syndrome (SMS). Although some variability in phenotypic expression has been documented, these disorders usually occur de novo and are characterized by high penetrance for the observed phenotypes [11,12] (Fig 1, S1 Table, S2 Table). In these cases, individuals manifesting the characteristic features of the syndrome but with a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 either atypical breakpoints or mutations in individual genes within the CNV region were used to identify causative genes for the major phenotypes [13][14][15]. These causative genes, such as RAI1 for SMS, were then confirmed by recapitulating conserved phenotypes of the deletion using functional evaluations in animal models [16,17].
In contrast, another category of CNVs has been identified in individuals with neurodevelopmental disorders, including duplications and deletions at proximal 16p11.2, 3q29, distal 16p11.2, and 1q21.1 [19][20][21][22]. Although these CNVs are enriched in affected individuals compared to population controls, they are primarily characterized by variable expressivity of clinical features [12,[23][24][25][26][27] (Fig 1B, S2 Table). For example, the 16p11.2 deletion has been implicated in 1% of individuals with idiopathic autism [19,28], but only 25% of individuals with the deletion exhibit an autism phenotype [29][30][31][32], whereas others may manifest intellectual disability, obesity, or epilepsy at varying degrees of penetrance [29,33,34]. In fact, certain CNVs, such as the 16p12.1 deletion and the 15q11.2 deletion, have a high frequency of carriers who only manifest mild neuropsychiatric features, in contrast to more severely affected individuals who also carry other rare variants in the genetic background [12,23,24,27,35,36]. As such, many variably expressive CNVs have a higher frequency of inherited compared to de novo occurrence [12] (Fig 1A, Table listing variably expressive (top) and syndromic (bottom) CNV regions is shown. The colored boxes indicate the frequency of de novo versus inherited CNV cases for del and dup previously identified in a cohort of 2,312 children with developmental disorders [12]. The 12 variably expressive CNV regions highlighted in bold were selected for the analysis described in the article. (B) Table listing average frequencies of neurodevelopmental phenotypes for select variably expressive and syndromic CNVs, curated from GeneReviews reports on individual CNVs [18], is shown. White boxes represent no available data from GeneReviews but do not necessarily indicate a lack of association between the CNV and the phenotype (for example, 1q21.1 deletion and schizophrenia). Data for this figure are available in S1 Table and S2 Table. CNV, copy-number variant; del, deletion; dup, duplication. cataracts and GJA5 for heart defects in 1q21.1 deletion [56,57], and MYH11 for aortic aneurysms in 16p13.11 duplication [58,59]. However, approaches to identify single causative genes for the more prominent neuropsychiatric features of these CNVs have not been successful [60]. Here, we show several lines of evidence from gene pathogenicity metrics, animal model studies, and gene expression data that support the involvement of multiple genes towards the neuropsychiatric features of variably expressive CNVs.
Second, several recent studies using animal and cellular models have demonstrated the critical involvement of several genes within CNVs towards neurological, cellular, and developmental functions [38,47,48,52,53,74] (Fig 2B, S4 Table). For example, Blaker-Lee and colleagues screened 22 homologs of 16p11.2 genes in zebrafish morpholino knockdown models and identified 20 homologs that contributed to morphological defects and abnormal behavior [38]. Iyer and colleagues also screened homologs of 16p11.2 genes in Drosophila melaogaster using RNA interference (RNAi) knockdown and found that 10 out of 14 homologs contributed to global developmental defects as well as specific neuronal and cellular defects in the developing fly eye [47]. Further, mouse models for 15 genes within the 16p11.2 region have been generated to test for defects in development and neuronal behavior [46,49-51, [75][76][77][78][79][80][81][82][83][84][85][86][87]. For example, Taok2 -/mice have increased brain size, behavioral defects, and impaired synapse development [51]; Kcdt13 +/mice show defects in hippocampal synaptic transmission and decreased dendritic complexity [46]; Mapk3 +/mice show behavior anomalies, abnormal synapse function, and reduced cell proliferation during development [75,76]; and Mvp +/mice show decreased plasticity and synaptic defects in ocular neurons [49] (Fig 2B). These models of individual genes do not fully recapitulate the phenotypes observed in models of the entire CNV [69][70][71][72][73]. For example, the decreased body weight, abnormal brain morphology, and coordination defects observed in 16p11.2 deletion mouse models have not been observed in any individual gene knockdown models [69][70][71][72] (Fig 2B). Similarly, Otud7a +/mouse models have low body weight, reduced vocalization, abnormal dendritic spine morphology, and seizures, but the 15q13.3 deletion mice also show learning and memory defects in addition to the above features [44,45,88]. Furthermore, mouse models for Chrna7 +/-, another candidate gene on chromosome 15q13.3, only show subtle behavioral phenotypes [89]. These data suggest that haploinsufficiency of CHRNA7 or OTUD7A alone is not sufficient to account for the pathogenicity of the entire CNV. Overall, a catalog of functional data from mouse [90], zebrafish [91], and fruit fly studies [92] indicates that 80% (131/163) of homologs for genes within CNV Percentile-rank scores compared to the whole genome for intolerance to variation (RVIS, pLI, and maximum CCR) and haploinsufficiency (HI, Essentiality, GHIS, and EpiScore) metrics for genes within select variably expressive CNV regions are shown [61][62][63][64][65][66][67]. Lower percentile scores indicate a gene is more likely to be haploinsufficient or intolerant to variation. Gray boxes indicate that metrics were not available for a particular gene. (B) Developmental phenotypes in animal models for homologs of individual genes within the 16p11.2 region, as cataloged from animal model databases (MGI, ZFIN, and FlyBase), are shown. Black boxes indicate presence of phenotype, white boxes indicate absence of phenotype, and gray boxes indicate that no homolog is present for a particular gene in that model regions present lethality, behavioral, developmental, or neuronal phenotypes when disrupted (S4 Table). These data suggest that disruption of multiple genes within each CNV region can affect important developmental or neuronal functions that could contribute to the phenotypes of the entire CNV.
Third, patterns of gene expression in humans and model organisms have identified multiple genes within each CNV region that are co-expressed in the developing brain along with known neurodevelopmental genes. For example, Maynard and colleagues examined expression patterns of 22q11.2 gene homologs in the developing mouse brain and found that 27 out of 32 genes were expressed in the embryonic forebrain, with six genes expressed in neuronal tissues related to schizophrenia [40]. In fact, a genome-wide weighted gene correlation network analysis (WGCNA) [93] from different brain tissues during development [94] shows several large modules of genes with similar expression patterns (Fig 3, S5 Table). For example, the five largest modules are each enriched (p < 0.05 with Benjamini-Hochberg correction) for biological functions related to neurodevelopment, including protein modification and transport in module 1 (M1), nervous system development in M2, and cell communication and signal transduction in M5. Each of these modules contains multiple genes from the same CNV region, including 3q29 genes PAK2, NCBP2, and BDH1 in M1, 1q21.1 genes BCL9, CHD1L, organism. The phenotypes observed in 16p11.2 deletion and duplication mice are distinct from those observed in the individual gene models [69][70][71][72][73]. Data for this figure, including gene metrics and animal phenotypes for other CNV genes not shown in this figure, are available in S3 Table and S4 Table. CCR, constrained coding region; CNV, copy-number variant; GHIS, genome-wide haploinsufficiency score; HI, haploinsufficiency score; MGI, Mouse Genome Informatics; pLI, probability of loss-of-function intolerance; RVIS, residual variance to intolerance score; ZFIN, Zebrafish Information Network.
https://doi.org/10.1371/journal.pgen.1007879.g002 CNV gene co-expression in the developing brain. Modules of co-expressed genes derived from WGCNA analysis of BrainSpan Atlas RNA-Seq data (Gencode v. 10) [94] across 524 tissues and time points the developing brain are shown. Networks of interactions among genes within three select top WGCNA modules (M1, M2, and M5) were obtained from the BioGrid interaction database [95] and visualized using Cytoscape [96]. Genes within variably expressive CNV regions are highlighted as colored nodes in each network. Bar graphs show enrichment (p < 0.05 with Benjamini-Hochsberg correction, represented by red dotted line) of genes within each module for GO Biological Process terms, calculated using PantherDB [97]. Data for this figure are available in S5 and FMO5 in M2, and 16p11.2 genes MVP and QPRT in M5. Therefore, it is clear that multiple genes in the same CNV region are co-expressed with each other in the developing brain and could share similar functions or regulatory patterns.

Dissecting the genetic complexity of CNV pathogenicity
Several scenarios could explain how the haploinsufficiency of multiple genes can predict the variable phenotypes associated with the entire CNV ( Fig 4A). The simplest such model is an additive model, in which disruption of individual genes within a CNV may only impart a mild phenotype on their own but additively contribute to more severe features [37] (Fig 4A). However, an additive model may not always explain the phenotypic features manifested by CNVs containing multiple candidate genes that could lead to severe defects or lethality on their own. For example, heterozygous Tbx1 +/-(within the 22q11.2 region) and Mapk1 +/-(within the distal 22q11.2 region) mice both lead to perinatal or neonatal lethality [98][99][100]. In humans, 14% (24/172) of CNV genes are under evolutionary constraint in control populations (pLI score > 0.9 or maximum CCR score greater than the 99th percentile) and have no reported diseaseassociated variants [101][102][103], suggesting that these genes could be under strong purifying selection [67]. Furthermore, 18% (22/125) of CNV genes show evolutionary constraint for These models include (i) a single-gene model in which one gene is sufficient to account for the phenotype; additive models in which the phenotype is due to the additive effects of multiple CNV genes that (ii) may or (iii) may not account for phenotypes on their own; and (iv) a complex interaction model in which additive, enhancer, and suppressor interactions between genes in the CNV region modulate the phenotype, including when additive effects of the CNV genes would lead to lethality on their own (gray circles). The size of the circles in the plot indicates the relative contribution of each gene to the overall neurodevelopmental phenotype. Thick circles indicate genes that contribute to the observed phenotypes on their own, and connector lines indicate the nature of interaction between pairs of genes. Connected modifier genes (M) can further modulate these interactions to ultimately define the phenotypic trajectory in individuals carrying the CNV. (B) For a hypothetical CNV region with three genes, there are seven combinations of gene knockdowns (A, B, C, AB, AC, BC, and ABC) that can be tested for the presence or absence of a specific phenotype. These knockdown experiments can yield 128 potential outcomes for each phenotype tested, with each individual set of outcomes corresponding to 1 of 64 combinations of pairwise gene interactions (additive, enhancer, suppressor, or no interaction). One possible outcome, highlighted in orange, shows presence of a particular phenotype for knockdowns of single genes A and B and two-hit knockdowns AB and BC. The single-gene knockdowns indicate that only genes A and B contribute to the phenotype and that the phenotype of pairwise knockdown AB is due to the additive effects of the two genes. Although the phenotype is observed for BC knockdown, the phenotype is not observed for AC and ABC knockdowns, suggesting that gene C suppresses the phenotype of gene A. CNV, copy-number variant. loss-of-function mutations (pLI > 0.9) but not for copy-number changes within a control population [104]. We therefore hypothesize that the pathogenicity of variably expressive CNVs can also be explained by complex interactions among the constituent genes within shared biological pathways. These interactions can enhance or suppress the phenotypes caused by disruption of individual genes. Under this model, the haploinsufficiency of certain genes can be modulated by haploinsufficiency of other interacting genes in the same region that may or may not lead to phenotypes on their own (Fig 4A). Furthermore, variants in the genetic background that map within these shared pathways can simultaneously modulate the effects of multiple genes, ultimately defining the phenotypic trajectory in CNV carriers (Fig 4A). For example, Pizzo and colleagues found that the burden of rare deleterious mutations within genes in the genetic background correlated with variability of IQ scores and head circumference among 16p11.2 deletion carriers [36]. The potential for complex interactions within a CNV region depends on the functional convergence of the constituent genes. For instance, both KCTD13 and TAOK2 within 16p11.2 participate in the Ras homolog A (RhoA) signaling pathway [46,51] and therefore are more likely to interact with each other than genes located in different biological pathways. In fact, it has been shown that genes within pathogenic CNVs are more similar in function compared to genes within benign CNVs, suggesting that variably expressive CNVs are likely to contain interactions between functionally relevant genes [105]. Further, Noh and colleagues found an overrepresentation of interactions among genes within autism-associated CNVs, and these interactions were enriched for synaptic transmission and regulatory signaling pathways [106]. Because of this, therapeutic targets for pathways shared among CNV genes could be explored as potential treatments for CNV disorders.
The possibility of additive, suppressor, and enhancer interactions between pairs of genes underlies the potential for highly complex models of CNV pathogenicity. For instance, within a CNV region spanning three genes, seven combinations of gene knockdown experiments (haploinsufficiency of A, B, C, AB, BC, AC, and ABC) can be tested for the presence or absence of a specific phenotype (Fig 4B). This set of knockdown experiments can yield 128 possible experimental outcomes that can be used to further deduce 64 possible sets of pairwise interactions for AB, BC, and AC (no interaction, additive, suppression, or enhancement for each interaction) (Fig 4B). These possible combinations of interactions exponentially increase for larger CNVs with more genes, and the complexity further increases if quantitative phenotypes are used to determine the magnitude of interactions between genes or when interactions with variants in the genetic background are taken into account. However, testing even a small number of these interactions would still uncover the nature of the relationships among genes within a CNV region and potentially a common pathway shared by those genes. For example, Grice and colleagues used D. melanogaster RNAi models to identify 6 synergistic interactions out of 41 tested pairwise interactions between genes within de novo CNVs from autism patients, including partial 3q29 and 22q11.2 deletions [107]. Iyer and colleagues also used fly models to identify 24 additive, enhancer, and suppressor interactions out of 52 tested pairwise interactions among homologs of 16p11.2 genes [47], providing further evidence for complex interactions within CNV regions. Furthermore, these interaction models for CNV pathogenicity can be tested in cellular models of the entire CNV. For example, a more severe phenotype observed by restoring dosage of a candidate gene would suggest that disruption of this gene potentially suppresses the effects of other genes within the CNV.

Complex genetic interactions in the context of genome sequencing
In recent years, exome and whole-genome sequencing analysis has proven invaluable in identifying candidate genes for neurodevelopmental disorders [108]. However, sequencing studies would not be able to capture the genetic complexity of a multigenic CNV region. For example, genes that cause severe phenotypes or lethality on their own and are modulated by haploinsufficiency of other interacting genes within a CNV are less likely to have an enrichment of mutations in sequencing studies. Furthermore, because of the strong phenotypic heterogeneity of these CNVs, it is not possible to determine whether the phenotypes of any individual candidate gene fully recapitulate the variable phenotypes of the entire CNV region. Candidate genes within CNVs identified through genome sequencing studies, such as TAOK2 on chromosome 16p11. 2 [51] or CHRNA7 on chromosome 15q13.3 [109], do not preclude the possibility of other candidate genes in the same region. Because of this, a thorough systems-based approach for each gene within a CNV and its interactions is necessary to identify candidate genes responsible for the neuropsychiatric features of each region [110].
In summary, genomic and functional data have implicated multiple genes in variably expressive CNV regions toward neuropsychiatric phenotypes, suggesting that single causative genes are not responsible for the heterogeneous features of these CNVs. Here, we propose a complex interaction-based model for these CNVs, in which candidate genes within each region interact with each other to influence the variable clinical outcome. The CNV phenotype is therefore distinct from the phenotype manifested by any individual gene, or in some cases, the additive effects of all genes in the region. This multigenic model of CNVs agrees with a broader complex genetic view of neurodevelopmental disorders, in which hundreds of genes with varying effect sizes and complex interactions influence developmental features [10]. Further studies on the role of individual genes in CNV regions towards neurodevelopment, especially those that identify key interactions between genes, will be useful in uncovering the cellular pathways and mechanisms responsible for the observed neuropsychiatric features.
Supporting information S1