Ontogenetic De Novo Copy Number Variations (CNVs) as a Source of Genetic Individuality: Studies on Two Families with MZD Twins for Schizophrenia

Genetic individuality is the foundation of personalized medicine, yet its determinants are currently poorly understood. One issue is the difference between monozygotic twins, who are assumed to be genetically identical and have been used extensively in genetic studies for decades [1]. Here, we report genome-wide alterations in two nuclear families, each with a pair of monozygotic twins discordant for schizophrenia, evaluated with the Affymetrix 6.0 human SNP array. The data analysis includes characterization of copy number variations (CNVs) and single nucleotide polymorphisms (SNPs). The results identify genomic differences between twin pairs and a set of new provisional schizophrenia genes. Samples were found to have between 35 and 65 CNVs per individual. The majority of CNVs (∼80%) represented gains. In addition, ∼10% of the CNVs were de novo (not present in parents); of these, 30% arose during parental meiosis and 70% arose during developmental mitosis. We also observed SNPs in the twins that were absent from both parents. These constituted 0.12% of all SNPs seen in the twins. In 65% of cases these SNPs arose during meiosis, compared to 35% during mitosis. The developmental mitotic origin of most CNVs, which may lead to MZ twin discordance, may also cause tissue differences within individuals during a single pregnancy and generate a high frequency of mosaics in the population. The results argue for enduring genome-wide changes during cellular transmission, which are often ignored in genetic analyses.


Introduction
Genome-wide constancy and change underlie evolution and familial inheritance but remain ill-defined. An assessment of the changes that occur as the genome is passed from one generation (meiosis) and one developmental cycle (mitosis) to the next is needed, because these changes contribute directly to the sum of genetic individuality. At present, such inquiries are difficult [2] and require the development of new quantitative methods to assess genome-wide changes and their significance. This report assesses two common measures of genomic variation, copy number variations (CNVs) and single nucleotide polymorphisms (SNPs), across a generation and between monozygotic twins in two exceptional families. The results offer novel insight into meiotic and mitotic sources of variation, which result in genetic individuality between MZ twins. This individuality may account for discordance in monozygotic twins for a variety of diseases including schizophrenia.
CNVs are structural variants that are both frequent and relevant, and in humans may range in size from 1 Kb to several Mb [3]. Given their impact on physiology and function, CNVs have a major influence on evolution and gene expression and on normal and disease related variation [3]. CNVs include duplications and deletions, leading to a departure from the classic view that all autosomal genes are present in two copies, with one allele inherited from each parent. The majority of CNVs are copy number polymorphisms (CNPs), existing at a frequency greater than 1% and transmitted across generations. However, a small proportion of CNVs are novel events. CNVs may account for a major fraction (∼12%) of the genome, but appear to concentrate in some genomic regions depending on the sequence features [4,5]. Unlike CNVs, SNPs are relatively small changes, usually involving replacement of one nucleotide with another. SNPs are common and distributed across the entire human genome. Individual SNPs mark a unique genomic location and are usually neutral in nature. In other cases, they may change amino acids, cause protein truncation or affect expression. They are easily detected and have been extensively exploited in genetic analysis, including the cloning of disease causing genes, individual identification and establishment of genetic relatedness.
Studies on these two genome-wide variations (SNPs and CNVs) have greatly enhanced our understanding of evolution and genetic individuality. They are also helping to elucidate the causes of genetic and genomic disorders, including schizophrenia [6]. A number of SNPs appear to be linked to this complex neuro-developmental disease, which has a heritability estimate of 80%. However, results of linkage studies have not been consistently reproducible [7,8].
Individuals affected with schizophrenia (SCZ) have shown an elevated incidence of CNVs [9], and a few rare CNVs appear to have a major effect on the development of SCZ [10]. However, these CNVs account for only a small fraction of schizophrenia cases [11], and the challenge of identifying common genetic cause(s) of SCZ remains. The search for genes in SCZ currently relies on large numbers of patients and matched controls. The limited progress made with these approaches emphasizes the need to pursue alternatives. Future studies may benefit from the inclusion of two features. The first is a genome-wide comparison of the parents and their progeny affected by SCZ, and the second is the assessment of genomes of monozygotic twins (who show ∼52% discordance for SCZ) [12,13]. The current study reports genome-wide CNV and SNP results on two exceptional families that include monozygotic twins discordant for schizophrenia (Figure 1, Table 1).

Familial Distribution of CNVs
The number of CNVs per individual ranged from 35 to 65, with the exception of one individual who is described more fully later (Table 2). This is similar to the number of CNVs per subject reported by most other studies that have used Affymetrix 6.0 Human SNP arrays [14]. The range is also comparable with the number of CNVs found in Venter's genome (62) based on his complete genome sequence [15]. The exception in our study was the father in family 2 (II-1-1), who was found to harbour a rare chromosome 13q deletion containing 40 CNVs at a single genomic location. Although this finding is beyond the scope of this report, it is important to note that II-1-1 underwent chemotherapy treatment and that the samples utilized in this study were obtained towards the end of that treatment. Most CNVs identified were in the range of 100 to 200 Kb, consistent with the size distribution of CNVs reported in the literature [14]. The majority of CNVs observed (Table 3) were copy number gains (78.5%), and ∼10% of the CNVs identified are not listed in the Database of Genomic Variants (http://projects.tcag.ca/variation/), accessed on 8.2.2010. Further, the chromosomal distribution of CNVs was comparable across individuals, with the exception of the father in family 2, who had consistently higher numbers of CNVs affecting most chromosomes (Table 4). Of the CNVs identified, >50 per cent overlapped RefSeq genes. The identified genes are frequently associated with metabolic pathways such as starch and sucrose metabolism, as well as pathways involved in the metabolism of amino acids, for example phenylalanine, histidine and tyrosine (AMY2A, AMY1A, ALDH1L1, PSMC1). Structurally, >67% of the CNVs identified were flanked at both the 5′ and 3′ ends, or at just the 5′ (>7%) or 3′ (>8%) end, by a set of common repeats, represented by short interspersed nucleotide elements (SINEs), long interspersed nucleotide elements (LINEs), long terminal repeats (LTRs) and low copy repeats (LCRs) near the breakpoints.
The majority of the deletion breakpoints had 1-30 bp of microhomology, whereas a small fraction of deletion breakpoints contained inserted sequences. The co-occurrence of microhomology and inserted sequence suggests that both recombination and replication based mutational mechanisms are operational in CNV generation. Recent studies have identified short DNA motifs that both determine the location of meiotic crossover hotspots and are significantly enriched at the breakpoints of recurrent non-allelic homologous recombination (NAHR) syndromes [16]. We found evidence for this mechanism in a subset of the breakpoint events (data not shown). This was true for the de novo (Figure 2a) as well as inherited (Figure 2b) CNVs. Such sequences may represent genomic architecture that is prone to genome instability by a predisposition to genomic rearrangements via non-homologous end joining (NHEJ), template switching and/or non-allelic homologous recombination (NAHR).

Familial vs de novo Origin of CNVs
A novel feature of the data included in this report is that we are able to classify observed CNVs into two groups based on their absence or presence in one of the parents. CNVs that were found in one or both twins and not seen in either parent were classified as de novo. If a de novo CNV was present in both twins, it was considered to have originated during parental meiosis; when present in only one of the two twins, it was assumed to have originated in mitosis during development. This classification allowed us to identify 14 and 26 de novo CNVs in family 1 (Table 5) and family 2 (Table 6) respectively. The tables include genomic locations as well as individual specific break points, which allow for the assessment of regions of overlap with the Database of Genomic Variants (Toronto, Ontario). CNVs of mitotic origin were ∼3 times more frequent than CNVs generated during parental meiosis. Of the mitotic de novo CNVs identified, two (a loss at 14q32.11 and a loss at 8q11.21) were specific to the schizophrenia patient in family 1 and one (a gain at 19q13.41) was specific to the patient in family 2. Such results are novel in the literature. It is enticing to ask whether the genes disturbed by these CNVs contribute to the development of disease symptoms. Although the answer is of paramount importance, the results available do not offer a direct assessment of this question. Nonetheless, it is appropriate to consider whether the known features of these genes are compatible with the disturbances observed in schizophrenia, which is discussed below.
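The presence/absence rule described above can be written down compactly. The following is a minimal sketch, not the authors' pipeline: the `Origin` labels and the `classify_cnv` function are hypothetical names introduced here for illustration, and the sketch assumes each CNV has already been reduced to a present/absent call in the father, the mother and each twin.

```python
from enum import Enum


class Origin(Enum):
    """Hypothetical labels for the categories described in the text."""
    NOT_TRANSMITTED = "parental CNV not seen in either twin"
    INHERITED = "inherited"
    MEIOTIC_DE_NOVO = "de novo, parental meiosis"
    MITOTIC_DE_NOVO = "de novo, developmental mitosis"


def classify_cnv(in_father: bool, in_mother: bool,
                 in_twin_a: bool, in_twin_b: bool) -> Origin:
    """Classify one CNV from its presence or absence in four family members."""
    if not (in_twin_a or in_twin_b):
        return Origin.NOT_TRANSMITTED   # nothing to classify in the twins
    if in_father or in_mother:
        return Origin.INHERITED         # seen in at least one parent
    if in_twin_a and in_twin_b:
        return Origin.MEIOTIC_DE_NOVO   # shared by both twins, absent from parents
    return Origin.MITOTIC_DE_NOVO       # private to one twin
```

A CNV seen only in the affected twin would be labelled `MITOTIC_DE_NOVO` under this rule. The sketch takes array calls at face value; a CNV missed in a parent by the calling software would be misclassified as de novo.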

De novo CNVs and Schizophrenia
The genes overlapping disease specific de novo CNVs in family 1 included PSMC1 (proteasome 26S subunit, ATPase, 1) and C14orf102 (chromosome 14 open reading frame 102 gene) on 14q32.11 and KIAA0146 on 8q11.21. PSMC1 (MIM 602706) is an ATP-dependent protease [17] that may be involved in protein ubiquitination in response to DNA damage [18]. It is composed of a 20S catalytic proteasome and 2 PA700 regulatory modules and contains an AAA (ATPases associated with diverse cellular activities) domain [17]. The human and mouse proteins are 99% identical [19] and may play a significant role in ubiquitin-mediated proteasomal proteolysis in the molecular pathogenesis of neurological diseases such as spinocerebellar ataxia type 7 (SCA7). Also, several studies (for review, see [20,21]) have indicated that genes related to ubiquitination are altered in the brains of patients with schizophrenia. Further, this CNV also affects another gene (C14orf102; chromosome 14 open reading frame 102), which is conserved across phyla and highly expressed in the brain (Affymetrix GNF Expression Atlas 2 Data). The other CNV in this patient from family 1 is a loss at 8q11.21 that contains the still uncharacterized gene KIAA0146, which is expressed in the brain, may contain a CAG repeat and is conserved in chimpanzee, dog, cow, mouse, rat, chicken and zebrafish. It is a transcription factor with CCAAT enhancer binding protein (CEBP) function [22]. Further, the gene is highly expressed in the brain and hippocampus, which may implicate it in mental disorders (www.genecards.org). Although we cannot rule out a role for these three genes (PSMC1, C14orf102 and KIAA0146) in schizophrenia, such conclusions would be premature. Only a follow up study will establish whether any of the three genes directly contributes to the development of schizophrenia in the patient from family 1.
A similar analysis of CNVs in family 2 identified a 109 kb gain at 19q13.41 that is specific to the schizophrenia patient in this family. Translocations involving 19q13 are a frequent finding in follicular adenomas of the thyroid and may represent the most frequent type of structural aberration in human epithelial tumors [23]. The CNV identified in this region contains two genes, DPRX1 and ZNF331. DPRX1 (divergent-paired related homeobox) is a member of the DPRX homeobox gene family, contains a single conserved homeodomain and may function as a putative transcription factor. It may bind a promoter or enhancer sequence or interact with a DNA binding transcription factor, and is involved in early embryonic development and cell differentiation [24]. The Drosophila homologue of the DPRX1 gene (dPrx5; Drosophila peroxiredoxin 5) confers protection against oxidative stress and apoptosis and also promotes longevity [25]. The second gene affected by this CNV, ZNF331 (zinc finger protein 331), is also involved in DNA-dependent regulation of transcription, as a transcriptional repressor [26]. Interestingly, it is one of the imprinted genes that exhibit monoallelic expression in a parent-of-origin specific manner [27]. Imprinted genes are important for development and behaviour, and disruption of their expression is associated with many human disorders [28]. In conclusion, the three genes affected in the schizophrenia patient in family 1 (PSMC1, C14orf102, KIAA0146) and the two genes affected in the patient of family 2 (DPRX1 and ZNF331) could not be excluded from potential involvement in the development of schizophrenia in the two patients. If they are involved, the biological systems affected in the two patients are hypothesized to be different. The patient in family 1 is hypothesized to have a defect in ubiquitin-mediated proteasomal proteolysis, while the patient in family 2 could have errors in regulatory mechanisms affecting gene regulation. Such conclusions must remain hypothetical until supported by independent evidence.

De novo changes may lead to mosaicism
The genotypes generated by the Affymetrix 6.0 array have also allowed us to establish that ∼0.12% of the SNPs in the twins (1086 and 1022 in twin pairs 1 and 2 respectively; 11 substitutions shared by both pairs) represented de novo substitutions. Unlike CNVs, which primarily originated during ontogeny in mitosis, most of these substitutions (63-65%) originated during parental meiosis. These results suggest that DNA replication fidelity at the level of single base pairs (SNPs) vs replication forks (CNVs) is differentially exercised during meiosis and mitosis. Single base pairing is much more stringent in mitosis (which evolved to produce identical daughter cells) than in meiosis, where errors can facilitate potentially beneficial variations. In contrast, CNVs which affect the phenotype may be advantageous when occurring during mitosis and be selected for during development. Thus, cell type specific CNVs may play a role in growth and development, offering advantageous variability. This would mean that most individuals are mosaics [29], a hypothesis that is difficult to assess and evaluate. It is likely that the ratio of mosaic cells is maintained throughout the differentiated (ectoderm, mesoderm, endoderm, etc.) tissues over the lifetime [30,31], an exception being when other factors directly influence DNA stability. Such a mechanism may generate genomic differences and differential mosaicism in most or all individuals. If this is the case, it will complicate traditional genetic analysis, which assumes stability of the genome with rare exceptions.
We have been able to establish genome-wide (CNV and SNP) discordance for MZ twin pairs. Also, given that the twins are discordant for schizophrenia, it is possible to assign provisional CNVs (and genes) as well as substitutions (SNPs) that may be associated with the disease status of the affected twins in family 1 and family 2 (Tables 7, 8). Similarly, we identified substitutions (SNPs) that differed between the affected and unaffected members of the two sets of twins, including their distribution along the chromosomes, introns and exons and the predicted effect on the gene product.
Tables 5 and 6. Identity of de novo CNVs found in Family 1 (Table 5) and Family 2 (Table 6) and the gene regions which they overlap. De novo CNVs are defined as those that are present in either twin but not found in parents. SD indicates the percentage of overlap between segmental duplications and the CNVs: '0' means there is no overlap between CNV and segmental duplication and '1' means 90-100% overlap.
Tables 7 and 8. Identity of inherited CNVs found in Family 1 (Table 7) and Family 2 (Table 8) and the gene regions which they overlap. Inherited CNVs are those which are present in either or both parents and transmitted to either or both twins. All sizes are in kb. SD indicates the percentage of overlap between segmental duplications and CNVs: '0' means there is no overlap between CNV and segmental duplication, '1' means 90-100% and '2' means 50-90% overlap. Parental CNVs not transmitted to offspring were not included in Tables 5-8, so the total number of CNVs in Tables 2-4 is not the same as the total in Tables 5-8.
We also analyzed genes that overlapped de novo CNVs (gains and losses) in order to assess their potential effect on physiology and function, starting with Gene Ontology (GO) annotation (http://www.geneontology.org). Interestingly, the majority of genes belonged to transcription, DNA replication, transport and cell signalling pathways, including 'binding' or 'catalytic' functions. A number of these genes are expressed in the brain, some with the potential to affect neurophysiology, neurodevelopment and function, and a set of them are known to show altered expression in schizophrenia (www.schizophreniaforum.org). Also of significance is the observation that the FAM19A5 protein, encoded by the FAM19A5 gene (22q13.32), belongs to the TAFA protein family, whose members are predominantly expressed in the brain and are postulated to function as brain-specific chemokines or neurokines that act as regulators of immune and nervous cells [32]. This finding adds to the existing speculation about the role of the Major Histocompatibility Loci (MHC) and infection in SCZ. Functional analysis of this gene and its upstream regulatory elements for characteristic patterns of nucleosome occupancy changes associated with enhancers could yield novel insights into the role of this gene in psychiatric disorders. Ingenuity Pathway Analysis (IPA) of gene networks of CNVs and SNPs converged on cell cycle, cellular growth and proliferation. Genes involved in genetic disorders such as hematological disease and immunological, inflammatory and developmental disorders were overrepresented. These results support the hypothesis that schizophrenia is a "developmental disorder" at the molecular level. Interestingly, a recent co-expression network analysis of microarray-based brain gene expression data revealed perturbations in developmental processes in schizophrenia [33]. However, given that these results are based on only two twin pairs, and schizophrenia is highly heterogeneous, the results on disease causation cannot be generalized.
Also, we have offered other explanations for twin discordance that may involve epigenetic changes [34].
It is not surprising that genomic studies have begun to use monozygotic twins. In fact, a number of them have identified copy number variation [35] and epigenetic [36-38] differences between co-twins; an exception is a recent study by Baranzini et al. [39]. They studied three pairs of monozygotic twins discordant for Multiple Sclerosis (MS) and found no difference that could account for disease causation. This result may be viewed as unsurprising for a number of reasons. First, MS is known to have significant environmental components, including sunlight and viruses among others [40], and the concordance rate in monozygotic twins is only ∼30%. Second, they assessed only CD4+ lymphocytes, which may or may not represent the causative cell type. Also, they sequenced the genome of CD4+ cells from a single pair, at 21.7 and 22.5-fold coverage representing 99.6% and 99.5% of the NCBI human reference genome, which may or may not be sufficient to detect such differences. Only additional genomic and epigenomic studies on MZ twins will offer insights into the dynamics of genomic stability and change that form the focus of this report.
In summary, the present study adds to the recent effort in human genetics to define the phenomenon of constancy and change using the inheritance and origin of genome-wide CNVs and SNPs. The results demonstrate that CNVs often result from mitosis during early development, facilitated by flanking repeats. They may lead to CNV differences among different tissues and make most individuals mosaics. The described approach expands the search for disease related genetic changes, indicates the time of their occurrence and begins to interrogate the mechanisms involved.

Materials and Methods
This research was approved by the Committee on Research Involving Human Subjects at the University of Western Ontario. The families and patients were identified, recruited and clinically assessed by Dr. Richard O'Reilly (Psychiatrist), and all participants (Figure 1) gave informed consent and provided blood and buccal cells for this research. All subjects were interviewed using the Structured Clinical Interview for DSM-IV and the SCID II (for personality disorders), and their medical records were collected and reviewed. Diagnoses and demographic information are listed in Table 1. DNA was extracted from the collected white blood cells using the PerfectPure DNA Blood Kit (5prime.com) following the manufacturer's protocol. Subsequent microarray analysis was performed using the Affymetrix Genome-Wide Human SNP Array 6.0 at the London Regional Genomics Centre (LRGC) following the manufacturer's protocol and stringent quality control measures. Briefly, 5 μg of genomic DNA was labelled and hybridized to Affymetrix SNP 6.0 arrays. CNVs called by both the Affymetrix Genotyping Console 4.0 and the Partek® Genotyping Suite™ software suites were retained for analysis. In both cases, the CNVs were identified by continuity of markers on a segment. Two CNVs that overlapped by >50% in the two methods of data analysis were given the same identity. Every measure was undertaken to avoid inclusion of false positives, including correction for segmental duplications. We found evidence of CNVs associated with segmental duplications, which agrees with previous studies [41]. The CNVs identified were further assessed by comparison to the Database of Genomic Variants (http://projects.tcag.ca/variation/) and annotated with gene symbols by importing the annotation file from the UCSC genome browser (NCBI36/hg18).
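The rule of retaining only CNVs called by both software suites, and treating two calls as the same CNV when they overlap by more than 50%, can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the procedure actually used: the text does not say what the overlap is measured against, so the sketch assumes a reciprocal criterion (the shared interval must exceed 50% of each call), and the `Segment` type and function names are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One CNV call (hypothetical record layout)."""
    chrom: str
    start: int   # bp, inclusive
    end: int     # bp, exclusive
    state: str   # "gain" or "loss"


def overlap_bp(a: Segment, b: Segment) -> int:
    """Number of base pairs shared by two calls (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def same_cnv(a: Segment, b: Segment, min_fraction: float = 0.5) -> bool:
    """Assumed reciprocal rule: shared span exceeds min_fraction of BOTH calls."""
    if a.state != b.state:
        return False
    shared = overlap_bp(a, b)
    return (shared > min_fraction * (a.end - a.start)
            and shared > min_fraction * (b.end - b.start))


def consensus_calls(calls_a: List[Segment], calls_b: List[Segment]) -> List[Segment]:
    """Keep calls from the first caller that are matched by the second caller."""
    return [a for a in calls_a if any(same_cnv(a, b) for b in calls_b)]
```

A one-sided criterion (overlap measured against only one of the two calls) would be more permissive for calls of very different sizes; which variant was applied is not stated in the text.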
A CNV that was present in both members of the twin pair and not in either of their two parents was considered to be meiotic de novo (originated during gamete formation), while a CNV that was present in one of the two twins and not present in either parent was considered to be mitotic de novo (originated during development). Further, a CNV present in the SCZ affected twin only (as compared to the two parents and the unaffected member of the pair or the database) was classified as a "provisional de novo CNV" for this disease. Novel CNVs predicted in this study were validated by Real Time PCR analysis with an internal control (RNaseP gene) using TaqMan detection chemistry and the ABI Prism 7300 Sequence Detection System (Applied Biosystems, http://www.appliedbiosystems.org). The copy number of the test locus in each case was defined as $2^{-\Delta\Delta C_T}$, where $\Delta C_T$ is the difference in threshold cycle number between the test and reference loci.
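The comparative threshold-cycle calculation used for validation can be sketched in a few lines. The text defines the difference in threshold cycle between test and reference loci but does not spell out the second difference, so this sketch assumes the usual convention that it is taken between the sample under test and a calibrator sample of known copy number; the function names and the calibrator are assumptions introduced for illustration.

```python
def delta_ct(ct_test_locus: float, ct_reference_locus: float) -> float:
    """Threshold-cycle difference between the test locus and the reference locus."""
    return ct_test_locus - ct_reference_locus


def relative_quantity(sample_ct_test: float, sample_ct_ref: float,
                      calib_ct_test: float, calib_ct_ref: float) -> float:
    """Quantity of the test locus relative to the calibrator, 2 ** (-ddCT).

    Assumption: ddCT = dCT(sample) - dCT(calibrator).
    """
    ddct = (delta_ct(sample_ct_test, sample_ct_ref)
            - delta_ct(calib_ct_test, calib_ct_ref))
    return 2.0 ** (-ddct)
```

With this convention, a sample whose test locus crosses threshold one cycle later than in the calibrator (relative to the reference locus) gives a relative quantity of 0.5, as expected for the loss of one of two copies, and one cycle earlier gives 2.0. The calculation assumes close to 100% amplification efficiency for both loci.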
Additional CNV analysis focused on two aspects. The first was the identification of putative repeat elements in the flanking regions of CNVs, within a 1 kb region upstream and downstream of the CNV breakpoint, which could promote breakage, deletion and duplication. The identification of repeat elements was carried out using RepeatMasker (http://www.repeatmasker.org/). The second was an assessment of probable mechanisms associated with sequence-specific susceptibility to CNVs. These data were used to test models related to the origin of CNVs. Previously reported candidates for CNV mechanisms include Non-Allelic Homologous Recombination (NAHR), Non-Homologous End Joining (NHEJ), Fork Stalling and Template Switching (FoSTeS) and Microhomology-Mediated Break-Induced Replication (MMBIR) [42]. A further line of investigation involved functional characterization of genes by matching the identified genes with the Schizophrenia Gene Database (http://www.schizophreniaforum.org/res/sczgene/default.asp) as well as their assessment by Gene Ontology (GO) annotation (http://www.geneontology.org/). The genes identified were also subjected to IPA analysis (www.ingenuity.com), which identified the nature of gene interactions and the pathways involved.
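The flanking-repeat screen can be illustrated with a small interval check. This is a sketch under stated assumptions, not the analysis as run: it assumes repeat annotations have already been reduced to simple chromosome/start/end/family records (the `Repeat` type below is hypothetical and is not the RepeatMasker output format), and it interprets "1 kb upstream and downstream" as a window of 1 kb on either side of each breakpoint.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    """One annotated repeat element (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    family: str  # e.g. "SINE", "LINE", "LTR", "LCR"


def repeats_near(chrom: str, breakpoint: int, repeats: List[Repeat],
                 flank: int = 1000) -> List[Repeat]:
    """Repeats overlapping the window [breakpoint - flank, breakpoint + flank)."""
    lo, hi = breakpoint - flank, breakpoint + flank
    return [r for r in repeats
            if r.chrom == chrom and r.start < hi and r.end > lo]


def flank_status(chrom: str, cnv_start: int, cnv_end: int,
                 repeats: List[Repeat], flank: int = 1000) -> str:
    """Report whether repeats lie near the 5' breakpoint, the 3' breakpoint, or both."""
    five_prime = bool(repeats_near(chrom, cnv_start, repeats, flank))
    three_prime = bool(repeats_near(chrom, cnv_end, repeats, flank))
    if five_prime and three_prime:
        return "both"
    if five_prime:
        return "5' only"
    if three_prime:
        return "3' only"
    return "none"
```

Tallying `flank_status` over all CNVs would give the kind of both-ends, 5′-only and 3′-only proportions reported in the results. For CNVs shorter than the window, a single repeat can fall near both breakpoints and would be counted as "both" by this sketch.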
The use of the Affymetrix 6.0 Human SNP array also allowed us to assess the transmission of the 909,622 SNPs contained on the array. We identified SNPs in the twins that were not present in either of the two parents; these were considered to be de novo. The origin of a de novo SNP was assumed to be parental meiosis if both twins carried the novel nucleotide. In contrast, the origin was assumed to be somatic development (mitosis) if only one of the two twins carried the novel nucleotide. We were able to assign novel substitutions to different categories, including their potential effect on the gene and gene product, as well as pathways that may be affected.
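The same logic can be applied per SNP using genotype calls. The following is a minimal sketch, assuming genotypes are available as two-letter allele strings such as "AG" and that no-calls have been removed beforehand; the function names are hypothetical. It only flags alleles that appear in a twin and in neither parent, so Mendelian inconsistencies that do not involve a new allele are not detected.

```python
from typing import Optional, Set


def novel_alleles(child: str, father: str, mother: str) -> Set[str]:
    """Alleles carried by the child that are seen in neither parent."""
    return set(child) - set(father) - set(mother)


def classify_snp(father: str, mother: str,
                 twin_a: str, twin_b: str) -> Optional[str]:
    """Return 'meiotic', 'mitotic', or None if no de novo allele is seen."""
    novel_a = novel_alleles(twin_a, father, mother)
    novel_b = novel_alleles(twin_b, father, mother)
    if not novel_a and not novel_b:
        return None          # both twins consistent with parental alleles
    if novel_a & novel_b:
        return "meiotic"     # same novel nucleotide in both twins
    return "mitotic"         # novel nucleotide private to one twin
```

For example, parents "AA" and "AA" with twins "AG" and "AG" would be labelled meiotic, whereas twins "AA" and "AG" would be labelled mitotic. Genotyping error on the array can mimic either pattern, so calls of this kind would normally need independent confirmation.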