Copy Number Variation in Subjects with Major Depressive Disorder Who Attempted Suicide

Background
Suicide is one of the ten leading causes of death in North America and represents a major public health burden, particularly for people with major depressive disorder (MD). Many studies have suggested that suicidal behavior runs in families; however, the genomic loci that drive this effect remain to be identified.

Methodology/Principal Findings
Using subjects collected as part of STAR*D, we genotyped 189 subjects with MD who had a history of a suicide attempt and 1,073 subjects with MD who had never attempted suicide. Copy number variants (CNVs) were called in Birdsuite and analyzed in PLINK. We found a set of CNVs present in the suicide attempter group but not in the non-attempter group, including CNVs in SNTG2 and MACROD2, two brain-expressed genes previously linked to psychopathology; however, these results did not reach genome-wide significance.

Conclusions
These data suggest candidate CNVs to be investigated further in relation to suicide attempts in MD using larger sample sizes.


Introduction
Major depressive disorder (MD) is a debilitating illness affecting approximately 5-15% of people in the United States [1], resulting in economic burdens such as lost work days [2] and decreased life expectancy in affected individuals [3]. Symptoms, as defined by both the International Classification of Diseases and the Diagnostic and Statistical Manual of Mental Disorders, include insomnia or hypersomnia, weight gain or loss, feelings of hopelessness, depressed mood, and lack of motivation, among others. Phenotypic heterogeneity has long been seen as a major confounding factor in genetic studies of MD [4], and suicide attempts are a documentable outcome in some people with MD that may mark a distinct subgroup.
MD and suicide run in families, suggesting that both may be partially explained by genetics [5,6]. Still, although most genetic studies of MD and suicide report high heritability estimates [7][8][9][10], it is unclear which genes drive the effect. Heritability studies led to candidate gene approaches that searched for genetic variants passed from one affected generation to the next. Genes involved in biological pathways of antidepressant action (candidates such as 5HTR2A, MAO-A, and 5-HTT) were screened for variation that might be associated with disease. This approach proved largely challenging [11], with low reproducibility across studies. To avoid restricting the search to a priori candidate genes, genome-wide approaches have been applied in which hundreds of thousands of variants are screened at once [12][13][14]. Another way to identify disease-related genes is to perform genome-wide searches for copy gains and losses [4,15] rather than for single nucleotide polymorphisms. Copy number variants (CNVs) are not intrinsically more pathogenic than single nucleotide changes, but because they are large, a CNV that intersects a gene can increase or decrease the amount of gene product, and a CNV can also alter the genomic environment with potentially far-reaching cis or trans effects.
The purpose of the current study was to identify CNVs in people with MD who had attempted suicide and to determine whether these differ from CNVs in people with MD who had never attempted suicide. We reasoned that suicide attempters with MD might represent a genetically distinct group from people with MD who never attempted suicide. We used whole-genome SNP microarrays to call CNVs >100 Kb and then assessed differences in CNV frequency between people with MD who had or had not attempted suicide, and compared these CNVs with those in a non-psychiatric control group.

Materials and Methods
All protocols and sample collections were approved by the IRB of the Massachusetts General Hospital, and all data were analyzed anonymously.
The STAR*D cohort [16] has been extensively used in many genetic studies and has been thoroughly documented [17]. Lifetime history of suicide attempts was assessed at the initial study visit by the study clinician and suicidal behavior was not exclusionary for the initial STAR*D patient recruitment, provided the patient did not require hospitalization [18].
Genotyping for the STAR*D cohort used the Affymetrix GeneChip Human Mapping 500 K Array Set and the Affymetrix Human SNP Array 5.0 [19]. Genotypes for samples run on the Affymetrix 500 K Array (n = 969) were called using the BRLMM algorithm, and those run on the Affymetrix Array 5.0 (n = 979) were called using the BRLMM-P algorithm. Additional QC was performed using PLINK [20]: individuals or SNPs with total call rates <95% were excluded, as were SNPs with call rates <98%, SNPs with minor allele frequencies <1%, and SNPs out of Hardy-Weinberg equilibrium (p < 1 × 10⁻⁶). We imputed missing genotypes using MACH and retained SNPs with r² > 0.8. Eleven subjects were excluded because of missing clinical data, resulting in a final dataset of 1,262.
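The QC thresholds above can be illustrated with a minimal Python sketch. It is not PLINK's implementation: it assumes genotypes are already coded as 0/1/2 copies of one allele (None for a missing call), takes Hardy-Weinberg p-values as precomputed inputs, and the function names (`qc_filter`, `filter_imputed`) are hypothetical. Samples are filtered before SNPs here, which is one plausible order rather than necessarily the one used in the study.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Genotype coded as copies of one allele (0, 1, 2); None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_freq(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency among called genotypes."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def qc_filter(genotypes: Dict[str, List[Genotype]], hwe_p: Dict[str, float],
              min_sample_call: float = 0.95, min_snp_call: float = 0.98,
              min_maf: float = 0.01, min_hwe_p: float = 1e-6) -> Tuple[List[int], List[str]]:
    """Return (indices of retained samples, IDs of retained SNPs).

    genotypes maps SNP ID -> one genotype per sample (same sample order for every SNP).
    hwe_p maps SNP ID -> a precomputed Hardy-Weinberg p-value (assumed input).
    """
    snps = list(genotypes)
    n_samples = len(genotypes[snps[0]])
    # 1) Drop samples with too many missing calls across all SNPs.
    keep_samples = [i for i in range(n_samples)
                    if call_rate([genotypes[s][i] for s in snps]) >= min_sample_call]
    # 2) Drop SNPs failing call-rate, MAF, or HWE thresholds among retained samples.
    keep_snps = []
    for s in snps:
        calls = [genotypes[s][i] for i in keep_samples]
        if (call_rate(calls) >= min_snp_call
                and minor_allele_freq(calls) >= min_maf
                and hwe_p[s] >= min_hwe_p):
            keep_snps.append(s)
    return keep_samples, keep_snps


def filter_imputed(r2_by_snp: Dict[str, float], min_r2: float = 0.8) -> List[str]:
    """Keep imputed SNPs whose imputation quality r^2 exceeds min_r2."""
    return sorted(s for s, r2 in r2_by_snp.items() if r2 > min_r2)
```

Because the SNP call-rate threshold (98%) is stricter than the 95% total call-rate rule, a separate 95% SNP filter would be redundant in this sketch.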
CNVs were identified using Birdseye [21], which detects CNVs by integrating intensity data from neighboring probes with a hidden Markov model (HMM) on a per-individual basis. Its performance depends on several factors, including SNP and copy number probe density, mean intra-individual probe variance, and CNV frequency. For each CNV, a LOD score was generated describing the likelihood of the CNV relative to no CNV over the given interval. All CNV analyses were performed in PLINK, and only CNVs present in less than 10% of the total sample were analyzed.
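Birdseye's own model is not reproduced here. As a hedged illustration of the general idea only, the sketch below runs a Viterbi decode over a toy three-state HMM (copy number 1, 2, 3) with Gaussian emissions on per-probe log2 intensity ratios, then scores each non-normal run with a LOD: the log10 likelihood ratio of the called copy number against copy number 2 over that interval. The state means, noise level, and transition probability are invented toy parameters, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy parameters (not Birdseye's): expected log2 intensity ratio
# for copy number 1 (deletion), 2 (normal) and 3 (duplication).
STATE_MEANS = {1: -0.6, 2: 0.0, 3: 0.4}
SIGMA = 0.2   # assumed per-probe noise (standard deviation)
STAY = 0.99   # assumed probability of keeping the same state between adjacent probes


def log_gauss(x: float, mu: float, sigma: float = SIGMA) -> float:
    """Log density of a normal emission."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi(ratios: Sequence[float]) -> List[int]:
    """Most likely copy-number state for each probe, in genomic order."""
    states = list(STATE_MEANS)
    log_stay = math.log(STAY)
    log_move = math.log((1 - STAY) / (len(states) - 1))

    def log_trans(prev: int, cur: int) -> float:
        return log_stay if prev == cur else log_move

    # Uniform prior over states for the first probe.
    score = {s: -math.log(len(states)) + log_gauss(ratios[0], STATE_MEANS[s]) for s in states}
    back = []
    for x in ratios[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[best] + log_trans(best, s) + log_gauss(x, STATE_MEANS[s])
            ptr[s] = best
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def call_segments(ratios: Sequence[float]) -> List[Tuple[int, int, int, float]]:
    """Return (first_probe, last_probe, copy_number, LOD) for each non-normal run.

    LOD = log10 likelihood of the run under its called copy number minus
    log10 likelihood of the same run under copy number 2 (no CNV).
    """
    path = viterbi(ratios)
    segments = []
    i = 0
    while i < len(path):
        j = i
        while j + 1 < len(path) and path[j + 1] == path[i]:
            j += 1
        cn = path[i]
        if cn != 2:
            run = ratios[i:j + 1]
            ll_cnv = sum(log_gauss(x, STATE_MEANS[cn]) for x in run)
            ll_null = sum(log_gauss(x, STATE_MEANS[2]) for x in run)
            segments.append((i, j, cn, (ll_cnv - ll_null) / math.log(10)))
        i = j + 1
    return segments
```

In this toy model, the transition penalty is what makes the HMM "integrate" neighboring probes: a single noisy probe rarely pays for two state switches, whereas a consistent run of shifted intensities does.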
Secondary controls for the current study came from the Database of Genomic Variants (http://projects.tcag.ca/variation/), a database of over 100,000 CNVs from over 40 studies. Although these studies did not explicitly screen for mood disorders, the database contains only control subjects from these studies.

Results
The STAR*D dataset comprised 1,262 individuals (483 males), of whom 189 had attempted suicide and 1,073 had not. In all analyses, we clustered the data in PLINK to account for population stratification, which in this dataset comprised three groups (Caucasian, African American, and Hispanic). In all cases, we assessed only CNVs larger than 100 Kb. Although this is admittedly a conservative threshold, CNV call accuracy increases with predicted CNV size, and this size cutoff is consistent with previous work [22].
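The two CNV-level filters used in this study (calls larger than 100 Kb, and CNVs present in less than 10% of the total sample) can be sketched as follows. This is a minimal sketch, not PLINK's implementation: the `Cnv` record and `filter_cnvs` are hypothetical, and frequency is approximated as the number of distinct samples carrying any overlapping call (of either type, before the size filter) divided by the number of genotyped samples.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cnv:
    sample: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "del" or "dup"

    @property
    def length(self) -> int:
        return self.end - self.start


def overlaps(a: Cnv, b: Cnv) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_cnvs(cnvs: List[Cnv], n_samples: int,
                min_len: int = 100_000, max_freq: float = 0.10) -> List[Cnv]:
    """Keep CNVs longer than min_len whose carrier frequency is below max_freq."""
    kept = []
    for c in cnvs:
        if c.length <= min_len:
            continue  # strictly larger than 100 Kb is required
        # Distinct samples with any call overlapping this CNV (simplified frequency).
        carriers = {o.sample for o in cnvs if overlaps(c, o)}
        if len(carriers) / n_samples < max_freq:
            kept.append(c)
    return kept
```

The pairwise overlap scan is quadratic in the number of calls, which is acceptable for an illustration; an interval tree or a sort-and-sweep would be the usual choice at genome scale.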
We first asked whether copy number burden differed between people with MD who had attempted suicide and those who had never attempted suicide. We found no significant difference in CNV burden, defined as CNV size, total Kb spanned, or proportion of CNVs per person, for either deletions or duplications (Tables 1 and 2).
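A burden comparison of this kind can be illustrated with a label-permutation test on one per-person statistic, here total Kb spanned. This is a hedged sketch rather than the exact PLINK procedure: the function names are hypothetical, the statistic is the difference in group means, and the p-value is two-sided with the usual +1 correction.

```python
import random
from statistics import fmean
from typing import Iterable, List, Sequence, Tuple


def per_person_kb(cnv_calls: Iterable[Tuple[str, int]], samples: Sequence[str]) -> List[float]:
    """Total Kb spanned by CNVs for each sample (0.0 for samples with no CNV)."""
    totals = {s: 0.0 for s in samples}
    for sample, length_bp in cnv_calls:
        totals[sample] += length_bp / 1000
    return [totals[s] for s in samples]


def burden_perm_test(case_vals: Sequence[float], ctrl_vals: Sequence[float],
                     n_perm: int = 10_000, seed: int = 1) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in mean burden.

    Returns (observed difference in means, empirical p-value).
    """
    obs = fmean(case_vals) - fmean(ctrl_vals)
    pooled = list(case_vals) + list(ctrl_vals)
    n_case = len(case_vals)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign attempter / non-attempter labels at random
        diff = fmean(pooled[:n_case]) - fmean(pooled[n_case:])
        if abs(diff) >= abs(obs):
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return obs, (extreme + 1) / (n_perm + 1)
```

Permuting labels across people, rather than across CNVs, keeps each individual's set of CNVs intact, so the null distribution reflects how burden is actually clustered within individuals.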
Next, we asked whether suicide attempters and non-attempters differed in the probability that a CNV intersects a given gene. To do this, we counted the CNVs in each group that intersected each gene. Point tests were performed for each gene, and two-sided empirical p-values were generated by permutation across the whole genome (Table 3). Table 3 lists all CNVs that differed between MD_SA (suicide attempt) and MD_NO SA (no suicide attempt) at a single-point p < 0.1. We observed no genome-wide significant hits; however, more gene-intersecting CNVs were present in the MD_SA group than in the MD_NO SA group, which may suggest that MD_SA is a more severe phenotype than MD_NO SA. All gene-intersecting CNVs were duplications except the one in MACROD2, a gene previously linked to autism [23]. We also analyzed these data with genome-wide correction, computing statistics separately for deletions and duplications (Table 4).
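The per-gene point test and its genome-wide correction can be sketched as a label-permutation test. This is a hedged illustration, not PLINK's code: the statistic (absolute difference in carrier proportions between groups), the use of the maximum statistic across genes for the corrected p-value, and all names (`gene_carriers`, `point_tests`) are assumptions chosen for clarity. Intervals are half-open `(chrom, start, end)` tuples.

```python
import random
from typing import Dict, List, Mapping, Set, Tuple

Cnv = Tuple[str, str, int, int]   # (sample, chrom, start, end), half-open
Gene = Tuple[str, int, int]       # (chrom, start, end), half-open


def gene_carriers(cnvs: List[Cnv], genes: Dict[str, Gene]) -> Dict[str, Set[str]]:
    """Samples carrying at least one CNV that intersects each gene."""
    carriers: Dict[str, Set[str]] = {g: set() for g in genes}
    for sample, chrom, start, end in cnvs:
        for g, (g_chrom, g_start, g_end) in genes.items():
            if chrom == g_chrom and start < g_end and g_start < end:
                carriers[g].add(sample)
    return carriers


def carrier_diff(carriers_g: Set[str], is_case: Mapping[str, bool],
                 n_case: int, n_ctrl: int) -> float:
    """Absolute difference in carrier proportion between cases and controls."""
    k_case = sum(1 for s in carriers_g if is_case[s])
    k_ctrl = len(carriers_g) - k_case
    return abs(k_case / n_case - k_ctrl / n_ctrl)


def point_tests(carriers: Dict[str, Set[str]], labels: Dict[str, bool],
                n_perm: int = 1000, seed: int = 1) -> Dict[str, Tuple[float, float]]:
    """Return gene -> (pointwise empirical p, genome-wide max-statistic p).

    labels maps sample -> True for a suicide attempter, False otherwise.
    """
    samples = list(labels)
    n_case = sum(labels.values())
    n_ctrl = len(samples) - n_case
    obs = {g: carrier_diff(c, labels, n_case, n_ctrl) for g, c in carriers.items()}
    hits_point = dict.fromkeys(carriers, 0)
    hits_max = dict.fromkeys(carriers, 0)
    rng = random.Random(seed)
    flags = [labels[s] for s in samples]
    for _ in range(n_perm):
        rng.shuffle(flags)  # permute attempter status across individuals
        perm = dict(zip(samples, flags))
        stats = {g: carrier_diff(c, perm, n_case, n_ctrl) for g, c in carriers.items()}
        max_stat = max(stats.values())
        for g in carriers:
            hits_point[g] += int(stats[g] >= obs[g])
            hits_max[g] += int(max_stat >= obs[g])
    return {g: ((hits_point[g] + 1) / (n_perm + 1), (hits_max[g] + 1) / (n_perm + 1))
            for g in carriers}
```

Comparing each gene's observed statistic with the permutation maximum over all genes is one standard way to obtain family-wise corrected p-values; the corrected value is never smaller than the pointwise one.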
To determine whether the gene-intersecting CNVs might be pathogenic, we screened the Database of Genomic Variants for control subjects carrying CNVs of the same type (deletion or duplication) that intersected any of these genes. All gene-intersecting CNVs identified in the MD_SA population had been previously reported.
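Screening gene-intersecting CNVs against control CNVs, matched by type, reduces to an interval-overlap check. The sketch below assumes the Database of Genomic Variants records have already been downloaded and parsed into `(chrom, start, end)` intervals with a `"del"`/`"dup"` label; it does not model the database's actual file format or query interface, and the function name `reported_in_controls` is hypothetical. Gene names in any example are placeholders.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reported_in_controls(hits: Dict[str, Tuple[Interval, str]],
                         control_cnvs: List[Tuple[Interval, str]]) -> Dict[str, bool]:
    """For each gene hit (gene interval, CNV type), is there a same-type control CNV
    that intersects the gene?"""
    return {
        gene: any(kind == c_kind and overlaps(gene_iv, c_iv) for c_iv, c_kind in control_cnvs)
        for gene, (gene_iv, kind) in hits.items()
    }
```

Matching on CNV type matters: a control duplication over a gene says little about whether a deletion of that gene is benign.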
To determine whether any genomic regions, regardless of whether they intersected genes, differed in CNV number between MD_SA and MD_NO SA, we performed the same analysis as for gene-intersecting CNVs; we found no significant differences in CNV number in any genomic region (Table 5).

Discussion
We analyzed copy number variation (CNV) in people with major depressive disorder (MD) who had previously attempted suicide and compared CNVs in this population with those in people with MD who had never attempted suicide. Our results suggest that no CNV distinguishes these two groups, and that if a particular CNV is associated with suicide attempts in MD, it is likely a common CNV. That is, every CNV we found had already been reported in the Database of Genomic Variants, suggesting that no copy number changes influence suicide attempt status in the STAR*D sample. Why did we not detect a difference in CNV frequency or CNV burden between groups? Although our study is large compared with most studies in psychiatric genetics, it was likely underpowered for the current analysis. The largest CNV differences we detected were 2:0 (carriers among attempters versus non-attempters), which might suggest that a sample 4-5 times larger could detect an effect; however, because all identified CNVs are present in more than 1% of the general population, increasing the sample size alone is likely to also identify controls with similar CNVs. This suggests that CNVs do not contribute to suicide attempts in MD, at least in the STAR*D sample, although a common CNV might increase risk in combination with a particular genetic background and environment. Detecting an effect of a common CNV would require sample sizes >20-30 times larger than in the current study, at least under the analysis model employed here. Another explanation for the lack of significant results is that, despite the well-annotated sample, the attempter and non-attempter groups are still heterogeneous. Attempt status was determined from a single report of suicide history at STAR*D enrollment, and subjects may vary widely in how they report suicide history.
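The 2:0 arithmetic can be made concrete with a two-sided Fisher's exact test, computed here from the hypergeometric distribution using only the Python standard library. This is an illustrative calculation, not the study's analysis (which used permutation in PLINK), and the scaling loop assumes, optimistically, that carrier proportions stay fixed as the sample grows. For 2 carriers among 189 attempters and none among 1,073 non-attempters, the two-sided p-value is the probability that both carriers fall in the attempter group, (189 × 188)/(1,262 × 1,261), roughly 0.02, which would not survive a multiple-testing correction across many genes.

```python
from math import comb


def table_prob(x: int, n_case: int, n_ctrl: int, k: int) -> float:
    """Hypergeometric probability that x of the k carriers are cases."""
    return comb(n_case, x) * comb(n_ctrl, k - x) / comb(n_case + n_ctrl, k)


def fisher_two_sided(case_carriers: int, n_case: int, ctrl_carriers: int, n_ctrl: int) -> float:
    """Two-sided Fisher's exact p: sum of tables no more likely than the observed one."""
    k = case_carriers + ctrl_carriers
    p_obs = table_prob(case_carriers, n_case, n_ctrl, k)
    lo, hi = max(0, k - n_ctrl), min(k, n_case)
    total = 0.0
    for x in range(lo, hi + 1):
        p = table_prob(x, n_case, n_ctrl, k)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical scaling: keep the 2:0 split proportional as the sample grows.
    for scale in (1, 2, 5):
        p = fisher_two_sided(2 * scale, 189 * scale, 0, 1073 * scale)
        print(f"{scale}x sample: p = {p:.3g}")
```

Under this optimistic assumption the p-value shrinks quickly with sample size; the Discussion's caveat is that, for CNVs already common in the general population, non-attempter carriers would likely appear as the sample grows, so the split would not stay 2:0.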
Future CNV studies of suicide may want to use comprehensive questionnaires about suicide history. For psychiatric genetics, this raises the interesting question of how homogeneous a sample needs to be before searching for genetic variation associated with disease. For example, a study with a design similar to the current one might include only young adults with major depression, defining cases and controls by severity of suicide attempts, number of suicide attempts, and/or complete absence of suicide attempts. Statistics become challenging when these issues are addressed, but this approach, combined with very large samples from ethnically and socio-economically homogeneous groups, may be what is required to identify genetic variation relevant to psychopathology. Finally, we note that the technology used to call CNVs had lower resolution than was available. Specifically, we called CNVs from SNP genotyping arrays using very conservative calling criteria, which likely increased the false negative rate. For example, there may be smaller CNVs (CNVs under 100 Kb were screened out in this study) that differ significantly between attempters and non-attempters, or CNVs that did not meet signal intensity thresholds. In either case, the data analyzed were of high quality but likely did not capture all CNVs present in the STAR*D sample. Whole-genome sequencing, array comparative genomic hybridization, or 1 M SNP arrays would have provided better resolution for CNV detection.
The current study used conservative filtering criteria for CNV analysis and stringent QC measures for array analysis. We also took advantage of a well-documented sample set (STAR*D) where many other studies have also been performed, potentially allowing for further downstream analyses with other data generated from this sample.