Copy number variation study in Japanese quail associated with stress related traits using whole genome re-sequencing data

Copy number variation (CNV) is a major driving factor for genetic variation and phenotypic diversity in animals. To detect CNVs and understand genetic components underlying stress related traits, we performed whole genome re-sequencing of pooled DNA samples of 20 birds each from High Stress (HS) and Low Stress (LS) Japanese quail lines using Illumina HiSeq 2×150 bp paired end method. Sequencing data were aligned to the quail genome and CNVnator was used to detect CNVs in the aligned data sets. The depth of coverage for the data reached to 41.4x and 42.6x for HS and LS birds, respectively. We identified 262 and 168 CNV regions affecting 1.6 and 1.9% of the reference genome that completely overlapped 454 and 493 unique genes in HS and LS birds, respectively. Ingenuity pathway analysis showed that the CNV genes were significantly enriched to phospholipase C signaling, neuregulin signaling, reelin signaling in neurons, endocrine and nervous development, humoral immune response, and carbohydrate and amino acid metabolisms in HS birds, whereas CNV genes in LS birds were enriched in cell-mediated immune response, and protein and lipid metabolisms. These findings suggest CNV genes identified in HS and LS birds could be candidate markers responsible for stress responses in birds.


Introduction
Understanding the evolutionary process that leads to divergence in animals requires study of their genetic variation. Genomic variation is a principal factor responsible for phenotypic diversity in animals [1]. Basically, genomic variation can encompass a wide range of alterations from small indels to sometimes deletion or duplication of the entire genome. The deletion or duplication of a certain fragment of DNA causes change in copy number variation (CNV) of genome [2]. CNV is arbitrarily defined as DNA segment that is 1 kb or larger and present at variable copy number in comparison to a reference genome [3]. It is estimated that DNA region that have CNV can account for 4.8-9.5% of human genome and surpass the diversity caused by single nucleotide polymorphisms [4,5]. PLOS  have detected major differences in CNV among genes that might potentially contribute to genetic differences and phenotypic divergence in HS and LS lines of Japanese quail.

Ethics statement
This study was conducted following the recommended guidelines for the care and use of laboratory animals for the National Institutes of Health. All procedures for animal care were performed according to the animal use protocols that were reviewed and approved by the University of Arkansas Institutional Animal Care and Use Committee (IACUC Protocol #14012).

Birds and DNA sequencing
The early process of development and selection of HS and LS lines of Japanese quail for their plasma corticosterone response to immobilization for up to 12 generations was explained by Satterlee and Johnson (1988) [26]. Since then, an independent random mating condition has been used for their maintenance [30][31][32]. These research lines were shipped to University of Arkansas at generation 44 from Louisiana State University and maintained at Arkansas Agricultural Experimentation Station, Fayetteville, AR [27]. We used adult male HS and LS birds for this study because of their stable physiology. We collected blood samples (3ml) from 20 birds each from HS and LS lines. Genomic DNA was purified from each sample using QiaAmp DNA mini kit (Qiagen, Hilden, Germany) following manufacturer's method. DNA quality was assessed using NanoDrop 1000 (Thermo Scientific, Waltham, MA) and agarose gel electrophoresis. Twelve samples showing highest quality per line were pooled to represent each line. Library preparation and Illumina sequencing for the pooled DNA samples were performed by the Research Technology Support Facility at Michigan State University (East Lansing, MI) using Illumina HiSeq 2×150 bp paired end read technology.

Data quality assessment and sequence assembly
We used the FastQC tool (v0.11.6) to assess the quality of raw reads obtained after sequencing in form of FASTQ files (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). After quality assessment, the low quality reads were trimmed out using Trimmomatic tool (v0.32) [33]. The clean reads were then mapped onto the Japanese quail reference genome obtained from NCBI (https://www.ncbi.nlm.nih.gov/genome/113) using Bowtie2 (v2.3.3.1) with the default settings for the parameters [34]. We removed PCR duplicates using rmdup command line of SAMtools (v0.1.19) and SAMtools was further used to convert SAM to BAM files and then to sorted BAM files to save run time in subsequent analysis [35].

CNV detection and copy number estimation
We used CNVnator software (v0.3.3) to predict CNV in sorted BAM files relative to reference quail genome [28]. Optimal bin sizes of 1200 and 1500 were chosen for HS and LS respectively according to author's recommendations, in which the ratio of average read-depth signal to its standard deviation was between 4 and 5 [1,28,36]. All the CNV calls in both HS and LS samples were greater than 1 kb. CNV calls were filtered according to criteria recommended by Abyzov et al. [28] CNV showing P-value <0.01 (e-val1 calculated using t-test statistics), size >1 kb, and q0 < 0.5 (q0: fraction of mapped reads with zero quality) were filtered and used for downstream analysis.
We estimated gene copy number (CN) in HS and LS birds across genome length using the "-genotype" option of CNVnator. We wrote a custom bash script and retrieved CNV genes from CNVRs of HS and LS lines using RefSeq genes from NCBI and BEDOPS tool (v2.4.30) [37].

Real time quantitative PCR validation of CN
Real time quantitative PCR (qPCR) was used to validate CNVs detected by CNVnator in 16 birds each from HS and LS lines. A total of 9 genes showing CNV were randomly chosen and primers were designed using Primer3 software and listed in Table 1.
Primers specificities were checked using Primer-BLAST tool of NCBI. A segment of the βactin gene, which is present in two copies per diploid and showed no CNV in either line of quail, was chosen as control in all reactions. Five nanogram of genomic DNA was subjected to qPCR (total volume of 25 μL) in triplicate reactions using ABI prism 7500HT system (Thermo-Fisher Scientific) with PowerUp SYBER Green Master Mix (ThermoFisher Scientific). The conditions of real-time qPCR amplification were as follows: 1 cycle at 95˚C (10 min), 40 cycles at 95˚C (15 s each), followed by 60˚C for 1 min. We used ΔΔCt method for calculating relative copy number of each gene. First, the cycle threshold (Ct) value of each gene was normalized against the control gene, and then ΔCt value was determined between test gene and reference gene predicted as normal copy number by CNVnator. Finally values around 3 or above were considered as duplications or gain and around 1 or less as deletion or loss.

Functional annotation of CNV genes and network identification
We analyzed genes retrieved from CNVRs in terms of gene ontology and molecular networks using Ingenuity Pathway Analysis (IPA; http://www.ingenuity.com). We imported lists of unique genes identified in CNVRs of HS and LS lines of quail into IPA separately and subsequently mapped to their corresponding annotations in the Ingenuity Pathway Knowledge Base. IPA identifies networks accommodating these unique genes in comparison with comprehensive global network. IPA illustrates each molecular network with an assigned relevance score, the number of focus molecules, and top functions of the network. During analysis, we set each network to the limit of 35 molecules by default and only human was chosen for the species option. We used experimentally observed evidence for the confidence level. Finally the identified networks were presented as network graphs that show biological relationship among molecules. Molecules in network graphs are represented by nodes, distinguished by their shapes based on their functional category, and are connected by distinct edges based on interaction among molecules.

Genome re-sequencing and distribution of CNVs
We performed whole genome resequencing of pooled DNA samples from 12 birds each from HS and LS lines of the quail and produced~250 and~257 million reads of 150 bp respectively. Of those,~85 and~84 million reads were mapped to the reference genome (NCBI/Coturnix japonicia) and their respective depth of coverage reached to~41x and~42x for HS and LS ( Table 2). We used CNVnator tool to call CNVs from the mapped data and considered calls (deletions or duplications) �1 kb length in our analysis which makes more reliable to detect CNVs when used CNVnator [28]. We chose CNVnator tool because it works based on read-depth approach with a concept that the depth of coverage of a genome is positively correlated with copy number of that region [38]. Furthermore, CNVnator can detect large CNVRs with maximum sensitivity even at low coverage, and the reliability of a CNV call actually increases with the size of event [28,38,39].
A total of 262 and 168 CNVRs were identified in HS and LS lines, respectively. Among these, 235 were deleted and 27 duplicated CNVRs in HS, and 148 deleted and 20 duplicated CNVRs in LS lines ( Table 3).
The distribution of deletions and duplications over different chromosomes of the quail genome is shown in Fig 1. Interestingly, there were no CNVRs in chromosome 6 and 16 of LS line but were present in HS line. The number of CNVRs in each chromosome was proportional to its length. Replication and recombination based mechanisms have been suggested as possible events for the CNVs formation across genome of an organism. The recombination rate is higher in longer DNA; therefore it can be the reason for more CNVRs present in large chromosomes in our study [8,40]. The chromosome 16 in chickens has the major histocompatibility complex (MHC) genes that encode key proteins regulating aspects of immune response [41]. A study by Huff et al. reported HS birds more susceptible to Salmonella species as compared to LS birds [27]. Therefore, the deletion event detected in chromosome of 16 of HS birds might be the cause for more susceptibility of HS birds to diseases.
We found fewer copy numbers with zero state deletions i.e. the genes are completely deleted, compared to one state deletion in both HS and LS lines of quail (S1 Table), which was similar to that observed in chickens [42]. The deletions outnumber duplications by a ratio of 8.70:1 in HS and 7.4:1 in LS (Table 3), which is consistent with previous studies where more deletion events were discovered as compared to duplications [43]. The length of CNVRs ranged from 6.0-1341.6 kb in HS and 7.5-1101 kb in LS lines (Fig 2). The total length of deleted CNVRs accounted for 13.8 Mb in HS and 17.02 Mb in LS lines. Similarly, the total lengths of duplicated CNVRs were 1.32 Mb in HS and 1.15 Mb in LS lines. The average lengths of CNVRs were~50 kb in HS and~100 kb in LS lines ( Table 3). The CNVRs covered 1.6 and 1.9% of quail genome in HS and LS, respectively. We found the amount of quail genome affected by CNVs similar to that reported for chickens (1.42%, 2.61%) [40,44], dogs (1.08%) [45], and Holstein cattle (1.61%) [46] but lower than in swine (4.23%) [47], mice (6.87% or 8.15%) [42] and human (5.9%, 12%) [2,48]. However, these values could be affected by sample size, diversity of samples, sequencing technology and CNV calling methods used in the studies [42].

CNV validation using qPCR
In this study, we used pooled DNA samples from each line for whole genome re-sequencing (Explained in methods above). However, we validated individual genes associated with CNV  in 16 birds each from HS and LS lines using qPCR. We randomly selected 9 different genes each from 9 different CNVRs predicted by CNVnator for their validation. We used ΔΔCt method for determining relative CN of the genes. We found~80% of our qPCR results agreed with the CN state predicted by CNVnator ( Table 4).
The result clearly showed that there is difference in CNV in 9 genes between HS and LS lines of quail. Thus, difference in CNV observed in genes in between HS and LS lines can be the reason for their phenotypic variations.

Gene content of CNVRs and bioinformatics analysis
We wrote a custom bash script, used BEDOPS tool and reference genome annotation file (in GFF format) of Japanese quail from NCBI to extract genes from CNVRs of both the lines. We   (Table 5). Similarly, we found 18 uniquely duplicated genes in HS and 22 in LS lines ( Table 5). Lists of uniquely deleted and duplicated genes in HS and LS lines are shown in S2 Table. Structural genetic variations have been known to accumulate during inbreeding process in animals [49]. However, effect of the inbreeding process in accumulation of genetic variation in quail populations was not known to date. We have identified several hundred genes that were fully deleted in HS and LS lines of quail, which supports a phenomenon of perpetual gene turnover in the two quail populations and their genetic differences. Duplication of whole genes has been known to impact gene expression by altering gene dosage [50,51]. If a duplication of a gene is adaptive, it is usually favored and retained more frequently in a population [1]. We found 23 genes in HS and 32 in LS lines that have, on average, 10 or more copies and are considered as high copy number genes. These gene lists included both annotated and unannotated genes with 11 genes having compatible copy number between HS and LS lines (S3 Table). The high copy number annotated genes in HS included PCR11 cleavage and polyadenhylation factor (PCF11), ankyrin repeat domain 42 (ANKRD42), obscurin, cytoskeletal calmodulin and titininteracting RhoGEF (OBSCN), chromosome 2 H6orf52 homolog (C2H6orf52), nucleoporin 153 (NUP153), core-binding factor alpha subunit 2 (CBFA2T2), syntrophin alpha 1 (SNTA1), and dynein axonemal intermediate chain 1 (DNAI1) and in LS lines were PCF11, ANKRD42, and hydroxysteroid dehydrogenase like 2 (HSDL2). The high copy number genes were  Table 6. Uniquely deleted genes in HS and LS lines of Japanese quail associated with canonical pathways.
We used IPA to characterize the biological functions, describe molecular interaction networks and canonical pathways implicated by uniquely deleted genes in HS and LS lines. We identified five canonical pathways significantly (p-value < 0.01) enriched by deleted genes in HS lines (  (Table 6). We found the top diseases and bio functions of the deleted genes in HS line related to endocrine system disorders, organismal injury and abnormalities, neurological disease, and gastrointestinal disease. Similarly, the top diseases and bio functions of deleted genes in LS line were related to endocrine system disorder, organismal injury and abnormalities, connective tissue disorders, and reproductive system disease ( Table 7). The uniquely deleted genes in HS involved in endocrine system disorder are listed in Table 8.
Also, LS line has a network associated with lipid metabolism in deletion. Thus, in contrast to LS line, canonical signaling pathways in HS are related to regulation of immune response, stress and neurological diseases. Therefore, a higher level of mean corticosterone level seen in HS compared to LS lines may be associated with the genes with CNVs. These differences might implicate CNV as an adaptive change in response to restraint stress between HS and LS lines of Japanese quail. This type of adaptive variation at DNA level can improve the fitness of organisms to new and challenging environments [52]. Top Diseases and Functions   1  14-3-3, APH1A, ATP6V0D1, ATP6V1A, ATP6V1G1, atypical protein kinase  C, BSN, CAMK2N2, CaMKII, ERK1/2, Glycogen synthase, GPATCH8,  Growth hormone, IBA57,IL1RAPL1,INSRR,LLGL1, LSG1, MIOX, MLXIPL  We identified a total of 17 gene networks in HS and 18 in LS lines with score not less than 10 (S4 Table) among which 4 different networks in HS and 5 in LS lines were significant interaction networks involved in nervous and endocrine systems development (Table 9).

SN Molecules in Network Score Focus Molecules
Here a score of 10 implies that there will be less than a 10 −10 probability that the genes in the network are associated with each other by chance. The topmost network involving deleted genes in HS line was specifically associated with cell to cell signaling and interaction, cellular assembly and organization, nervous system development and function (Fig 3). The genes associated with loss in this network are involved in signaling pathways of ERK1/2 connected to CaMKII (Ca 2+ /calmodulin-dependent protein kinase II), PPP1R9B (Protein phosphatase 1 regulatory subunit 9B), APH1A (Aph-1 homolog A), proinsulin and growth hormone. CaM-KII, PPP1R96, and APH1A are involved in nervous system development and functions. CaM-KII functions in various cells by phosphorylating proteins involved in synaptic plasticity, electrical excitability and neurotransmitter synthesis [53]. PPP1R9B gene is expressed in dendritic spines and plays a role in receiving signals form the central nervous system [54]. APH1A gene encodes a component of gamma secretase complex that is involved in proteolysis of amyloid precursor protein [55]. The connection in this network therefore suggests loss of CaMKII, PPP1R9B, APH1A and other genes in this pathway may impair functional interactions of ERK1/2 signaling pathway with growth hormones, proinsulin and secretase gamma. This may be a reason for reduced growth rate and low basal weight observed in HS compared to LS birds.
Reduced heterophil/lymphocyte ratio is observed in LS compared to HS line of Japanese quail [27]. Interestingly, we found humoral immune response in HS and cell-mediated immune response in LS lines associated with deleted genes (S4 Table). The gene network associated with cell-mediated response in LS lines is shown in Fig 4. In this network, the deleted genes are associated with signaling pathway of P38 MAPK connected to CDKN1A (cyclin dependent kinase inhibitor 1A), PRKCE (protein kinase C epsilon) and CSF3 (colony stimulating factor 3). The protein encoded by CDKN1A inhibits cyclin-cyclin-dependent kinase2 and function in regulation of cell cycle progression at the G1 phase [56]. PRKCE is involved in lipopolysaccharide (LPS)-mediated signaling in activating macrophages and also functions in controlling anxiety-like behavior [57]. The protein product of CSF3 is cytokine that controls production, differentiation and functions of granulocytes [58]. Therefore, molecular interactions of P38 MAPK with T-cell receptor (TCR), B-Cell Receptor (BCR) complex, and interferon gamma may be impaired due to deletion of CDKN1A, PRKCE, CSF3 and other CNV related genes. It might indicate for suppression of cellular response leading to reduced heterophil counts in LS birds of quail.
We identified different sets of genes affected by CNVs in HS and LS lines of quail, most importantly involved in nervous and endocrine systems development, humoral and cell-mediated immune response and different metabolisms. This result supports our hypothesis that CNVs have impact in increasing genotypic diversity and thereby phenotypic traits observed in quail. In the future we will perform a functional validation study such as expression of candidate CNV genes at protein level using different tissues from the two quail lines. The quail will continue to evolve as an important research animal model for understanding well-being and production performances in avian species and other animals.
Supporting information S1