The Genome of Polymorphonuclear Neutrophils Maintains Normal Coding Sequences

Genetic studies often use genomic DNA from whole blood cells, the majority of which are polymorphonuclear myeloid cells. These cells undergo a dramatic change in nuclear morphology during cellular differentiation. It remains unclear whether this nuclear morphological change is accompanied by sequence alterations relative to the intact genome. If such alterations exist, they would pose a serious problem for genetic studies that use this type of genomic DNA, because the sequences would not represent the intact genome of the host individual. Using exome sequencing, we compared the coding regions of neutrophils, the major type of polymorphonuclear cell, with those of CD4+ T cells, which have an intact genome, from the same individual. The results show that the exon sequences of the two cell types are essentially the same. The minor differences, consisting of missed exons and base changes between the two cell types, were shown by validation to be caused mainly by experimental errors. Our study concludes that genomic DNA from whole blood cells can be safely used for genetic studies.


Introduction
Genomic DNA from peripheral blood cells is routinely used for genetic studies. For example, it is common practice to use blood DNA to distinguish germline variation from somatic mutation in solid tumors [1][2][3][4][5][6][7]. Blood cells consist of multiple cell types of the myeloid and lymphoid lineages [8]. Myeloid cells differentiate rapidly from myeloid progenitors to myeloblasts and then to the mature terminal cells (neutrophils, eosinophils, basophils and monocytes), ending in cellular destruction by apoptosis, necrosis or NETosis [9][10]. During differentiation, the nuclei of myeloid cells transform from a mononuclear shape to a banded and then segmented polymorphonuclear shape of 2-5 lobes. Little is known about whether this nuclear morphological transformation is accompanied by any change in genome sequence. If such a change does exist, the sequences derived from blood cells containing myeloid cells will reflect the genomes of a mixture of mononuclear and polymorphonuclear cells, and interpreting such heterogeneous sequences will be problematic. While studies on selected genes have been performed [11], no systematic attempt has been reported to determine, at the genome level, the nature of the genetic sequences in polymorphonuclear myeloid cells. In this study, we used exome sequencing [12] to analyze the entire coding regions of neutrophils, the most abundant myeloid cells, which constitute 40-60% of nucleated cells in peripheral blood [9], and compared the data with those of mononuclear CD4+ T cells, which represent the intact genome of the same individual. Our study shows that the coding regions in polymorphonuclear neutrophils are essentially the same as those of the intact genome.

Results and Discussion
Using exome sequencing, we compared the coding regions of the genomes of polymorphonuclear neutrophils and mononuclear CD4+ T cells from the same healthy individual. Of the 201,046 targeted human genome exons, our study detected 197,988 (98.5%) in neutrophils and 197,565 (98.3%) in CD4+ T cells, of which 196,749 exons were detected in both cell types (Table 1). Conversely, 3,058 (1.5%) and 3,481 (1.7%) exons were missed in neutrophils and CD4+ T cells, respectively, of which 2,242 were missed in both cell types and the rest were missed in a single cell type (Table 1). Furthermore, 150,719 SNVs were detected in neutrophils and 150,203 SNVs in CD4+ T cells, of which 141,034 are common to the two cell types, while 9,685 (6.4%) and 9,169 (6.1%) are present only in neutrophils and only in CD4+ T cells, respectively (Table 2).
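The counts quoted above can be cross-checked with simple set arithmetic. The following is a minimal sketch, not part of the original analysis; the variable and function names are illustrative assumptions, and only the numbers are taken from the text. It derives the missed-exon counts, the percentages, and the number of exons missed in both cell types by inclusion-exclusion.

```python
# Counts as reported in the text (Tables 1 and 2); names are illustrative.
TARGETED_EXONS = 201_046
EXONS_DETECTED = {"neutrophil": 197_988, "cd4_t": 197_565}
EXONS_DETECTED_BOTH = 196_749
EXONS_MISSED_BOTH = 2_242

SNVS_DETECTED = {"neutrophil": 150_719, "cd4_t": 150_203}
SNVS_SHARED = 141_034


def summarize() -> dict:
    """Derive the secondary figures from the primary counts."""
    out = {}
    for cell, detected in EXONS_DETECTED.items():
        missed = TARGETED_EXONS - detected
        out[f"{cell}_missed"] = missed
        out[f"{cell}_detected_pct"] = round(100 * detected / TARGETED_EXONS, 1)
        # Exons missed in this cell type only (not in the other one).
        out[f"{cell}_missed_only"] = missed - EXONS_MISSED_BOTH

    # Inclusion-exclusion: exons detected in at least one cell type.
    detected_either = sum(EXONS_DETECTED.values()) - EXONS_DETECTED_BOTH
    out["missed_both_derived"] = TARGETED_EXONS - detected_either

    for cell, detected in SNVS_DETECTED.items():
        only = detected - SNVS_SHARED
        out[f"{cell}_snv_only"] = only
        out[f"{cell}_snv_only_pct"] = round(100 * only / detected, 1)
    return out


summary = summarize()
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these inputs, the derived number of exons missed in both cell types agrees with the 2,242 reported, which suggests the figures in Table 1 are internally consistent.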
We used PCR to test whether the missed exons reflect true exon differences or were caused by experimental artifacts. Based on statistical analysis, we randomly picked 100 missed exons for validation, which provides an 88% probability of detecting a missed exon. Three types of results were generated: 1) 89 reactions detected the targeted exons with the same size in both T cells and neutrophils, implying that those missed exons are present in both T cells and neutrophils (Figure 1); 2) 10 reactions failed to detect the missed exons in both T cells and neutrophils, implying that those exons may not be present in this donor's genome or may not be included among the exons targeted by the exome kit; and 3) 1 reaction (#73 in Figure 1) detected the missed exon with different sizes in T cells and neutrophils. This exon belongs to AIRE, a gene involved in regulating auto-antigen expression and the negative selection of auto-reactive T cells. The results indicate that most of the exons missed in the exome data were caused by experimental failure, likely during the exome DNA capture process, an event often seen in exome sequencing studies [13]. We then used Sanger sequencing to determine whether the observed single-base differences between T cells and neutrophils reflect true variants between the two cell types, or whether the differences were generated by sequencing errors or miscalling. Based on statistical analysis, we selected 40 candidates for validation, which provides a 91.6% probability of confirming a variant. Of the 39 successful reactions, 18 were determined to be sequencing errors, 8 were confirmed as true homozygous variants in the individual's genome, and 13 were validated as heterozygous variants that had been misinterpreted by the mapping program (Table 3). Therefore, the variants mapped differently between T cells and neutrophils are mostly caused by sequencing errors or by miscalling in the mapping process. TCR loci in T cells can be highly polymorphic due to VDJ recombination.
We compared the sequences from neutrophils and T cells mapped to the TCR-related loci (TCR-alpha and TCR-delta, chr14:22,205,021-23,021,097; TCR-beta, chr7:142,000,946-142,945,186; TCR-gamma, chr7:38,288,844-38,403,119; and PTCRA, chr6:42,883,727-42,893,575), but we did not find any coding differences at these loci between the two cell types. We also compared the exome data between neutrophils and CD19+ B cells of the same individual, and again did not observe any differences (data not shown). The myeloid cell lineage undergoes rapid differentiation and dramatic nuclear morphological change. While it remains unknown whether sequence changes occur in the non-coding regions during myeloid differentiation, and certain very rare mutations can exist in the coding regions [14], our study shows that the coding genes in polymorphonuclear neutrophils remain essentially the same as those of the intact genome. Our study suggests that the chromosomes in myeloid cells retain a linear chromatin structure regardless of the morphological changes during myeloid differentiation. Our study concludes that genomic DNA from myeloid lineage cells can be safely used in genetic studies.
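The TCR-locus comparison described above amounts to asking whether any variant called in only one cell type falls inside one of the listed intervals. The following is a minimal sketch of that check. The coordinates are those given in the text (hg19); the `Region` type, the function names, and the example variants are hypothetical and are not taken from the study's data or pipeline.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


# Intervals as listed in the text (hg19 coordinates, treated as inclusive).
TCR_REGIONS = [
    Region("TCR-alpha/TCR-delta", "chr14", 22_205_021, 23_021_097),
    Region("TCR-beta", "chr7", 142_000_946, 142_945_186),
    Region("TCR-gamma", "chr7", 38_288_844, 38_403_119),
    Region("PTCRA", "chr6", 42_883_727, 42_893_575),
]


def regions_hit(chrom: str, pos: int, regions=TCR_REGIONS) -> list:
    """Return the names of all regions that contain the given position."""
    return [r.name for r in regions
            if r.chrom == chrom and r.start <= pos <= r.end]


def discordant_in_regions(variants_a: set, variants_b: set) -> list:
    """Variants present in exactly one call set that lie in a TCR region.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    discordant = variants_a ^ variants_b  # symmetric difference
    return sorted(v for v in discordant if regions_hit(v[0], v[1]))


# Hypothetical call sets, for illustration only.
neutrophil_calls = {("chr1", 1_000, "A", "G"), ("chr7", 142_500_000, "C", "T")}
t_cell_calls = {("chr1", 1_000, "A", "G"), ("chr7", 142_500_000, "C", "T"),
                ("chr2", 5_000, "G", "A")}
print(discordant_in_regions(neutrophil_calls, t_cell_calls))
```

In this made-up example the only discordant variant lies outside the TCR intervals, so the result is an empty list, which corresponds to the "no coding differences at these loci" outcome reported in the text.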

Methods
Ethics Statement: The cells used for the study were obtained from AllCells LLC (http://www.allcells.com/, Emeryville, California), which has its own full IRB system (Biomed IRB) for providing human blood cells from donors for research. The donor signed the written consent form, which is archived with their medical records. According to US Federal Regulations, 45 CFR Part 46.101(b)(4), Protection of Human Subjects, the use of this type of human cells for research is exempt from the requirement for IRB review. Peripheral leukapheresis blood was collected from a healthy Caucasian male donor, with a red blood cell count of 4.76 × 10³/mm³ and a leukocyte differential of lymphocytes 1.7 × 10³/mm³ (23.0%), monocytes 0.3 × 10³/mm³ (4.7%), and granulocytes 5.8 × 10³/mm³ (72.3%). The collected blood sample was used immediately for cell purification: red blood cells were depleted by sedimentation using the HetaSep solution (Stem Cell Technologies); neutrophils were isolated using the EasySep Human Neutrophil Enrichment Kit (Stem Cell Technologies); mononuclear cells were isolated using Ficoll solution (GE Healthcare); and CD4+ helper T cells were isolated from the mononuclear cells using the StemSep Human Naïve CD4+ T Cell Enrichment Kit (Stem Cell Technologies). The purity of the isolated cells, determined by FACS analysis, was 90% for neutrophils and 97% for CD4+ T cells. DNA was extracted from the purified cells using the FlexiGene DNA kit (Qiagen).
Exome DNA was captured from the DNA samples of neutrophils and CD4+ T cells using the Illumina TruSeq exome enrichment kit following the manufacturer's protocols (http://www.illumina.com/products/truseq_exome_enrichment_kit.ilmn). Paired-end exome sequencing (2 × 100) was performed at 200x exome coverage per sample using an Illumina HiSeq 2000 sequencer. Sequences from each cell type were compared to the 201,046 exons covered by the Illumina TruSeq exome enrichment kit (Illumina TruSeq Exome Targeted Region database 1.3.0). Exome sequences were mapped to the human genome reference sequence (hg19) using BWA-SW [15] and SAMtools [16] with default parameters. Variations were called using VarScan 2 [17] under the conditions of minimum coverage >10, minimum variation frequency >0.2, minimum average quality score >30, and p-value <0.05. The called variations were searched in dbSNP135 for SNP identification. PCR and Sanger sequencing were used to validate the missed exons and single-base variants identified by exome mapping. We performed a statistical analysis to determine the proper number of candidates for the validations. Assuming the missed exons and variants occur at random with probability $p$, the number of missed exons and variants among the sequenced candidates will follow a binomial distribution $B(n, p)$. The probability of observing at least $k$ missed exons or variants is then $P(X \ge k) = 1 - \sum_{i=0}^{k-1} \binom{n}{i} p^{i} (1-p)^{n-i}$, where $X$ represents the binomial random variable, $k$ represents the minimal number of missed exons or variants, $B$ represents the binomial distribution, and $n$ represents the total sample size. The rate of missed exons is 0.021, so testing 100 candidates provides a 0.880 chance of detecting a missed exon; the rate of single-base variants is 0.06, so testing 40 candidates provides a 0.916 chance of detecting a variant [18]. PCR primers were designed by Primer3 (http://frodo.wi.mit.edu/primer3/).
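The sample-size reasoning above can be reproduced numerically. The following is a minimal sketch, assuming the quoted probabilities correspond to observing at least one event (a threshold of 1, which the text does not state explicitly but which reproduces the reported 0.880 and 0.916). The function name `prob_at_least` is an illustrative choice, not part of the original analysis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


# Rates and sample sizes as given in the text; the threshold of 1 is an assumption.
missed_exon_prob = prob_at_least(1, 100, 0.021)
variant_prob = prob_at_least(1, 40, 0.06)

print(f"missed exons, n=100, p=0.021: {missed_exon_prob:.3f}")
print(f"variants,     n=40,  p=0.06:  {variant_prob:.3f}")
```

With a threshold of 1 the sum reduces to a single term, so the probability is simply one minus the chance of seeing no event in any of the tested candidates.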
PCR was performed with DNA (20 ng/reaction), sense and antisense primers (10 pmol), and GoTaq® DNA polymerase (1.25 units, Promega) under the following conditions: denaturation at 95°C for 7 minutes; 37 cycles of 95°C for 30 seconds, 57°C for 30 seconds, and 72°C for 45 seconds; and a final extension at 72°C for 7 minutes. PCR products were checked on 2% agarose gels. Products to be sequenced were each purified using an Illustra GFX 96 PCR Purification kit (GE Healthcare) and sequenced using BigDye Terminator v3.1 in an ABI3730 DNA sequencer (Applied Biosystems).
The exome data from neutrophils and CD4+ T cells have been deposited in NCBI, with accession numbers SRR933550 for neutrophils and SRR933549 for CD4+ T cells.