Heteroplasmy in the complete chicken mitochondrial genome

Chicken mitochondrial DNA is a circular molecule comprising ~16.8 kb. In this study, we used next-generation sequencing to investigate mitochondrial heteroplasmy in the whole chicken mitochondrial genome. Based on heteroplasmic detection thresholds at the 0.5% level, 178 cases of heteroplasmy were identified in the chicken mitochondrial genome, where 83% were due to nucleotide transitions. D-loop regionwas hot spot region for mtDNA heteroplasmy in the chicken since 130 cases of heteroplasmy were located in these regions. Heteroplasmy varied among intraindividual tissues with allele-specific, position-specific, and tissue-specific features. Skeletal muscle had the highest abundance of heteroplasmy. Cases of heteroplasmy at mt.G8682A and mt.G16121A were validated by PCR-restriction fragment length polymorphism analysis, which showed that both had low ratios of heteroplasmy occurrence in five natural breeds. Polymorphic sites were easy to distinguish. Based on NGS data for crureus tissues, mitochondrial mutation/heteroplasmy exhibited clear maternal inheritance features at the whole mitochondrial genomic level. Further investigations of the heterogeneity of the mt.A5694T and mt.T5718G transitions between generations using pyrosequencing based on pedigree information indicated that the degree of heteroplasmy and the occurrence ratio of heteroplasmy decreased greatly from the F0 to F1 generations in the mt.A5694T and mt.T5718G site. Thus, the intergenerational transmission of heteroplasmy in chicken mtDNA exhibited a rapid shift toward homoplasmy within a single generation. Our findings indicate that heteroplasmy is a widespread phenomenon in chicken mitochondrial genome, in which most sites exhibit low heteroplasmy and the allele frequency at heteroplasmic sites changes significantly during transmission events. It suggests that heteroplasmy may be under negative selection to some degree in the chicken.


Introduction
The chicken mitochondrial genome is a circular DNA molecule comprising about 16.8 kb, which encodes 13 proteins, two rRNAs, and 22 tRNAs in the same manner as other types of vertebrate mitochondrial DNA (mtDNA) [1]. There are hundreds to thousands of mtDNA copies per cell and mtDNA has a very high mutation rate. Heteroplasmy refers to the presence of more than one mtDNA variant within a cell, tissue, or individual, and it is not as rare as previously considered. Many human mutations exist in a heteroplasmic state (https://www. PLOS  Rhode Island Red♂×Silky♀ (RS), and Silky♂×Gushi Chicken♀ (SG), and pedigree hatching was conducted (the populations were designated as heteroplasmic populations). Samples of anticoagulant-treated blood were taken from the parent population (F0 generation) after collecting the fertilized eggs from related individuals. The young chicks (F1 generation) were tagged and raised in cages under conventional conditions, where food and water were provided ad libitum. At 1, 30, and 60 days of age, F1 generation chicks were randomly selected from different groups and sacrificed. Anticoagulant-treated blood samples and multiple tissues were collected for analysis. In addition, at least 15 tissues were collected from one RR chicken and mixed to prepare homogenates at 1 and 60 days of age. The collected tissue samples were snap frozen in nitrogen and stored at below -80˚C until NGS sequencing. The blood samples obtained from F0 and F1 individuals were used for detecting heteroplasmic transmission between the parents and offspring. Tissue/blood DNA samples were extracted according to the phenol-chloroform extraction method. All procedures were approved by the Animal Care and Use Committee of Henan Agricultural University (Zhengzhou, China).

Long-range PCR (LG-PCR)
LG-PCR was conducted as described by Zhang et al. [18]. Briefly, a set of back-to-back primers (designated as PM primers; Table 1) were designed based on the NC_001323.1 sequence for amplifying the whole mitochondrial genome. The product comprised approximately 16.8 kb. We performed PCR amplification with about 15-50 ng tissue/blood DNA as the template in a 50-μL PCR system, using Lamp TM DNA Polymerase with Mg 2+ plus buffer (Vazyme Biotech Co. Ltd) under the following conditions: initial incubation at 94˚C for 2 min, followed by 30 PCR cycles with denaturation at 94˚C for 30s, and annealing and extension at 68˚C for 10 min, before one final extension cycle at 72˚C for 7 min and holding at 4˚C. Next, 3 μL of the PCR products were subjected to electrophoresis on 1.5% agarose gels. The LG-PCR products were purified by DNA gel extraction kit (Generay, Shanghai, China) and then conducted Illumina HiSeq Sequencing.

DNA template library preparation for Illumina indexed sequencing
Amplicon sequencing by Illumina HiSeq. Paired-end index libraries were constructed according to the manufacturer's instructions (NEBNext1 Ultra™ DNA Library Prep Kit for Illumina1) with minor modifications. Briefly, the LG-PCR products were randomly fragmented into sizes <400 bp by sonication (Diagenode Bioruptor UCD-200). The fragments were treated with End Prep Enzyme Mix. Size selection was then performed for the adaptorligated DNA using AxyPrep Mag PCR Clean-up (Axygen), and fragments of~400 bp (with approximate insert sizes of 250 bp) were recovered. Each sample was amplified by PCR for eight cycles using P5 and P7 primers. The PCR products were cleaned using AxyPrep Mag Quality control for reads. Dirty reads were removed using NGSQCToolkit (v2.3) software and the following quality control processes were conducted: (1) removal of primers and adaptors; (2) removal of 3' end bases with quality values below 30 and ambiguous bases; (3) retaining over 75% of the reads with quality values above 30; and (4) removing reads with N over 10%. The clean reads were used for the subsequent analyses.
In total, 30 of 48 samples lacked sufficient clean reads and they were omitted from subsequent analyses. The average sequencing coverage for the remaining individuals was *7,800×(range:26.5-6,5000×). The data was deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) under accession numbers (SRR10225379-SRR10225396). We focused on cases of heteroplasmy involving single base substitutions. The deep coverage allowed the detection of very low mutation heteroplasmy at any of the 16,775 nucleotide positions. It has been shown that the analytic sensitivity of heteroplasmy detection is correlated with the coverage depth. We routinely obtained a coverage depth of about 10,000-20,000 fold for the mitochondrial genome.
PCR-RFLP. The two heteroplasmic sites comprising mt.G8682A (D118H) and mt. G16121A (referenced to NC_001323.1) exhibited different heteroplasmic levels according to NGS sequencing. Both the changes of mt. 8682G!A and mt. 16121A!G could lead to the loss of the Hinf1 enzyme site. PCR-RFLP was used to identify the heteroplasmy of mt.G8682A and mt.G16121A in the NGS samples and the blood DNA samples of five breeds (LH, PR, SK, BY, and TB). Briefly, the PCR products amplified with primer sets P8682 and P16121 (Table 1) were digested with the Hinf1 enzyme according to the manufacturer's instructions (Fermentas, MBI) and the cleavage products were separated by 3% agarose electrophoresis. The heteroplasmic ratio was expressed as the certain allelic ratio based on the grey value detected using an Automatic Gel Imaging and Analysis System (ChampGel 5000, Beijing Sage Creation Science Co., Ltd, Beijing, China). For example, the ratio of allele mt.8682A for mt.G8682A was the grey value of: base A/grey value of base (A + G) × 100%.

Pyrosequencing
Pyrosequencing was conducted with blood DNA samples to detect the heteroplasmy of mt. A5694T and mt.T5718G in the F0 and F1 generations of the constructed heteroplasmic population. Briefly, fragments containing mt.A5694T and mt.T5718G mutations (referenced to NC_001323.1 in the ND2 gene) were amplified with PND2 primers ( Table 1). The PCR products were purified and subjected to clone sequencing. mt.A5694T and mt.T5718G were in strong linkage disequilibrium. The clones with the 5694A-5718T and 5694T-5718G haplotypes were selected as the positive controls for pyrosequencing. A pyrosequencing primer (5 0 -CC ATTCAGCCTCCGA-3 0 ) was used as the sequencing primer and all of the steps were performed according to the manufacturer's protocol. The relative ratios of the two alleles in the mt. A5694T and mt.T5718G sites were scored. All samples were analyzed in triplicate.

Heteroplasmy information for 18 NGS samples
NGS sequencing with the LG-PCR products amplified using the set of back-to-back primers (S1 Fig) allowed us to examine each nucleotide in the entire 16.8 kb mitochondrial genome, thereby providing uniform coverage and sufficient depth for quantifying heteroplasmy. Eighteen samples yielded sufficient data and 30 samples were discarded because of their low quality data. Based on an average coverage of~7,800× (S2 Fig), we used stringent criteria (Materials and Methods) to identify 178 cases of heteroplasmy where MAF �0.5% (Fig 1). The cases of heteroplasmy were distributed across 168 positions in the chicken mtDNA genome, with MAF varying from 1 to 40.3% (Table 2, S1 Table).
Among the 178 heteroplasmic mutations, most (149) were nucleotide transitions and only 29 mutations were transversions ( Fig 2C, Table 3). All transversions were not polymorphic (with the same predominant alleles) and they were detected at low frequencies among the 18 NGS samples, where the predominant reference alleles had AAF values that varied from undetectable to 4.95% (S1 Table).

Polymorphisms in 18 samples
We detected 56 polymorphic sites (with different predominant alleles) in 18 NGS samples (Table 4, Fig 2B, and S2 Table). Fifteen polymorphic sites were in D-loop regions in the chicken mitochondrial genome. In the coding regions, six polymorphic sites were non-synonymous mutations and 26 polymorphic sites were synonymous mutations ( Table 4, Fig 2B). Compared with the distribution of heteroplasmy, less polymorphisms were detected in the same NGS samples and less polymorphisms were located in D-loop regions (p < 0.001, Chisquare test). We found that 54 polymorphic sites were transitions and two polymorphic sites were transversions (Table 4, Fig 2C). The ratio of transitions relative to transversions among the polymorphic sites (27:1) was higher (p < 0.012, Fisher's exact test) than that in heteroplasmic sites (5.13:1). The maximum heteroplasmic degree (for MAF) for these polymorphic sites varied among 0.10-40.3% (S2 Table). In total, 43 of the 56 polymorphic sites were heteroplasmic (MAF � 0.5%) and all were transition mutations ( Table 4, S2 Table).

Heteroplasmic comparison in intra-individual and among individuals
First, the heteroplasmic fluctuation among tissues were investigated based on the NGS data from 10 tissues of the same individual (RS1, offspring of Rhode Island Red♂×Silky♀, aged 60 days). The 141 sites were still heteroplasmic (MAF � 0.5%) in the same individual ( Table 2, S1  Table). The heteroplasmic sites varied among the intra-individual tissues. The heteroplasmic sites had allele-specific, position-specific, and tissue-specific features. All of the cases of heteroplasmy had the same predominant allele among the intra-individual tissues (allele-specific, S1 Table). Four sites (mt.A683G, mt.C737T, mt.G738A, and mt.G8682A) exhibited heteroplasmy in all of the detected tissues (Fig 1, S1 Table). In addition, mt.C737T (ranged from 9.82% to 40.0%) and mt.G8682A (ranged from 27.5% to 33.0%) exhibited relative high heteroplasmy across all of the detected tissues (position-specific, S1 Table). In total, 80.8% sites (114 sites) were heteroplasmic in only one tissue (tissue-specific, Fig 1 and S1 Table), 12 sites were heteroplasmic in two tissues, 10 sites were heteroplasmic in three to five tissues, and one site (mt. It was ordered as legend from the innermost to outermost ring. Here the CR samples from different individuals were named as CR1-CR7, the corresponding individuals and populations were presented in S1 A16438G) was heteroplasmic in seven tissues (S1 Table). Each tissue had 10-53 heteroplasmic sites ( Table 2). Cases of heteroplasmy were most abundant in skeletal muscles, including crureus (48 sites) and pectoral (53 sites) muscles ( Table 2, S1 Table). By contrast, cerebrum, testis, visceral fat, and lung tissues had relatively less heteroplasmic sites. In addition, crureus and cerebrum tissues had less undetectable sites (four sites), and pectoral muscle had the most undetectable sites (66 sites) among intra-individual tissues ( Table 2, S1 Table).
Then, the crureus heteroplasmic features were further analyzed based on the NGS data from seven chickens. Among the crureus tissues of seven individuals, 93 sites (MAF�0.5%) were heteroplasmic ( Table 2, S1 Table). The heteroplasmy in crureus tissues varied among individuals, and 56 sites were polymorphic where the maximum MAF varied among 0.1-40.3% (S1 Table). The mitochondrial mutations/heteroplasmy appeared to exhibit clear maternal inheritance features based on the whole chicken mitochondrial genome. The offspring from Rhode Island Red mothers (RR1, RR2, RR3, SR1, and SR2) had the same predominant alleles at all of the heteroplasmic sites. The polymorphic sites were easy to distinguish. We used the polymorphic sites to further infer the inherited features of mutations/heteroplasmy. The SR offspring (SR1 and SR2) had the same predominant allele as the RR offspring (RR1 -RR3) at all 56 polymorphic sites (the mothers of both SR and RR offspring were Rhode Island Red chickens), but their predominant allele were not the same as the RS offspring (the mother of RS1 is Silky) at 42 of 56 polymorphic sites (S1 Table). In addition, the fathers of both the SR offspring (SR1 and SR2) and SG offspring (SG1) were SK, whereas their mothers were Rhode Island Red and Gushi chickens, respectively, and their predominant alleles differed at 44 of 56 polymorphic sites (Table 2).

Heterogeneity transmission between generations
Both mt.A5694T and mt.T5718G in mtND2 gene were reported as heteroplasmic in a Gushi chicken resource population [14]. According to NGS data, they were identified as potentially  heteroplasmic sites (0.1%�MAF �0.5%), and the maximum heteroplasmy values were 0.26% for mt.A5694T and 0.33% for mt.T5718G in the 18 NGS samples (S5 Table). We investigated the heterogeneity of mt.A5694T and mt.T5718G in the blood DNA of constructed heteroplasmic population (F0 and F1 generations) by pyrosequencing ( Table 6, S6 Table). Both mt. A5694T and mt.T5718G were polymorphic in the constructed heteroplasmic population. The mt.5694A allele (for mt.A5694T) and the mt.5718T allele (for mt.T5718G) were the predominant alleles among individuals and within most individuals. It is interesting that the degree and the occurrence rate of heteroplasmy were significantly higher in F0 chickens (45 weeks old) than F1 population for both the mt.A5694T and mt.T5718G sites (Fisher's exact test, p < 0.001). In the F0 population, 15 of 26 individuals were heteroplasmic in Rhode Island Red chickens, and nine to 11 of 24 individuals were heteroplasmic in SK chicken. However, in the F1 population, for the mt.T5718G site, no cases of heteroplasmy were detected in both RR and RS, and only one to two cases of heteroplasmy were detected in the SS, SR, and SG populations. For the mt.A5694T site, only one to two cases of heteroplasmy were detected in the RR, RS, SS, SR, and SG populations. These results suggest that the occurrence of heteroplasmy decrease greatly over the generations (Table 6, S6 Table). We further investigated the heteroplasmic transmission of both the mt.T5718G and mt. A5694T sites according to the pedigree information (S7 Table). We found that the occurrence rate and degree of heteroplasmy for both the mt.T5718G and mt.A5694T sites decreased dramatically from the F0 to F1 generations (Fisher's exact test, p < 0.0001). For the mt.T5718G site (S7 Table), 17 of 26 mothers were heteroplasmic (mt.5718T allele ratio varied among 3.75-97.1%), whereas 52/53 of the offspring from the heteroplasmic mothers were homoplasmic, with 5718T as the predominant allele. For the mt.A5694T site (S7 Table), 17 of 26 mothers were heteroplasmic (the mt.5694A allele ratio varied among 4.5-98.7%), whereas 48/53 of their offspring were homoplasmic, with mt.A5694T as the predominant allele. In addition, one of the offspring (079-2) was homoplasmic with a 5718T and 5694A allele ratio of 100%, whereas its parents had low mt.5718T and mt.5694A allele ratios of 3.55-4.5%.

Discussion
Mitochondrial DNA has been detected in the nuclear genomes of eukaryotes as pseudogenes, or nuclear mitochondrial DNA segments (Numts). Pereira et al. detected at least 13 Numts in the chicken nuclear genome, where the similarity between the Numts and mitochondrial sequences varied from 58.6% to 88.8% [21]. Numts are potential source of contamination in mtDNA research. Thus, it is difficult to filter the nuclear gene sequences that are nearly identical to mtDNA during data analysis [18,22]. Instead of short-range PCR, single back-to-back LG-PCR can be used for NGS sequencing to reduce the interference from nuclear copies of the mitochondrial genome [22][23][24].
Our results showed that NGS sequencing was an effective and highly sensitive method for detecting heteroplasmy in the whole chicken mitochondrial genome. We found that heteroplasmy was widespread in the chicken mitochondrial genome where most of the sites exhibited low heteroplasmy. The relatively more common occurrence of heteroplasmy (178) than polymorphisms (56) in the same NGS samples suggests that heteroplasmy can drift to high frequencies within an individual, and eventually be fixed as polymorphisms among individuals. The heteroplasmic positions were not distributed randomly throughout the chicken mtDNA genome (Fig 1). D-loop regions were hotspot regions for the occurrence of heteroplasmy, as also shown in humans [7,8,25].
In chickens, heteroplasmy was biased toward transition mutations, as also found in humans, with a higher transition ratio [25][26][27]. The relatively high ratio of transversions among the low-frequency cases of heteroplasmy may indicate negative selection against heteroplasmy, which suggests that some transversions may be detrimental to mtDNA function, and thus they can only be tolerated at low frequencies and cannot reach "fixation" within an individual.
The occurrence of heteroplasmy varied among tissues within individuals, with allele-specific, position-specific, and tissue-specific features. Heteroplasmy was relatively more common in skeletal muscle, as also found in humans [7,25,27]. We found that 81% of the cases of heteroplasmy were present in only one tissue (tissue specific), thereby suggesting that most are probably due to somatic mutations, whereas only a few are likely to be inherited [25].
The mt.G8682A (D118H) is located in the Cox2 region and the mutation was predicted to affect the protein's function (http://sift.bii.a-star.edu.sg/) with a score of 0.02 [28]. Only one individual (RS1) was found to have high heteroplasmy (30.5-33%) in 10 tissues by NGS sequencing and the homoplasmic AA genotype was found in none of the samples tested. In addition, only one heteroplasmic individual was detected in five breeds, which suggests that the mt.G8682A change could be detrimental to the function of mitochondria and that it cannot be fixed. Both mt.A5694T (T152S) and mt.T5718G sites (S160A) are transversion mutations in the mt-ND2 region. The mt.T5718G mutation was predicted to affect the secondary structure of RNA (http://www.genebee.msu.su/services/rna2_reduced.html). The structure in the mt.5694T-mt.5718G haplotype had a higher free energy than that with mt.5694A-mt.5718T (S5 Fig). The reduction in the ratios of the mt.5694T and mt.5718G alleles from the F0 generation to the F1 generation in the constructed heteroplasmic population also indicated negative selection against heteroplasmy.
At present, the inheritance of mitochondrial heteroplasmy remains unclear. The central dogma of maternal inheritance for mtDNA remains valid, but Luo et al. reported that mtDNA segregation in some families indicated biparental mtDNA transmission with an autosomal dominant-like inheritance mode [29]. The NGS data obtained in the present study demonstrate that mitochondrial polymorphisms/heteroplasmy in the chicken exhibit maternal inheritance at the entire mitochondrial genomic level. However, our analysis of the heteroplasmic transmission of the mt.A5694T and mt.T5718G sites based on pedigree information obtained from the constructed heteroplasmic population by pyrosequencing (Table 6, S7 Table) demonstrated that the intergenerational transmission of mtDNA in heteroplasmic chickens exhibited a rapid shift toward homoplasmy within a single generation. Previously, it was reported that the percentage of heteroplasmic individuals with respect to the mt.A5694T (same as mt. A5703T for AP003317) and mt.T5718G sites (same as mt.T5727G for AP003317) decreased by approximately 50% from the F0 generation to the F1 generation in a Gushi resource population [14]. In addition, Naue et al. reported a general age dependence for muscle and brain, with a linear correlation in terms of the accumulation of heteroplasmy in muscle [27], which could explain the high heteroplasmy in F0 individuals.
Mitochondria undergo a bottleneck during oogenesis, so it is expected that the frequency of alleles at heteroplasmic sites will differ even among related individuals [30]. However, in agreement with our results, Wai et al. found that intergenerational mtDNA transmission in heteroplasmic mice exhibited a rapid shift toward homoplasmy within a single generation [31,32]. Guo et al. showed that very low heteroplasmy variants (down to almost 0.1%) in humans are inherited maternally and that this inheritance decreased to about 0.5% [13]. Rebolledo-Jaramillo et al. observed dramatic shifts in the frequency of heteroplasmy between generations and estimated the effective size of the germline mtDNA bottleneck at only *30-35 [33].
In conclusion, NGS data showed that mtDNA heteroplasmy was widespread in the chicken mitochondrial genome, where most cases of heteroplasmy had a low ratio. D-loop regions were identified as hotspots for heteroplasmy. Heteroplasmy was biased toward transition mutations, but the ratio of transversions relative to transitions in heteroplasmic sites was higher than that in polymorphic sites. Intergenerational mtDNA transmission in heteroplasmic chickens exhibited a rapid shift toward homoplasmy within a single generation according to analyses of heteroplasmy at mt.A5694T and mt.T5718G. Our findings suggest that heteroplasmy may be under negative selection to some degree in the chicken.