Haplotype-resolved assembly of the mule duck genome using high-fidelity sequencing technology

  • Tiandong Che,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    tiandong_che@163.com (TC); wenguangzhang@imau.edu.cn (WZ)

    Affiliations College of Life Science, Inner Mongolia Agricultural University, Hohhot, China, Annoroad Gene Technology Co., Ltd, Beijing, China

  • Jing Li,

    Roles Resources

    Affiliation Institute of Animal Genetics and Breeding, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, China

  • Xiaobo Li,

    Roles Investigation, Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Zhongsi Wang,

    Roles Investigation, Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Xuemei Zhang,

    Roles Investigation, Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Weifei Yang,

    Roles Formal analysis, Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Tao Liu,

    Roles Formal analysis, Project administration

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Yan Wang,

    Roles Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Kaiqian Wang,

    Roles Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Tian Gao,

    Roles Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Guangqiang Shen,

    Roles Methodology

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Wanling Qiu,

    Roles Supervision

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Zhimin Li,

    Roles Project administration, Writing – review & editing

    Affiliation Annoroad Gene Technology Co., Ltd, Beijing, China

  • Wenguang Zhang

    Roles Conceptualization, Project administration, Supervision, Writing – review & editing

    tiandong_che@163.com (TC); wenguangzhang@imau.edu.cn (WZ)

    Affiliation College of Life Science, Inner Mongolia Agricultural University, Hohhot, China

Abstract

Mule duck is vitally important to the production of global duck meat. Here, we present two high-quality haplotypes of a female mule duck (haplotype 1 (H1): 1.28 Gb, haplotype 2 (H2): 1.40 Gb). The continuity (H1: contig N50 = 14.90 Mb, H2: contig N50 = 15.70 Mb) and completeness (BUSCO: H1 = 96.9%, H2 = 97.3%) are substantially better than those of other duck genomes. We detected structural variations (SVs) in H1 and H2 and observed a positive correlation between autosome length and the number of SVs. The Z chromosome was somewhat deficient in deletions and insertions, whereas the W chromosome showed a slight excess. A total of 1,451 genes showed haplotype-specific expression (HSEs). Among them, 737 were specifically expressed in H1 and 714 were specifically expressed in H2. We found that H1 and H2 HSEs tended to be involved in similar biological processes, such as myometrial relaxation and contraction pathways, muscle structure development and phosphorylation. Our haplotype-resolved genome assembly provides a powerful platform for future functional genomics, molecular breeding, and genome editing in mule duck.

Introduction

The global production of duck meat has experienced a significant surge in recent times. The primary global producer of duck meat is Asia, particularly due to the Pekin duck [1]. China accounts for approximately 84% of the total Asian duck production [2]. In Europe, Pekin duck dominates in most countries, especially in northern and central eastern regions. In the Mediterranean countries such as France and Italy, mule and Muscovy ducks are more prevalent [3], contributing to around 11% of the worldwide duck meat supply [2].

Mule ducks, which are sterile hybrids resulting from the crossbreeding of male Muscovy ducks (Cairina moschata) [4] and female Pekin ducks (Anas platyrhynchos), demonstrate a remarkable tolerance towards adverse environmental conditions, temperature fluctuations, and diseases. They exhibit rapid growth rates, efficient feed conversion, and superior meat quality. Male mule ducks are particularly preferred for fatty liver production [5, 6]. Indeed, mule ducks benefit from heterosis effects and inherit specific qualities from each parent species, including increased feed intake capacity [7]. Recent research shows that mule ducks are characterized by lower water and higher fat and zinc content in leg muscle compared to Muscovy ducks [2].

With the continuous development of long-read sequencing technologies, such as PacBio and Oxford Nanopore, researchers are able to assemble high-quality genomes for subsequent studies. In particular, the introduction of HiFi technology enables researchers to decode complex regions of the genome, such as telomeres and centromeres [8], more accurately. HiFi technology has been shown to perform well in terms of continuity and completeness in genome assembly, and is widely used in areas such as telomere-to-telomere (T2T) genomes [9, 10], haplotype-resolved genomes [11, 12], and gap-free genome assemblies [13–15].

Here, we provide a haplotype-resolved female mule duck genome, which will be a valuable resource for genetic breeding programs with mule ducks, especially in meat production.

Materials and methods

Animals and sample collection

Three healthy 60-day-old female mule ducks were examined in this study. Leg muscle was rapidly dissected from each carcass and immediately frozen in liquid nitrogen. All samples were stored at −80 °C until total RNA extraction. To minimize suffering, animals were humanely killed by intravenous injection of 2% pentobarbital sodium (25 mg/kg). All experimental procedures and sample collection in this study were approved by the Institutional Animal Care and Use Committee (IACUC) of Sichuan Agricultural University, under permit No. DKY-B20141401.

Genome sequencing and assembly

The mule duck genome was sequenced and assembled using DNA from a 60-day-old female individual (one of the three female mule ducks used for total RNA extraction). To generate PacBio HiFi data, high-molecular-weight genomic DNA was extracted from mule duck muscle using FineOut Universal Animal and Plant Genome Extraction reagent (solution type). DNA integrity was assessed using Femto Pulse. A Megaruptor was used to shear 6.5 μg of DNA into fragments, after which AMPure PB beads were used for purification. Two SMRTbell libraries were constructed using the Pacific Biosciences SMRTbell express template prep kit 2.0. The libraries were size-selected on the BluePippin™ system for an insert size of 15 kb, then subjected to primer annealing, and the SMRTbell templates were bound to the polymerase using a DNA/polymerase binding kit. The libraries were sequenced on the Sequel IIe platform for 30 hours.

Haplotype-resolved contigs were assembled from HiFi reads using hifiasm (v0.15.2) [16], and the resulting GFA files were converted to FASTA files using gfatools (see URLs). Because of the high base quality of the HiFi contigs [17], no polishing was performed; however, we used RagTag (v2.0.1) [18], with the Muscovy duck (C. moschata) and Pekin duck (A. platyrhynchos) reference genomes and HiFi reads, to correct potential misassemblies in the haplotype contigs before assigning them to chromosomes via the RagTag scaffold command based on both reference genomes. These reference-guided alignments also allowed us to infer which haplotype came from which parent. Completeness of the haplotype-resolved genome assemblies was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO v4.1.2, aves_odb10) software [19].
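Contig N50 is the continuity statistic reported for each haplotype. The following is a minimal sketch of how it can be computed from a list of contig lengths; the function name and the toy lengths are illustrative assumptions, not part of the published pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In practice the lengths would be read from the assembly FASTA (or its index) for each haplotype.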

RNA-seq and genome annotation

RNA-seq. Total RNA was isolated using a standard Trizol (Invitrogen) protocol. Genomic DNA was removed using DNaseI. Three cDNA libraries for paired-end 150 bp sequencing were prepared using dUTP protocols. Libraries were sequenced on Illumina NovaSeq 6000 platform. More than 10 Gb of high quality data were obtained per library.

Gene structure prediction. Gene structure prediction was performed using a comprehensive approach that integrated three strategies. Firstly, gene structure was predicted based on transcriptome evidence from the RNA-seq data, by aligning cDNA sequences to the genome with PASA (v2.1) [20]. Secondly, homology evidence was obtained by comparing protein-coding sequences of related species to the mule duck genome using Blast (v2.2.28) [21] and Genewise (v2.2.0) [22]. Thirdly, ab initio predictions were generated by AUGUSTUS (v3.3) [23] and SNAP (v2013-11-29) [24]. Finally, these results were integrated into a non-redundant and more comprehensive gene set using GETA (v2.5.4) (see URLs).

Gene function annotation. Various functional databases, including Swissprot (release 2018_12) [25], NT, NR, PFAM (v35.0) [26], eggnog (v3.0) [27], GO [28], and KEGG [29], were employed for the prediction of gene set functions.

Noncoding RNA prediction. The identification of noncoding RNAs involved the utilization of tRNAScan-SE (v1.3.1) [30] for tRNA detection, while other types of ncRNAs were identified through a search against the Rfam (v12.0) database [31].

Characterization of repetitive sequences. Characterization of repetitive sequences was performed using two integrated strategies. Firstly, homology evidence was utilized by employing RepeatMasker (v4.0.7) and Repeatproteinmask (v1.36) [32], along with the RepBase (v28.06) database [33], to predict sequences that exhibited similarity to known repetitive sequences. Secondly, an ab initio prediction approach was employed where repeat families were initially identified de novo and classified using RepeatModeler (v1.0.10). The resulting repeat library generated by RepeatModeler (v1.0.10) was further analyzed using RepeatMasker (v4.0.7) [32] to uncover additional repeats in the genome, while TRF [34] was employed for identifying tandem repeat sequences.

Assembly-driven detection of structural variation

The two haplotype-resolved genomes were aligned to their respective reference genomes (Muscovy duck for H1 and Pekin duck for H2) using minimap2 v2.21 [35]. SVIM [36] was used to detect SVs, including deletions (DEL), insertions (INS), duplications (DUP) and inversions (INV). SVs smaller than 50 bp were filtered out. We used the lm function in R to regress the number of SVs on chromosome length.
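The size filter and the regression step can be sketched as follows. This is an illustrative Python stand-in for the R lm call, using ordinary least squares; the record layout (chrom, svtype, svlen), the function names and the toy values are assumptions made for the example and do not reflect SVIM's actual output format.

```python
import math
from collections import Counter


def filter_svs(svs, min_len=50):
    """Keep SVs of at least min_len bp (deletion lengths may be negative)."""
    return [sv for sv in svs if abs(sv["svlen"]) >= min_len]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.
    Returns (intercept, slope, Pearson r). Assumes x and y both vary."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical SV calls and chromosome lengths (Mb), for illustration only.
toy_svs = [
    {"chrom": "chr1", "svtype": "DEL", "svlen": -120},
    {"chrom": "chr1", "svtype": "INS", "svlen": 75},
    {"chrom": "chr1", "svtype": "INS", "svlen": 30},   # dropped: < 50 bp
    {"chrom": "chr2", "svtype": "DEL", "svlen": -50},
    {"chrom": "chr2", "svtype": "DEL", "svlen": -49},  # dropped: < 50 bp
]
toy_lengths_mb = {"chr1": 200.0, "chr2": 150.0, "chr3": 100.0}

kept = filter_svs(toy_svs)
counts = Counter(sv["chrom"] for sv in kept)
chroms = sorted(toy_lengths_mb)
x = [toy_lengths_mb[c] for c in chroms]
y = [counts.get(c, 0) for c in chroms]
intercept, slope, r = fit_line(x, y)
print(intercept, slope, r)
```

Chromosomes that fall far from the fitted line, as reported for the sex chromosomes, would show up as large residuals from this fit.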

Identification of haplotype differentially expressed genes

The RNA-seq reads from muscle tissue were generated using three biological replicates. Subsequently, the reads were aligned against the coding sequence (CDS) using Bowtie (v1.3.1) [37], and only the best alignment was retained for each read. FPKM values were estimated utilizing the RSEM program (v1.3.3) [38]. To investigate differences in expression between alleles, we employed the DESeq2 package (v1.40.2) [39]. Differentially expressed genes were selected based on a fold change (FC) > 2 with Benjamini-Hochberg adjusted p-values < 0.05 as criteria for significance assessment.
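The significance thresholds above can be illustrated with a short sketch. It shows a plain Benjamini-Hochberg adjustment and the FC and adjusted p-value cutoffs; it does not reproduce the DESeq2 model itself. Treating "FC > 2" as a two-fold change in either direction is an assumption, as are the function names and toy values.

```python
import math


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Step from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_de(fold_change, padj, fc_cutoff=2.0, alpha=0.05):
    """True if the H1/H2 expression ratio exceeds fc_cutoff in either
    direction (an assumption) and the adjusted p-value is below alpha.
    fold_change must be positive."""
    return abs(math.log2(fold_change)) > math.log2(fc_cutoff) and padj < alpha


# Hypothetical raw p-values, for illustration only.
toy_p = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(toy_p))
print(is_de(4.0, 0.01), is_de(1.5, 0.01))
```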

Calculation of Shannon entropy

We calculated the Shannon entropy (H) using the formula described in the original paper [40] as:

$$H_g = -\sum_{t=1}^{N} p_{t|g} \log_2 p_{t|g}$$

where $p_{t|g}$ is the expression of gene g in haplotype t relative to its total expression across the N haplotypes. This value has units of bits and ranges from zero, indicating genes expressed in a single haplotype, to $\log_2 N$, indicating genes expressed uniformly in all haplotypes. The Q value was calculated as:

$$Q_{g|t} = H_g - \log_2 p_{t|g}$$

It can be used to determine whether gene g is specifically expressed in haplotype t.
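The entropy calculation can be sketched in a few lines. The sketch assumes the standard definitions from [40], namely H as the negative sum of p log2(p) over haplotypes and Q as H minus log2(p) for the haplotype of interest, with p taken as each haplotype's share of the gene's total expression. Function names and toy values are illustrative assumptions.

```python
import math


def shannon_entropy(expr):
    """Shannon entropy (bits) of one gene's expression across haplotypes."""
    total = sum(expr)
    if total <= 0:
        raise ValueError("gene has no expression in any haplotype")
    h = 0.0
    for w in expr:
        if w > 0:  # terms with p = 0 contribute nothing
            p = w / total
            h -= p * math.log2(p)
    return h


def q_value(expr, t):
    """Q = H - log2(p_t); small values indicate specificity for haplotype t."""
    p = expr[t] / sum(expr)
    if p == 0:
        return math.inf
    return shannon_entropy(expr) - math.log2(p)


# Hypothetical FPKM values for (H1, H2), for illustration only.
print(shannon_entropy([5.0, 0.0]))   # expressed in a single haplotype
print(shannon_entropy([1.0, 1.0]))   # expressed uniformly, log2(2) bits
print(q_value([1.0, 1.0], 0))
```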

Analysis of haplotype specific expression genes

We defined the haplotype specific expression genes (HSEs) using the following criteria: i) the genes with Benjamini-Hochberg adjusted p-values < 0.01; ii) the Shannon entropy (H) of the genes equal to zero. Functional enrichment analyses for genes were performed by Metascape [41].
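The two criteria can be combined as in the sketch below. It assumes one summary expression value per haplotype for each gene (for example a mean FPKM across the three ducks), which the text does not specify; the field names, function names and toy records are hypothetical.

```python
import math


def shannon_entropy(expr):
    """Shannon entropy (bits) of one gene's expression across haplotypes."""
    total = sum(expr)
    h = 0.0
    for w in expr:
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def select_hse(genes, alpha=0.01):
    """Return {gene id: haplotype} for genes with padj < alpha and H == 0."""
    hse = {}
    for g in genes:
        expr = [g["fpkm_h1"], g["fpkm_h2"]]
        if sum(expr) <= 0:
            continue  # not expressed in either haplotype
        if g["padj"] < alpha and shannon_entropy(expr) == 0:
            # H == 0 means all expression comes from a single haplotype.
            hse[g["id"]] = "H1" if expr[0] > 0 else "H2"
    return hse


# Hypothetical gene records, for illustration only.
toy_genes = [
    {"id": "geneA", "padj": 0.001, "fpkm_h1": 12.0, "fpkm_h2": 0.0},
    {"id": "geneB", "padj": 0.001, "fpkm_h1": 0.0, "fpkm_h2": 8.0},
    {"id": "geneC", "padj": 0.001, "fpkm_h1": 5.0, "fpkm_h2": 5.0},
    {"id": "geneD", "padj": 0.2, "fpkm_h1": 3.0, "fpkm_h2": 0.0},
]
print(select_hse(toy_genes))
```

Requiring an entropy of exactly zero is a strict criterion: a gene with even minimal expression from the second haplotype is excluded.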

Results

Genome sequencing and assembly

We sequenced the genome of a female mule duck, an F1 hybrid with Muscovy duck as the sire (paternal contributor) and domestic duck as the dam (maternal contributor), by using the PacBio Sequel IIe platform, generating 73.10 Gb of HiFi reads (~73-fold coverage, S1 Table in S1 File). We performed de novo haplotype-resolved genome assembly using hifiasm (v0.15.2) [16]. This yielded 729 contigs in haplotype 1 (H1, of Muscovy duck origin, with Z chromosome sequences) and 1,422 contigs in haplotype 2 (H2, of Pekin duck origin, with W chromosome sequences). The genome size of the final assembly of H1 and H2 was 1.28 Gb (N50 of 14.90 Mb) and 1.40 Gb (N50 of 15.70 Mb), respectively (Table 1). The continuity of each mule duck haplotype was substantially higher than that of the parental origin assemblies of the Muscovy duck (C. moschata, KizCaiMos1.0) [42] and Pekin duck (A. platyrhynchos, ZJU1.0) [43], demonstrating the advantage of HiFi data in phased genome assembly (Table 1). We evaluated the completeness of our phased genomes using BUSCO [19]. H1 showed 96.9% coverage of the avian orthologous gene set (aves_odb10), whereas H2 showed 97.3% coverage (Table 1 and S2 Table in S1 File).

The contigs were scaffolded into chromosome-level sequences with RagTag (v2.0.1) [18], using a reference-guided strategy based on the Muscovy duck and Pekin duck reference genomes, resulting in 35 autosomes plus the Z chromosome in H1 and 31 autosomes plus the W chromosome in H2. We aligned the chromosome-level sequences of H1 and H2 to Muscovy duck (C. moschata) and Pekin duck (A. platyrhynchos), respectively, and found overall high similarity between them at the sequence level (Fig 1). We also aligned H1 and H2 to each other, which revealed a highly collinear relationship (S1 Fig in S1 File). In this alignment, chrZ in H1 aligned to chrW in H2, which implies that these two sex chromosomes share a high degree of homology (S3 Table in S1 File). In addition, we observed collinearity between certain other chromosomes, such as chr10, 13, 18, 19, 24 and 25 (S3 Table in S1 File).

Fig 1.

Dot plot showing the syntenic relationships between (A) H1 and Muscovy duck, (B) H2 and Pekin duck. The X-axis represents the Muscovy duck and Pekin duck, and Y-axis represents the H1 and H2. A diagonal straight line indicates synteny among the haplotypes.

https://doi.org/10.1371/journal.pone.0305914.g001

Gene and repeat annotations

We annotated the genome using a comprehensive strategy combining evidence-based and ab initio gene predictions (see “Materials and methods”). Based on the gene structure, 29,928 gene models were predicted in H1 with an average gene length of 18,627.76 bp, whereas 31,661 gene models were predicted in H2 with an average gene length of 18,211.15 bp (S4 Table in S1 File). Combining the evidence from mRNA sequencing (RNA-seq) data, homologous and ab initio gene predictions, we obtained 18,002 and 19,385 protein-coding genes in H1 and H2, respectively (Table 1), which were integrated using GETA (v2.4.14) (see URLs). Functional annotation showed that 97.86% and 97.60% of the protein-coding genes in H1 and H2, respectively, matched known proteins in public databases (S5 Table in S1 File). In addition, we identified small noncoding RNAs (ncRNAs) across the two haplotypes, including microRNAs (miRNAs), ribosomal RNAs (rRNAs), transfer RNAs (tRNAs) and small nuclear RNAs (snRNAs) (S6 Table in S1 File). The exon number, gene length, coding sequence (CDS) length, exon length, and intron length were similar to those of other related species (S2 Fig in S1 File).

Repetitive sequences make up 20.44% and 22.07% of H1 and H2, respectively (Table 1). We identified 685,186 and 761,147 copies of repeat elements in H1 and H2, among which long terminal repeat endogenous retroviruses (LTR/ERV) were abundant, making up 3.74% and 3.64% of the two haplotype genomes, and LTR/Gypsy elements were particularly plentiful, accounting for 0.63% and 1.08% of the genomic content, respectively (S7 Table in S1 File). LINEs accounted for the highest proportion of repeats in H2 (5.13%), which implies a high prevalence of LINEs in the mallard-derived genome.

Characteristics of structural variation

We detected a total of 23,229 and 35,893 structural variations (SVs) in the H1 and H2 genomes, respectively, against their reference genomes, including 8,454 and 17,959 deletions, 14,719 and 17,759 insertions, 27 and 103 duplications, and 29 and 72 inversions. Deletions and insertions were the predominant SV types, both genome-wide and on each chromosome (S3 Fig in S1 File). In H1, insertions outnumbered deletions, whereas in H2 deletions slightly outnumbered insertions.

To explore the relationship between chromosome length and the number of structural variations, we conducted a regression analysis. In autosomes, we observed a positive correlation between the length of chromosomes and the number of SVs (Fig 2). Most of the chromosomes fell within or around the 99% confidence interval. Compared with the autosomes, the Z and W chromosomes exhibited a certain degree of deviation, especially in deletions and insertions. The Z chromosome was somewhat deficient in deletions and insertions, whereas the W chromosome showed a slight excess. This suggests that the sex chromosomes may differ in their sensitivity to SVs, which could be related to their functional conservation.

Fig 2. Regression of chromosome length and the number of structural variations (SVs).

https://doi.org/10.1371/journal.pone.0305914.g002

Haplotype specific expression genes

Given that the female mule duck was an F1 hybrid offspring, our haplotype-resolved genomes enabled the exploration of genes with allelic imbalance in expression. Because muscle is a tissue of major economic importance in poultry, leg muscle was chosen to demonstrate gene phasing and to explore haplotype-specific expression. We applied mRNA sequencing (RNA-seq) to the left leg muscle of three female mule ducks, and quantified the allelic expression levels of H1 and H2. We used the two haplotypes to quantify and obtain FPKM expression values of the protein-coding genes. An average of ~12,184 genes were expressed (FPKM > 0.1) [44] in each haplotype from the three female mule ducks, and 14,919 genes were expressed in all six haplotypes (from three female mule ducks). We used the R package DESeq2 (v1.40.2) [39] to perform the differential expression analysis between the two haplotypes. Genes with a fold change (FC) > 2 and Benjamini-Hochberg adjusted p-values < 0.05 were classified as differentially expressed (DE). Based on this, we found 2,567 differentially expressed genes between H1 and H2 (S8 Table in S1 File).
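As a reminder of what the expression threshold means, FPKM normalizes a read (fragment) count by transcript length in kilobases and by library size in millions of mapped fragments. The sketch below uses the textbook formula with hypothetical numbers; RSEM itself works with expected counts and effective lengths, so this is an approximation for illustration rather than the authors' calculation.

```python
def fpkm(fragment_count, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (length_bp * total_mapped_fragments)


def is_expressed(fpkm_value, cutoff=0.1):
    """Apply the FPKM > 0.1 threshold used in the text."""
    return fpkm_value > cutoff


# Hypothetical gene: 100 fragments, 2 kb CDS, 1 million mapped fragments.
value = fpkm(100, 2000, 1_000_000)
print(value, is_expressed(value))
```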

To identify the haplotype-specific expression genes (HSEs), we calculated the Shannon entropy (H) value as a measure of the specificity of gene expression across haplotypes. We found that the DE genes showed increased haplotype specificity compared with non-DE genes (Fig 3A). We selected the HSEs (see “Materials and methods”), and obtained 1,451 HSEs (S9 Table in S1 File). Among them, 737 were specifically expressed in H1, and 714 were specifically expressed in H2 (S10 and S11 Tables in S1 File). Functional enrichment analysis showed that the HSEs were mainly enriched for phosphorylation (GO:0016310), muscle contraction (GO:0006936) and actin filament-based process (GO:0030029) (Fig 3B), which fits with the biological function of muscle tissue. In addition, we found that H1 and H2 HSEs tended to be involved in similar biological processes. HSEs in H1 were enriched for myometrial relaxation and contraction pathways (WP289) and EGF and EGFR signaling pathway (WP437) (Fig 3C). HSEs in H2 were enriched for muscle contraction (GO:0006936), actin filament-based process (GO:0030029), muscle structure development (GO:0061061) and phosphorylation (GO:0016310) (Fig 3D).

Fig 3. Shannon entropy (H) value between DE and non-DE genes and functional categories of HSEs.

(A) Boxplot of Shannon entropy (H) value between DE and non-DE genes. (B) Functional categories of all HSEs. (C) Functional categories of H1 specific expression genes. (D) Functional categories of H2 specific expression genes.

https://doi.org/10.1371/journal.pone.0305914.g003

Discussion

In this study, we assembled a chromosome-scale haplotype-resolved female mule duck genome based on HiFi sequencing data. Both the sequence continuity (contig N50) and genome quality (base accuracy and completeness) were substantially higher than those of the previously released duck genomes (Muscovy duck and Pekin duck). Our results indicated the advantages of HiFi reads on de novo genome assembly and provided a valuable resource for future genetic breeding programs.

Additionally, we attempted to detect structural variations (SVs) using an assembly-based approach and compared them with chromosome length [45]. Consistent with previous studies, we observed a deviation between the number of SVs in sex chromosomes and chromosome length. In the context of normal natural selection, the number of SVs should be linearly correlated with the length of the chromosome. Our results indicated that the Z chromosome had fewer SVs, while the W chromosome had more SVs. This phenomenon suggested that the Z and W chromosomes may have undergone different intensities of purifying selection.

In addition, this chromosome-scale haplotype-resolved female mule duck genome will enable the characterization of haplotype specific genes and their functions. Based on the differentially expressed genes, we added Shannon entropy (H) value to screen for genes with haplotype-specific expression (HSEs). Through the functional enrichment analysis, we discovered that HSEs were more involved in pathways related to energy metabolism and muscle contraction, which was consistent with the biological characteristics of muscle tissue. Functional analysis of H1 and H2-specific expression genes yielded similar results, further confirming the biological function of muscle.

In summary, our research provides the haplotype-resolved genome of a hybrid species, which will serve as a reference for future genomic studies on other hybrid species. However, resolving some highly repetitive regions, such as telomeres and centromeres, remains a challenge in assembling a diploid mule duck genome. Efforts have been made to address these challenges. For example, the first human telomere-to-telomere (T2T) genome (CHM13) [9, 46] has been constructed. Additionally, a draft human pangenome has been published [47], which implies that a more comprehensive, diverse and accurate genomic age is forthcoming.

References

  1. Huang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, et al. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nature Genetics. 2013;45(7):776–83. pmid:23749191
  2. Kokoszynski D, Wilkanowska A, Arpasova H, Hrncar C. Comparison of some meat quality and liver characteristics in Muscovy and mule ducks. Arch Anim Breed. 2020;63(1):137–44. pmid:32494586
  3. Mazurowski A, Frieske A, Wilkanowska A, Kokoszyński D, Mroczkowski S, Bernacki Z, et al. Polymorphism of prolactin gene and its association with growth and some biometrical traits in ducks. Italian Journal of Animal Science. 2016;15(2):200–6.
  4. Jiang F, Jiang Y, Wang W, Xiao C, Lin R, Xie T, et al. A chromosome-level genome assembly of Cairina moschata and comparative genomic analyses. BMC genomics. 2021;22(1):581. pmid:34330207
  5. Chartrin P, Bernadet M-D, Guy G, Mourot J, Hocquette J-F, Rideau N, et al. Does overfeeding enhance genotype effects on liver ability for lipogenesis and lipid secretion in ducks? Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology. 2006;145(3):390–6. pmid:16963298
  6. Hermier D, Guy G, Guillaumin S, Davail S, André J-M, Hoo-Paris R. Differential channelling of liver lipids in relation to susceptibility to hepatic steatosis in two species of ducks. Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology. 2003;135(4):663–75. pmid:12892758
  7. Massimino W, Andrieux C, Biasutti S, Davail S, Bernadet M-D, Pioche T, et al. Impacts of Embryonic Thermal Programming on the Expression of Genes Involved in Foie gras Production in Mule Ducks. Frontiers in Physiology. 2021;12. pmid:34925068
  8. Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. 2019;37(10):1155–62. pmid:31406327
  9. Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. Science. 2022;376(6588):44–53. pmid:35357919
  10. Naish M, Alonge M, Wlodzimierz P, Tock AJ, Abramson BW, Schmücker A, et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science. 2021;374(6569):eabi7489. pmid:34762468
  11. Logsdon GA, Vollger MR, Hsieh P, Mao Y, Liskovykh MA, Koren S, et al. The structure, function and evolution of a complete human chromosome 8. Nature. 2021;593(7857):101–7. pmid:33828295
  12. Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585(7823):79–84. pmid:32663838
  13. Wang B, Yang X, Jia Y, Xu Y, Jia P, Dang N, et al. High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads. Genomics, Proteomics & Bioinformatics. 2022;20(1):4–13. pmid:34487862
  14. Song JM, Xie WZ, Wang S, Guo YX, Koo DH, Kudrna D, et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol Plant. 2021;14(10):1757–67. pmid:34171480
  15. Belser C, Baurens F-C, Noel B, Martin G, Cruaud C, Istace B, et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Communications Biology. 2021;4(1):1047. pmid:34493830
  16. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18(2):170–5. pmid:33526886
  17. Garg S, Fungtammasan A, Carroll A, Chou M, Schmitt A, Zhou X, et al. Chromosome-scale, haplotype-resolved assembly of human genomes. Nature Biotechnology. 2021;39(3):309–12. pmid:33288905
  18. Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biology. 2019;20(1):224. pmid:31661016
  19. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2. pmid:26059717
  20. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology. 2008;9(1):R7. pmid:18190707
  21. McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research. 2004;32(suppl_2):W20–W5. pmid:15215342
  22. Birney E. GeneWise and Genomewise. Genome Research. 2004;14(5):988. pmid:15123596
  23. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34(suppl_2):W435–W9. pmid:16845043
  24. Korf I. Gene finding in novel genomes. BMC Bioinformatics. 2004;5(1):59. pmid:15144565
  25. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research. 2000;28(1):45–8. pmid:10592178
  26. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: the protein families database. Nucleic Acids Research. 2013;42(D1):D222–D30. pmid:24288371
  27. Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, et al. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Research. 2011;40(D1):D284–D9. pmid:22096231
  28. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25–9. pmid:10802651
  29. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for Integration and Interpretation of Large-Scale Molecular Data Sets. Nucleic Acids Research. 2011;40(Database issue):D109–14. pmid:22080510
  30. Lowe TM, Eddy SR. tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Research. 1997;25(5):955–64. pmid:9023104
  31. Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, et al. Rfam 12.0: updates to the RNA families database. Nucleic Acids Research. 2014;43(D1):D130–D7. pmid:25392425
  32. Tarailo-Graovac M, Chen N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Current Protocols in Bioinformatics. 2009;25. pmid:19274634
  33. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6(1):11. pmid:26045719
  34. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 1999;27(2). pmid:9862982
  35. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 2021;37(23):4572–4. pmid:34623391
  36. Heller D, Vingron M, Birol I. SVIM: structural variant identification using mapped long reads. Bioinformatics. 2019;35(17):2907–15. pmid:30668829
  37. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10. pmid:19261174
  38. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12. pmid:21816040
  39. Love M, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(550). pmid:25516281
  40. Schug J, Schuller W-P, Kappen C, Salbaum JM, Bucan M, Stoeckert CJ. Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biology. 2005;6(4). pmid:15833120
  41. Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications. 2019;10(1):1523. pmid:30944313
  42. Xu M-M, Gu L-H, Lv W-Y, Duan S-C, Li L-W, Du Y, et al. Chromosome-level genome assembly of the Muscovy duck provides insight into fatty liver susceptibility. Genomics. 2022;114(6):110518. pmid:36347326
  43. Li J, Zhang J, Liu J, Zhou Y, Cai C, Xu L, et al. A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals. GigaScience. 2021;10(1). pmid:33406261
  44. Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, et al. The human transcriptome across tissues and individuals. Science. 2015;348(6235):660–5. pmid:25954002
  45. Chen J, Zhong J, He X, Li X, Ni P, Safner T, et al. The de novo assembly of a European wild boar genome revealed unique patterns of chromosomal structural variations and segmental duplications. Animal Genetics. 2022;53(3):281–92. pmid:35238061
  46. Liu Y, Du H, Li P, Shen Y, Peng H, Liu S, et al. Pan-Genome of Wild and Cultivated Soybeans. Cell. 2020;182. pmid:32553274
  47. Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature. 2023;617(7960):312–24. pmid:37165242