Whole genome sequencing revealed esophageal squamous cell carcinoma related biomarkers

  • Mingjun Li,

    Contributed equally to this work with: Mingjun Li, Lei Li, Xizi Wang

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Radiotherapy, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Lei Li,

    Contributed equally to this work with: Mingjun Li, Lei Li, Xizi Wang

    Roles Formal analysis, Methodology, Software, Writing – original draft

    Affiliations BGI Research, Shenzhen, China, BGI Research, Qingdao, China

  • Xizi Wang,

    Contributed equally to this work with: Mingjun Li, Lei Li, Xizi Wang

    Roles Formal analysis, Software, Writing – original draft

    Affiliation Joint Laboratory for Translational Medicine Research, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Yanwei Zhao,

    Roles Data curation, Investigation

    Affiliation Department of Radiotherapy, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Peina Du,

    Roles Formal analysis

    Affiliations BGI Genomics, Shenzhen, China, Clin Lab, BGI Genomics, Qingdao, China

  • Wei Wang,

    Roles Investigation

    Affiliation Department of Thoracic Surgery, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Zhenxing Wang,

    Roles Investigation

    Affiliation Department of Thoracic Surgery, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Yadong Wang,

    Roles Investigation

    Affiliation Department of Thoracic Surgery, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Yanxing Sheng,

    Roles Methodology, Supervision

    Affiliation Department of Radiotherapy, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Mingliang Gu,

    Roles Project administration, Supervision

    Affiliation Joint Laboratory for Translational Medicine Research, Liaocheng People’s Hospital, Liaocheng, Shandong, China

  • Xiaodong Jia

    Roles Conceptualization, Data curation, Methodology, Supervision, Writing – review & editing

    jiaxiaodong1018@163.com

    Affiliation Joint Laboratory for Translational Medicine Research, Liaocheng People’s Hospital, Liaocheng, Shandong, China

Abstract

Esophageal squamous cell carcinoma (ESCC) is among the most frequently diagnosed cancer types, and affected patients frequently experience poor prognostic outcomes and high mortality rates. Many genomic studies of ESCC have been performed in recent years, yet the mutational mechanisms driving ESCC and their clinical implications remain incompletely understood. In this study, paired tumor and normal tissue samples from 22 patients with ESCC were used for whole genome sequencing-based analyses of genome-wide mutational events. These comprehensive analyses enabled the detection and characterization of various mutation subtypes in ESCC including somatic single-nucleotide variants, small insertions and deletions, copy number variations, structural variations, and circular extrachromosomal DNA. Among the genes harboring non-silent mutations, TP53, NOTCH1, CSMD3, EP300, and FAM135B were the most frequently mutated in this study, and all five are annotated in the COSMIC Cancer Gene Census. With the exception of aging-related signatures, an APOBEC-associated mutational signature was the dominant mutational feature detected in ESCC samples, suggesting that APOBEC-mediated cytidine deamination is likely a major driver of mutations in this cancer type. Notably, our study also detected circular extrachromosomal DNA (ecDNA) events in these ESCC patient samples. The oncogenes COX6C, PVT1, and MMP12, as well as the oncogenic long non-coding RNA AZIN1-AS1, were detected in ecDNA regions in these analyses and may be associated with worse disease-free survival in ESCC patients.

Introduction

Esophageal cancer (EC) is the seventh most prevalent cancer type in the world, and over 90% of EC patients are diagnosed with esophageal squamous cell carcinoma (ESCC), which is a highly aggressive malignancy that is particularly common in China [1]. Within China, the incidence of ESCC exhibits marked geographic variability, with the highest rates in Taihang Mountain, Xinjiang province, and Chaoshan district, while incidence rates in other regions are rising rapidly [2]. At the global level, ESCC primarily impacts individuals in low- and middle-income nations, suggesting that exogenous exposures may contribute to the risk of ESCC development [3].

A range of lifestyle and environmental factors have been linked to the risk of ESCC [4]. There is strong evidence that tobacco use and alcohol abuse can synergistically contribute to ESCC incidence in lower-risk areas [4,5], while possible risk factors in China may include dietary habits including the consumption of hot food and betel nut chewing [6]. ESCC risk has also been linked to environmental factors including exposure to polycyclic aromatic hydrocarbons. While risk factors tied to the development of ESCC are increasingly well understood, these factors alone are not sufficient to account for the regional and national variations in ESCC incidence that have been observed to date [3].

ESCC is a tumor type characterized by poor prognostic outcomes owing to the limited clinical tools available for its early diagnosis [7]. Given its high degree of heterogeneity and the fact that clinical outcomes in patients tend to be variable, there are also few reliable prognostic biomarkers available for ESCC [8]. Understanding of the molecular mechanisms that govern ESCC progression and development is also relatively limited, underscoring the need to further define oncogenic changes and diagnostic biomarkers associated with the development of this devastating cancer type.

In this study, whole-genome sequencing (WGS) was performed for tumors and paired normal tissue samples from 22 patients with ESCC for whom corresponding clinical information was available. The resultant genomic data were analyzed to detect ESCC-related somatic mutations, copy number alterations (CNAs), and structural variations (SVs). In addition, key mutational signatures were analyzed, and circular extrachromosomal DNA (ecDNA)-related genes were examined in an effort to shed further light on the underlying molecular drivers of ESCC development.

Results

Patient characteristics

This study enrolled 22 ESCC patients (17 male, 5 female). Detailed clinical characteristics for these patients are summarized in S1 Table. Of these patients, 9 (40.9%) had a history of alcohol abuse and 12 (54.5%) had a history of smoking. Additionally, 1, 6, 12, and 3 of these patients were diagnosed with stage I, II, III, and IV tumors, respectively, and 16, 1, 4, and 1 patients were classified as having moderately differentiated, moderately or poorly differentiated, poorly differentiated, and well differentiated tumors, respectively. The distribution of ESCC tumor locations in this study included upper thoracic (n = 1), middle thoracic (n = 9), and lower thoracic (n = 12) tumors (S1 Table).

WGS was performed for 22 tumors and matched normal tissues from ESCC patients, with respective average sequencing depths of 44.09x and 42.55x. Among these patients, one patient (ESCC22) was identified as having hypermutated ESCC, as evidenced by the presence of > 10 somatic mutations per megabase of analyzed genomic sequence (mutation burden) (Fig 1A). This hypermutated patient was not included in subsequent analyses.
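The hypermutation criterion can be illustrated with a minimal sketch, shown here in Python. The function names, the example sample IDs, the mutation counts, and the 2,900 Mb callable genome size are hypothetical placeholders and are not values from this study; only the cutoff of 10 mutations per megabase comes from the text.

```python
HYPERMUTATION_THRESHOLD = 10.0  # somatic mutations per Mb, the cutoff stated in the text


def mutation_burden(n_mutations: int, callable_mb: float) -> float:
    """Return somatic mutations per megabase of analyzed sequence."""
    if callable_mb <= 0:
        raise ValueError("callable_mb must be positive")
    return n_mutations / callable_mb


def hypermutated_samples(counts: dict, callable_mb: float,
                         threshold: float = HYPERMUTATION_THRESHOLD) -> list:
    """Return the sorted IDs of samples whose burden exceeds the threshold."""
    return sorted(
        sample for sample, n in counts.items()
        if mutation_burden(n, callable_mb) > threshold
    )


# Hypothetical counts and genome size, for illustration only.
example_counts = {"sampleA": 7_000, "sampleB": 45_000}
print(hypermutated_samples(example_counts, callable_mb=2_900.0))
```

The comparison is strict (greater than the cutoff), matching the "> 10" wording above.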

Fig 1. An overview of the somatic mutational profile of ESCC.

(A) Somatic mutations per megabase of analyzed genomic sequence (mutation burden) in 22 ESCC patient tumor tissue samples. (B, C) Types (B) and classifications (C) of non-silent variants in non-hypermutated ESCC patients. (D) Non-silent variants per sample in non-hypermutated ESCC patients. (E) Oncoplot for the top 30 mutated genes identified in patients with non-hypermutated ESCC.

https://doi.org/10.1371/journal.pone.0323915.g001

The mutational landscape of ESCC

Short somatic mutations are among the most common mutation types, and were detected using Mutect2 in the present study. In total, 151,148 short somatic mutations were identified in the non-hypermutated ESCC patient samples, including 136,763 single nucleotide variants (SNVs), 948 multi-nucleotide polymorphisms (MNPs), and 13,437 insertions/deletions (INDELs) (4,060 insertions, 9,377 deletions) (S2 Table). Of these mutations, 1,304 were located in exonic or splicing regions (311 silent variants, 982 non-silent variants, and 11 unknown). The 982 non-silent variants were summarized and visualized with the R packages maftools and ComplexHeatmap (Fig 1B-E). There was an average of 47 non-silent mutations per tumor sample (range: 1–149), comprising 932 SNVs, 13 insertions, and 37 deletions in total (Fig 1B, D). These somatic mutations were functionally annotated as missense mutations, nonsense mutations, frameshift deletions, splice site variants, frameshift insertions, in-frame deletions, translation start site mutations (start-loss mutations), in-frame insertions, and nonstop mutations (stop-loss mutations) (Fig 1C). Gene set enrichment analysis showed that genes mutated in at least 2 samples were mostly enriched in the WNT and Notch signaling pathways (S3 Table). To evaluate the protein interaction relationships among these genes, a PPI network was constructed (S1 Fig), and nodes with high scores in the network were screened as hub genes. Based on STRING analysis and the MCC score, TP53, PIK3CA, and NOTCH1 were identified as hub genes. The most frequently mutated genes, each altered in at least three ESCC patient tumor samples, included TP53, NOTCH1, CSMD3, EP300, FAM135B, KCNH7, and ZNF572 (Fig 1E).

Notably, of these genes, TP53, NOTCH1, CSMD3, EP300, and FAM135B were all annotated in the COSMIC Cancer Gene Census, and the majority were impacted by missense or nonsense mutations in these ESCC patient samples (Fig 1E). The expression levels of TP53, NOTCH1, EP300, and FAM135B differed between primary tumor and solid tissue normal samples in The Cancer Genome Atlas Esophageal Carcinoma (TCGA-ESCA) cohort (S2 Fig, p < 0.05). TP53, CSMD3, and NOTCH1 were also among the most frequently mutated genes in the TCGA-ESCA dataset (S3A Fig). The mutation burden identified in our study was lower than that in TCGA-ESCA (S3B Fig), probably owing to differences in sequencing method and sample size.

Identification of ESCC-related mutational signatures

Three single base substitution (SBS) mutational signatures (SBS96A, SBS96B, and SBS96C) were identified based on SBS in 96-element form, and these were composed of the COSMIC SBS1, SBS2, SBS3, SBS5, SBS13, and SBS18 mutational signatures (Fig 2). SBS96A was composed of the SBS5 (59.78%), SBS18 (22.98%), and SBS1 (17.24%) signatures (Fig 2A), whereas SBS96B was primarily composed of SBS3 (74.76%), SBS5 (22.06%), and SBS1 (3.18%) (Fig 2B). In addition, SBS96C mainly consisted of SBS13 (48.7%), SBS2 (29.9%), SBS5 (20.4%), and SBS1 (1.0%) (Fig 2C). Among these annotated mutational signatures, SBS5 (21/21) and SBS1 (21/21) were detected in all samples, while SBS2 (17/21), SBS13 (17/21), SBS18 (11/21), and SBS3 (3/21) were detected in a subset of samples (Fig 2D).

Fig 2. COSMIC single base substitution signatures identified in non-hypermutated ESCC.

(A) De novo identified signature SBS96A is matched to a combination of COSMIC signatures SBS5, SBS18 and SBS1. (B) De novo identified signature SBS96B is matched to a combination of COSMIC signatures SBS3, SBS5 and SBS1. (C) De novo identified signature SBS96C is matched to a combination of COSMIC signatures SBS13, SBS2, SBS5 and SBS1. (D) Frequency of decomposed COSMIC signatures in each sample.

https://doi.org/10.1371/journal.pone.0323915.g002

Identification of copy number variations and structural variations

For whole-genome analyses of somatic CNAs in ESCC tumor tissue samples, the frequency of somatic copy number gains and losses was examined at the cohort level (Fig 3). The regions that showed the most amplifications were located in chromosome arms 3q and 5p, while those with the most deletions were located in 3p (Fig 3A). The length of somatic CNAs for each sample pair in this cohort is shown in Fig 3B. In the 21 analyzed paired tumor and normal tissue samples, the average CNA length was about 714.05 Mb (range: 10.22 Mb – 1735.70 Mb), and the CNA lengths of ESCC18 and ESCC10 each exceeded 1500 Mb (Fig 3B). For further details regarding CNAs in this patient cohort, see S4 Table. A higher frequency of CNAs in chromosome arm 3q was also observed in the TCGA-ESCA dataset (S4A Fig). The total length of the genome segments affected by either gains or losses in our ESCC dataset was significantly higher than that in TCGA-ESCA (S4B Fig).

Fig 3. Copy number variation and structural variation classification.

(A) Copy number gain and loss proportions in patients with non-hypermutated ESCC. (B) CNV lengths per sample. (C) The per-sample frequencies of different SV types.

https://doi.org/10.1371/journal.pone.0323915.g003

Somatic SVs were also classified for these 21 ESCC patients. In total, 1,574 somatic SVs were detected, including 728 breakends, 530 deletions, 311 tandem duplications, and 5 insertions (S5 Table and Fig 3C). Based on the associations among chromosomes, these 728 breakends were further characterized as 381 and 347 intrachromosomal and interchromosomal breakends, respectively (Fig 3C). In addition, 493 SV events (31.32%) were predicted to impact exonic or splicing regions (275 deletions, 186 tandem duplications, and 32 breakends).

Circular extrachromosomal DNA analyses

Computational methods were next used to analyze these WGS data in an effort to detect focal amplifications as a means of studying the ecDNA landscape of ESCC. Amplicons were defined as sets of connected genomic intervals with copy number amplifications, and amplicon structures were defined as ordered lists of segments from the amplicon intervals [9].

In total, 29 amplicons were identified in 13 patients that could be classified as ecDNA, breakage-fusion-bridge (BFB), complex non-cyclic, and linear amplifications. Specifically, ecDNA was detected in patients ESCC05, ESCC06, ESCC10, and ESCC21 (S6 Table and Fig 4). As shown in S6 Table, these detected ecDNAs were annotated to genes including DYNLL2, EPX, COX6C, FBXO43, AZIN1, AZIN1-AS1, ANGPTL5, BIRC2, NBPF22P, and others. In particular, SRSF1, COX6C, MYC, PVT1, BIRC2, BIRC3, MMP12, and YAP1 were identified as ecDNA-related oncogenes (Fig 4A-D). Amplification events can induce oncogene activation, and oncogene expression is a key driver of tumorigenesis [10–13]. Notably, most of the oncogenes within ecDNA showed higher expression in primary tumor than in solid tissue normal samples in TCGA-ESCA (S5 Fig), which might indicate the importance of oncogenes within ecDNA in tumorigenesis.

Fig 4. Rearrangement signature visualization for each amplicon containing ecDNA detected in patients with ESCC.

(A-E) The SV view for different samples. Amplicon intervals are shown on the X-axis, while the depth of coverage for these intervals is indicated by the left Y-axis and vertical grey bars, and copy number estimations are indicated by the right Y-axis and black horizontal lines. Arcs represent discordant read pair clusters, and are color-coded as follows for read mapping orientation: red indicates length discordant in expected orientation (forward-reverse), brown indicates everted read pairs (reverse-forward), teal indicates that both reads map to the forward orientation, and magenta indicates that both reads map to the reverse orientation. Connections to the source vertex are represented using blue vertical lines. Oncogene annotations are shown in the bottom panel [9]. (F) Enrichment analysis results for ecDNA genes.

https://doi.org/10.1371/journal.pone.0323915.g004

Gene set enrichment analyses were additionally performed as a means of exploring the potential functions of these ecDNA genes. The top enriched terms associated with these genes included the following: collagen catabolic process, collagen metabolic process, and extracellular matrix disassembly (Fig 4F). Possible relationships between ecDNA genes and disease-free survival (DFS) were also explored for patients in the TCGA-ESCA database, revealing that patients expressing higher levels of COX6C (HR = 2.06, Logrank p = 0.0041), AZIN1-AS1 (HR = 1.92, Logrank p = 0.016), MMP12 (HR = 2.04, Logrank p = 0.0052), and PVT1 (HR = 1.60, Logrank p = 0.058) exhibited a shorter DFS duration (Fig 5A-D). High expression of AZIN1-AS1, COX6C, MMP12, and PVT1 might be associated with a worse survival outcome in both the TCGA ESCC and esophageal adenocarcinoma (EAC) datasets (S6 Fig). High expression of these genes was also significantly associated with poor disease-free survival (HR = 2.17, p-value = 0.003) after adjustment for age, gender, and stage (S7 Table).

Fig 5. Survival analyses for ecDNA-associated genes in the TCGA-ESCA dataset.

(A-D) TCGA-ESCA dataset (n = 176) was used to generate disease-free survival plots for patients grouped based on high and low expression values for the indicated genes.

https://doi.org/10.1371/journal.pone.0323915.g005

Discussion

In the present study, a WGS approach was used to facilitate the comprehensive characterization of the genomic characteristics of ESCC. Mutational landscapes were examined for 21 patients with non-hypermutated ESCC, with a focus on SNVs and INDELs. The most frequently mutated genes identified in ESCC, including CSMD3, EP300, FAM135B, NOTCH1, and TP53, were consistent with previous studies in recent years [8,14–16]. These 5 genes were annotated in the COSMIC Cancer Gene Census and validated in TCGA-ESCA, which indicates that these genes, especially TP53 and NOTCH1, might play crucial roles in the tumorigenesis of ESCC. In addition, six COSMIC mutational signatures (SBS1, SBS2, SBS3, SBS5, SBS13, and SBS18) were detected in these analyses. An average of approximately 714.05 Mb of genome length per tumor sample harbored copy number gains (415.42 Mb) or losses (298.63 Mb). In total, 1,574 somatic SVs were detected in this ESCC cohort. Finally, 5 ecDNA events that included 8 ecDNA-related oncogenes were identified in 4 patients. High levels of the ecDNA-related genes COX6C, AZIN1-AS1, PVT1, and MMP12 were also found to be associated with poorer DFS among patients in the TCGA-ESCA dataset.

Mutational signatures are invaluable tools that can aid efforts to understand the biological basis for oncogenesis [17], and several mutational signatures were successfully extracted in the present study. The SBS5 and SBS1 mutational signatures are correlated with aging-related factors [18], which represented the predominant mutational process associated with ESCC incidence in this study cohort. SBS5 has also been shown to be correlated with tobacco smoking [19]. SBS2 and SBS13 have been identified as APOBEC (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like)-related mutational signatures [20], while SBS18 was characterized as a reactive oxygen species-induced defective base excision repair-related signature [19,21]. The SBS3 COSMIC signature was also related to DNA damage repair [22,23]. SBS1 and SBS5 were detected in all 21 ESCC patients in this study cohort, while SBS2 and SBS13 were evident in over half of these patients. This suggests that, with the exception of age-related factors, APOBEC-related mutational signatures may be a key factor associated with ESCC development and progression. In a previous whole-exome sequencing study of ESCC in a Chinese population, APOBEC-associated mutational signatures (including COSMIC signatures 2 and 13) were found to be significantly enriched in some patients, and patient subtypes carrying APOBEC signatures showed higher overall survival [24]. A positive APOBEC mutation profile in patients with ovarian clear cell carcinoma is associated with higher lymphocyte infiltration and better prognosis [25]. The APOBEC mutational signature can also be used as a potential predictor of immunotherapy response in NSCLC [26]. In addition, mutational signatures related to other APOBEC family members also play a role in different cancers. For example, APOBEC3 enzymes can produce COSMIC mutational signatures 2 and 13 through error-prone replication and repair pathways, and are involved in cervical cancer, head and neck squamous cell carcinoma, and bladder cancer [27]. This is consistent with prior results demonstrating that APOBEC activation is essential for ESCC development [3].

Circular extrachromosomal DNA enables accelerated tumor evolution through mechanisms distinct from traditional chromosomal inheritance, contributing to high levels of oncogene amplification, intratumoral heterogeneity, and therapeutic resistance [28–30]. Prior research has suggested that ecDNA may be a contributor to the amplification of various oncogenes and immunomodulatory genes in the context of esophageal adenocarcinoma development [31]. Here, 8 ecDNA-related oncogenes were identified (SRSF1, COX6C, MYC, PVT1, BIRC2, BIRC3, MMP12, and YAP1), of which two (MYC and BIRC2) are candidate ecDNA-related oncogenes previously reported in studies of ESCC conducted by Cui et al. [32]. Research focused on Barrett’s esophagus and esophageal adenocarcinoma has also suggested that BIRC2, BIRC3, MMP12, MYC, PVT1, and YAP1 may be ecDNA-related oncogenes [31]. Kim et al. [33] further found that patients with ecDNA events tend to exhibit poor survival outcomes across a variety of cancer types. In the present study, overexpression of COX6C, AZIN1-AS1, PVT1, and MMP12 in tumors might be associated with poor disease-free survival among patients in the TCGA-ESCA cohort. COX6C, PVT1, and MMP12 were identified as oncogenes. Notably, AZIN1-AS1 has been shown to serve as a novel oncogenic long non-coding RNA capable of promoting non-small cell lung cancer progression [34]. To the best of our knowledge, few ESCC WGS studies have focused on ecDNA. Our work found that COX6C, PVT1, MMP12, and AZIN1-AS1 within ecDNA might be associated with poor survival. In addition, most of the ecDNA-related oncogenes identified in our study showed higher expression in primary tumor than in solid tissue normal samples in TCGA-ESCA, which might indicate the importance of oncogenes within ecDNA in the tumorigenesis of ESCC and provides insights into the selection of biomarkers for different types of esophageal cancer.

In summary, the present results highlight the mutational landscape of ESCC, providing novel insight into the genomic changes associated with this cancer type. The present results may aid in the future investigation of biomarkers that can guide the diagnosis of ESCC or the identification of new targets amenable to therapeutic intervention. However, this study is subject to certain limitations, such as the limited sample size and sequencing depth, and future large-scale studies and clinical analyses focused on patient survival outcomes will be essential for the effective experimental validation of these findings.

Materials and methods

Patients and sample collection

The Ethics Committee of Liaocheng People’s Hospital approved this study (2018043, 17 May 2018), and all patients provided written informed consent prior to sample collection. Paired primary tumor and normal tissue samples were harvested from 22 ESCC patients in the Liaocheng People’s Hospital between August 2018 and March 2019. All pathological diagnoses were confirmed by pathologists in accordance with WHO criteria, and the TNM staging system established by the American Joint Committee on Cancer was used for tumor staging. Detailed patient characteristics are presented in S1 Table. The experimental procedures conformed to the guidelines approved by the Ethics Committee of Liaocheng People’s Hospital and the Institutional Review Board of BGI (BGI-IRB) on bioethics and biosafety. This study was conducted according to the guidelines of the Declaration of Helsinki.

Whole genome sequencing

The QIAamp DNA Mini Kit (Qiagen, Germany) was used according to the manufacturer’s instructions to extract genomic DNA from patient tumor and normal tissue samples. DNA sample concentrations were measured with a Qubit Fluorometer (Thermo Fisher Scientific, USA), and DNA quality was analyzed via agarose gel electrophoresis. WGS library construction was performed as in prior studies [35], after which 100 bp paired-end sequencing was performed with a DIPSEQ-T1 sequencer (MGI, China).

Data processing and reads mapping

SOAPnuke [36] v2.0 was used to filter the raw reads, removing any reads containing > 10% N bases, reads containing sequence adaptors, and any low-quality reads in which > 50% of bases had a quality score < 5. Clean reads were then aligned to the hg38 human reference genome using the BWA-MEM [37] v0.7.17-r1188 algorithm with default parameters, after which aligned reads were sorted with Picard [38] v2.18.27, and PCR duplicates were marked. The Genome Analysis Toolkit (GATK) [39] v3.8-1 was then used for base quality score recalibration and local realignment as described in the GATK best practice guidelines [40], and the GATK ContEst [41] module was used to exclude samples with a contamination level > 1%. Tumor and normal samples from the same biological source were then matched with bam-matcher [42], and high-quality alignment results were used to call somatic variations in downstream analyses.
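The three read-level criteria can be sketched as a single predicate. This is an illustration in Python rather than the SOAPnuke implementation: the function name and parameters are hypothetical, adaptor detection is reduced to an exact substring match, and the low-quality rule is read as "more than 50% of bases with quality below 5".

```python
def keep_read(sequence: str, qualities: list, adapter: str = "",
              max_n_fraction: float = 0.10,
              low_quality_cutoff: int = 5,
              max_low_quality_fraction: float = 0.50) -> bool:
    """Return True if a read survives the three filters described in the text."""
    if not sequence or len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must be non-empty and equal in length")
    length = len(sequence)
    bases = sequence.upper()

    # Filter 1: discard reads with more than 10% N bases.
    if bases.count("N") / length > max_n_fraction:
        return False

    # Filter 2: discard reads containing the adaptor (simplified exact match).
    if adapter and adapter.upper() in bases:
        return False

    # Filter 3: discard reads in which more than 50% of bases have quality < 5.
    low_quality = sum(1 for q in qualities if q < low_quality_cutoff)
    if low_quality / length > max_low_quality_fraction:
        return False

    return True
```

For paired-end data a pair would typically be dropped when either mate fails; that pairing logic is omitted here.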

Somatic mutation and mutation signature identification

Somatic SNV and INDEL detection was performed with the Genome Analysis Toolkit (GATK) [39] v4.2.6.1 Mutect2 and FilterMutectCalls with default parameters. Initially, the FilterMutectCalls function was used to filter all somatic mutations detected by Mutect2, after which those somatic SNVs marked with ‘PASS’ by FilterMutectCalls were filtered to identify those variants with a sequencing depth ≥ 10, ≥ 4 supporting reads in tumors, and ≤ 2 supporting reads in paired normal samples. Somatic INDELs marked with ‘PASS’ by FilterMutectCalls were additionally filtered to identify those variants with a sequencing depth ≥ 10, ≥ 5 supporting reads in tumors, and ≤ 1 supporting read in paired normal samples. Identified somatic SNVs and INDELs were then annotated using ANNOVAR with refGene [43]. Finally, maftools [44] was used to convert and visualize the results of the ANNOVAR annotations. In addition, somatic mutation (Masked Somatic Mutation) files of 179 primary tumors were downloaded from the GDC TCGA-ESCA dataset and processed using TCGAbiolinks [45] and maftools [44]. We used 38 Mb as the estimate of the exome size, so the mutation burden of each TCGA-ESCA sample equals its total number of mutations divided by 38.
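The post-calling thresholds can be expressed as a small filter, sketched below in Python. The `Call` record and its field names are hypothetical; in practice the values would be parsed from the Mutect2 VCF. The sketch assumes "sequencing depth" refers to a single depth value per call and rejects any variant type for which the text gives no thresholds.

```python
from dataclasses import dataclass

MIN_DEPTH = 10  # minimum sequencing depth, as stated in the text

# Supporting-read thresholds per variant type, as stated in the text.
THRESHOLDS = {
    "SNV": {"min_tumor_alt": 4, "max_normal_alt": 2},
    "INDEL": {"min_tumor_alt": 5, "max_normal_alt": 1},
}


@dataclass(frozen=True)
class Call:
    kind: str              # "SNV" or "INDEL"
    filter_status: str     # FILTER value after FilterMutectCalls
    depth: int             # sequencing depth at the site (assumed single value)
    tumor_alt_reads: int   # reads supporting the variant in the tumor
    normal_alt_reads: int  # reads supporting the variant in the paired normal


def passes_post_filter(call: Call) -> bool:
    """Apply the PASS, depth, and supporting-read criteria to one call."""
    limits = THRESHOLDS.get(call.kind)
    if limits is None:  # variant types without stated thresholds are rejected
        return False
    return (
        call.filter_status == "PASS"
        and call.depth >= MIN_DEPTH
        and call.tumor_alt_reads >= limits["min_tumor_alt"]
        and call.normal_alt_reads <= limits["max_normal_alt"]
    )
```

Keeping the thresholds in one table makes the stricter INDEL criteria easy to compare with the SNV criteria.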

Mutational signatures were estimated using SigProfiler. The de novo extraction of mutational signatures from VCF files was performed using SigProfilerExtractor [46] v1.1.20, and COSMIC [47] signatures were used to annotate these de novo extracted signatures. Single base substitution (SBS) signatures identified using 96 different contexts were used for these analyses.

Detection of copy number variations and structural variations

Somatic copy number alterations and tumor ploidy were estimated for each sample using Sequenza [48] v3.0.0. First, we used the mpileup function from SAMtools [49] v1.9 to convert BAM files into Pileup format, filtering for base quality ≥ 20 and mapping quality ≥ 20. Second, paired tumor and normal Pileup files were processed using the bam2seqz and seqz_binning modules from sequenza-utils with default parameters. The output from sequenza-utils was then further analyzed using the Sequenza R package to generate segmented copy number data. Copy number data for each segment obtained from Sequenza (except chrX and chrY) were divided by mean sample ploidy (CNt/Ploidy, S3 Table). The cnFreq function in the GenVisR [50] v1.29.3 R package was then used for the identification of copy number gains (2.5/2) or losses (1.5/2). Manta [51] v1.6.0 was used to call structural variations, and somatic SVs marked with ‘PASS’ by Manta when using the default parameters were retained for further analysis. Then, the Manta VCF files were converted to BEDPE format using svtools [52]. All identified CNAs and SVs were subjected to refGene annotation using ANNOVAR [43]. In addition, copy number (Masked Copy Number Segment) files of 179 primary tumors were downloaded from the GDC TCGA-ESCA dataset using TCGAbiolinks [45].
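The ploidy normalization and gain/loss calls can be sketched as follows in Python. This is not the GenVisR implementation: the `Segment` record and function names are hypothetical, and the sketch assumes that a segment counts as a gain when CNt/ploidy is above 2.5/2 and as a loss when it is below 1.5/2, with strict comparisons.

```python
from dataclasses import dataclass

GAIN_RATIO = 2.5 / 2  # 1.25, gain cutoff on CNt / ploidy (assumed strict)
LOSS_RATIO = 1.5 / 2  # 0.75, loss cutoff on CNt / ploidy (assumed strict)
EXCLUDED_CHROMS = {"chrX", "chrY"}  # sex chromosomes are excluded in the text


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # segment start, bp
    end: int    # segment end, bp
    cnt: int    # total copy number of the segment (CNt)


def classify(segment: Segment, ploidy: float) -> str:
    """Label a segment as 'gain', 'loss', or 'neutral' from CNt / ploidy."""
    ratio = segment.cnt / ploidy
    if ratio > GAIN_RATIO:
        return "gain"
    if ratio < LOSS_RATIO:
        return "loss"
    return "neutral"


def altered_length_mb(segments: list, ploidy: float) -> dict:
    """Sum the length (Mb) of autosomal segments called as gains or losses."""
    totals = {"gain": 0.0, "loss": 0.0}
    for seg in segments:
        if seg.chrom in EXCLUDED_CHROMS:
            continue
        state = classify(seg, ploidy)
        if state != "neutral":
            totals[state] += (seg.end - seg.start) / 1e6
    return totals
```

Summing `totals["gain"]` and `totals["loss"]` for one sample gives a per-sample CNA length of the kind reported in Fig 3B.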

Circular extrachromosomal DNA analysis

Circular extrachromosomal DNA detection was performed using AmpliconArchitect [9] v1.3.r3 and CNVkit [53] v0.9.9 following the AmpliconSuite-pipeline (https://github.com/AmpliconSuite/AmpliconSuite-pipeline). WGS data were used as inputs for AmpliconArchitect, which explores sources of focal amplification including circular ecDNA and breakage-fusion-bridge cycles in cancer-associated genomes. Outputs from this tool were classified using AmpliconClassifier [31] v0.4.13, and all analyses were performed based on the PrepareAA v0.1344.4 pipeline’s --run_AA and --run_AC functions with default parameters.

Enrichment analysis and PPI network construction

Gene functional enrichment analyses were performed with the Gene Set Enrichment Analysis (GSEA) web server [54,55], and included analyses of Gene Ontology (GO) gene sets including the GO biological process, cellular component, and molecular function sets from the Human Molecular Signatures Database (MSigDB) [55–57]. The protein-protein interaction (PPI) network analysis was performed using the STRING database (Search Tool for the Retrieval of Interacting Genes/Proteins) [58] v12.0, an online tool and database of protein-protein interactions. Interactions with a minimum required interaction score > 0.4 were selected and the network was reconstructed in Cytoscape [59] v3.10.3, and the cytoHubba plugin [60] v0.1 was used to find hub genes in the PPI network. The top three genes with the highest prediction scores calculated by the Maximal Clique Centrality (MCC) algorithm were defined as the hub genes.
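The MCC score can be illustrated with a short sketch in Python. It uses the usual definition of Maximal Clique Centrality, in which a node's score is the sum of (|C| − 1)! over the maximal cliques C that contain it, and enumerates cliques with a basic Bron–Kerbosch search. This is not the cytoHubba code; the function names and the toy edge list are hypothetical.

```python
from math import factorial


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def mcc_scores(adjacency):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {node: 0 for node in adjacency}
    for clique in maximal_cliques(adjacency):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


def top_hubs(adjacency, k=3):
    """Return the k highest-scoring nodes, ties broken alphabetically."""
    scores = mcc_scores(adjacency)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy network: a triangle A-B-C with an extra node D attached to A.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])
print(mcc_scores(toy), top_hubs(toy))
```

In the toy network, A belongs to one clique of size three and one of size two, so it receives the highest score. The unpivoted search is adequate for small networks but slows down on dense ones.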

Survival analysis

The R packages survival and survminer were used to conduct survival analyses [61,62], with a focus on disease-free survival in the TCGA-ESCA dataset, and the surv_cutpoint function was used to determine the optimal cutpoint for patient stratification. To further investigate the results, expression information (UCSC Toil RNA-seq Recompute) for 176 primary tumor samples and corresponding clinical data (TCGA Esophageal Cancer) including age, gender, survival time, and pathologic stage were downloaded from the UCSC Xena project. The R package survival [62] was used to perform multivariate Cox regression analysis. All analyses are based on publicly available software, and the corresponding authors can be contacted for any additional data or code requirements.
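The idea of choosing an expression cutpoint by survival separation can be sketched in plain Python. The surv_cutpoint function relies on maximally selected rank statistics; the sketch below substitutes a simpler scan that picks the threshold with the largest two-group log-rank chi-square, so it is an approximation of the approach, not the survminer method. All function names and parameters are hypothetical.

```python
def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    event_times = sorted({t for t, e in zip(times_a + times_b,
                                            events_a + events_b) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        if n < 2:  # variance term is undefined with one subject at risk
            continue
        observed_minus_expected += d_a - d * n_a / n
        variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_cutpoint(expression, times, events, min_group=1):
    """Scan observed expression values; return (cutpoint, chi2) or None."""
    best = None
    for cut in sorted(set(expression))[:-1]:
        high = [i for i, x in enumerate(expression) if x > cut]
        low = [i for i, x in enumerate(expression) if x <= cut]
        if len(high) < min_group or len(low) < min_group:
            continue
        chi2 = logrank_chi2([times[i] for i in high], [events[i] for i in high],
                            [times[i] for i in low], [events[i] for i in low])
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best
```

Because the cutpoint is selected to maximize the statistic, a p-value computed from the same split is optimistic and is best treated as exploratory.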

Consent to publish

All authors have given their consent to publish this work.

Supporting information

S1 Table. Clinical and pathological features for 22 ESCC patients.

https://doi.org/10.1371/journal.pone.0323915.s001

(XLSX)

S2 Table. Details regarding somatic SNVs and INDELs.

https://doi.org/10.1371/journal.pone.0323915.s002

(XLSX)

S4 Table. ESCC patients copy number variation data.

https://doi.org/10.1371/journal.pone.0323915.s004

(XLSX)

S5 Table. Structural variations detected in ESCC patients.

https://doi.org/10.1371/journal.pone.0323915.s005

(XLSX)

S6 Table. Focal amplification events detected in ESCC patients.

https://doi.org/10.1371/journal.pone.0323915.s006

(XLSX)

S7 Table. Multivariate Cox regression analysis for disease-free survival in TCGA-ESCA.

https://doi.org/10.1371/journal.pone.0323915.s007

(XLSX)

S1 Fig. PPI network analysis and hub gene screening.

The hub gene nodes were highlighted in red.

https://doi.org/10.1371/journal.pone.0323915.s008

(TIF)

S2 Fig. Comparison of expression values of top mutated genes between primary tumor and solid tissue normal in TCGA-ESCA.

Box plots of expression values of TP53, NOTCH1, EP300, FAM135B and CSMD3 by group.

https://doi.org/10.1371/journal.pone.0323915.s009

(TIF)

S3 Fig. Comparison of somatic mutations between ESCC and TCGA-ESCA.

(A) Oncoplot for the top 30 mutated genes identified in patients with non-hypermutated TCGA-ESCA. (B) Comparison of somatic mutations per megabase of analyzed genomic sequence (mutation burden) between our ESCC study and TCGA-ESCA.

https://doi.org/10.1371/journal.pone.0323915.s010

(TIF)

S4 Fig. Comparison of copy number variations between ESCC and TCGA-ESCA.

(A) Copy number gain and loss proportions in patients with non-hypermutated TCGA-ESCA. (B) Comparison of the total length of copy number gain/loss between our ESCC study and TCGA-ESCA.

https://doi.org/10.1371/journal.pone.0323915.s011

(TIF)

S5 Fig. Comparison of expression values of ecDNA-related oncogenes between primary tumor and solid tissue normal in TCGA-ESCA.

Box plots of expression values of MYC, SRSF1, MMP12, PVT1, BIRC2, COX6C, BIRC3 and YAP1 by group.

https://doi.org/10.1371/journal.pone.0323915.s012

(TIF)

S6 Fig. Survival analyses for ecDNA-associated genes in the TCGA-ESCA dataset.

TCGA ESCC (n = 91) and TCGA EAC datasets (n = 85) were used to generate disease-free survival plots.

https://doi.org/10.1371/journal.pone.0323915.s013

(TIF)

Acknowledgments

We thank the proband and all the family members for participating in our study, and we thank the China National GeneBank.

References

  1. Rustgi AK, El-Serag HB. Esophageal carcinoma. N Engl J Med. 2014;371(26):2499–509. pmid:25539106
  2. Lin Y, Totsuka Y, Shan B, Wang C, Wei W, Qiao Y, et al. Esophageal cancer in high-risk areas of China: research progress and challenges. Ann Epidemiol. 2017;27(3):215–21. pmid:28007352
  3. Moody S, Senkin S, Islam SMA, Wang J, Nasrollahzadeh D, Cortez Cardoso Penha R, et al. Mutational signatures in esophageal squamous cell carcinoma from eight countries with varying incidence. Nat Genet. 2021;53(11):1553–63. pmid:34663923
  4. Abnet CC, Arnold M, Wei W-Q. Epidemiology of Esophageal Squamous Cell Carcinoma. Gastroenterology. 2018;154(2):360–73. pmid:28823862
  5. Prabhu A, Obi KO, Rubenstein JH. The synergistic effects of alcohol and tobacco consumption on the risk of esophageal squamous cell carcinoma: a meta-analysis. Am J Gastroenterol. 2014;109(6):822–7. pmid:24751582
  6. Engel LS, Chow W-H, Vaughan TL, Gammon MD, Risch HA, Stanford JL, et al. Population attributable risks of esophageal and gastric cancers. J Natl Cancer Inst. 2003;95(18):1404–13. pmid:13130116
  7. Pennathur A, Gibson MK, Jobe BA, Luketich JD. Oesophageal carcinoma. Lancet. 2013;381(9864):400–12. pmid:23374478
  8. Lin D-C, Wang M-R, Koeffler HP. Genomic and Epigenomic Aberrations in Esophageal Squamous Cell Carcinoma and Implications for Patients. Gastroenterology. 2018;154(2):374–89. pmid:28757263
  9. Deshpande V, Luebeck J, Nguyen N-PD, Bakhtiari M, Turner KM, Schwab R, et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat Commun. 2019;10(1):392. pmid:30674876
  10. Liao Z, Jiang W, Ye L, Li T, Yu X, Liu L. Classification of extrachromosomal circular DNA with a focus on the role of extrachromosomal DNA (ecDNA) in tumor heterogeneity and progression. Biochim Biophys Acta Rev Cancer. 2020;1874(1):188392. pmid:32735964
  11. Albertson DG, Collins C, McCormick F, Gray JW. Chromosome aberrations in solid tumors. Nat Genet. 2003;34(4):369–76. pmid:12923544
  12. Anca B, Iulia VI, Oana P, Adriana P, Dana M, Irina H, et al. Mechanisms of Oncogene Activation. In: Dmitry B, Editor. New Aspects in Molecular and Cellular Mechanisms of Human Carcinogenesis. Rijeka: IntechOpen; 2016. p. Ch. 1.
  13. Bagci O, Kurtgöz S. Amplification of Cellular Oncogenes in Solid Tumors. N Am J Med Sci. 2015;7(8):341–6. pmid:26417556
  14. Kim J, Bowlby R, Mungall AJ, Robertson AG, Odze RD, Cherniack AD, et al. Integrated genomic characterization of oesophageal carcinoma. Nature. 2017;541(7636):169–75. pmid:28052061
  15. Mangalaparthi KK, Patel K, Khan AA, Manoharan M, Karunakaran C, Murugan S, et al. Mutational Landscape of Esophageal Squamous Cell Carcinoma in an Indian Cohort. Front Oncol. 2020;10:1457. pmid:32974170
  16. Cui Y, Chen H, Xi R, Cui H, Zhao Y, Xu E, et al. Whole-genome sequencing of 508 patients identifies key molecular features associated with poor prognosis in esophageal squamous cell carcinoma. Cell Res. 2020;30(10):902–13. pmid:32398863
  17. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational signatures in human cancers. Nat Rev Genet. 2014;15(9):585–98. pmid:24981601
  18. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578(7793):94–101. pmid:32025018
  19. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415–21. pmid:23945592
  20. Rustad EH, Nadeu F, Angelopoulos N, Ziccheddu B, Bolli N, Puente XS, et al. mmsig: a fitting approach to accurately identify somatic mutational signatures in hematological malignancies. Commun Biol. 2021;4(1):424. pmid:33782531
  21. Jin S-G, Meng Y, Johnson J, Szabó PE, Pfeifer GP. Concordance of hydrogen peroxide-induced 8-oxo-guanine patterns with two cancer mutation signatures of upper GI tract tumors. Sci Adv. 2022;8(22):eabn3815. pmid:35658030
  22. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149(5):979–93. pmid:22608084
  23. Chen D, Gervai JZ, Póti Á, Németh E, Szeltner Z, Szikriszt B, et al. BRCA1 deficiency specific base substitution mutagenesis is dependent on translesion synthesis and regulated by 53BP1. Nat Commun. 2022;13(1):226. pmid:35017534
  24. Guo J, Huang J, Zhou Y, Zhou Y, Yu L, Li H, et al. Germline and somatic variations influence the somatic mutational signatures of esophageal squamous cell carcinomas in a Chinese population. BMC Genomics. 2018;19(1):538. pmid:30012096
  25. Long X, Lu H, Cai M-C, Zang J, Zhang Z, Wu J, et al. APOBEC3B stratifies ovarian clear cell carcinoma with distinct immunophenotype and prognosis. Br J Cancer. 2023;128(11):2054–62. pmid:36997661
  26. Wang S, Jia M, He Z, Liu X-S. APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer. Oncogene. 2018;37(29):3924–36. pmid:29695832
  27. Butler K, Banday AR. APOBEC3-mediated mutagenesis in cancer: causes, clinical significance and therapeutic potential. J Hematol Oncol. 2023;16(1):31. pmid:36978147
  28. Bailey C, Shoura MJ, Mischel PS, Swanton C. Extrachromosomal DNA-relieving heredity constraints, accelerating tumour evolution. Ann Oncol. 2020;31(7):884–93. pmid:32275948
  29. Wu S, Bafna V, Mischel PS. Extrachromosomal DNA (ecDNA) in cancer pathogenesis. Curr Opin Genet Dev. 2021;66:78–82. pmid:33477016
  30. Pecorino LT, Verhaak RGW, Henssen A, Mischel PS. Extrachromosomal DNA (ecDNA): an origin of tumor heterogeneity, genomic remodeling, and drug resistance. Biochem Soc Trans. 2022;50(6):1911–20. pmid:36355400
  31. Luebeck J, Ng AWT, Galipeau PC, Li X, Sanchez CA, Katz-Summercorn AC, et al. Extrachromosomal DNA in the cancerous transformation of Barrett’s oesophagus. Nature. 2023;616(7958):798–805. pmid:37046089
  32. Cui H, Zhou Y, Wang F, Cheng C, Zhang W, Sun R, et al. Characterization of somatic structural variations in 528 Chinese individuals with Esophageal squamous cell carcinoma. Nat Commun. 2022;13(1):6296. pmid:36272974
  33. Kim H, Nguyen N-P, Turner K, Wu S, Gujar AD, Luebeck J, et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat Genet. 2020;52(9):891–7. pmid:32807987
  34. Cai Y, Wu Q, Liu Y, Wang J. AZIN1-AS1, A Novel Oncogenic LncRNA, Promotes the Progression of Non-Small Cell Lung Cancer by Regulating MiR-513b-5p and DUSP11. Onco Targets Ther. 2020;13:9667–78. pmid:33116570
  35. Huang J, Liang X, Xuan Y, Geng C, Li Y, Lu H, et al. A reference human genome dataset of the BGISEQ-500 sequencer. Gigascience. 2017;6(5):1–9. pmid:28379488
  36. Chen Y, Chen Y, Shi C, Huang Z, Zhang Y, Li S, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience. 2018;7(1):1–6. pmid:29220494
  37. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. pmid:19451168
  38. Broad Institute. Picard toolkit. Broad Institute; 2019. Available from: https://broadinstitute.github.io/picard/
  39. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. pmid:20644199
  40. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43(1110). pmid:25431634
  41. Cibulskis K, McKenna A, Fennell T, Banks E, DePristo M, Getz G. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics. 2011;27(18):2601–2. pmid:21803805
  42. Wang PPS, Parker WT, Branford S, Schreiber AW. BAM-matcher: a tool for rapid NGS sample matching. Bioinformatics. 2016;32(17):2699–701. pmid:27153667
  43. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. pmid:20601685
  44. Mayakonda A, Lin D-C, Assenov Y, Plass C, Koeffler HP. Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Res. 2018;28(11):1747–56. pmid:30341162
  45. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44(8):e71. pmid:26704973
  46. Islam SMA, Díaz-Gay M, Wu Y, Barnes M, Vangara R, Bergstrom EN, et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom. 2022;2(11):100179. pmid:36388765
  47. Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43(Database issue):D805-11. pmid:25355519
  48. Favero F, Joshi T, Marquard AM, Birkbak NJ, Krzystanek M, Li Q, et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol. 2015;26(1):64–70. pmid:25319062
  49. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943
  50. Skidmore ZL, Wagner AH, Lesurf R, Campbell KM, Kunisaki J, Griffith OL, et al. GenVisR: Genomic Visualizations in R. Bioinformatics. 2016;32(19):3012–4. pmid:27288499
  51. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32(8):1220–2. pmid:26647377
  52. Larson DE, Abel HJ, Chiang C, Badve A, Das I, Eldred JM, et al. svtools: population-scale analysis of structural variation. Bioinformatics. 2019;35(22):4782–7. pmid:31218349
  53. Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol. 2016;12(4):e1004873. pmid:27100738
  54. Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267–73. pmid:12808457
  55. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. pmid:16199517
  56. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40. pmid:21546393
  57. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. pmid:26771021
  58. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447-52. pmid:25352553
  59. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504. pmid:14597658
  60. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8 Suppl 4(Suppl 4):S11. pmid:25521941
  61. Kassambara A, Kosinski M, Biecek P. survminer: Drawing Survival Curves using ‘ggplot2’. 2016.
  62. Therneau TM. A Package for Survival Analysis in R. 2022. Available from: https://CRAN.R-project.org/package=survival