Cell-free tumour DNA analysis detects copy number alterations in gastro-oesophageal cancer patients

Background Analysis of cell-free tumour DNA, a liquid biopsy, is a promising biomarker for cancer. We have performed a proof-of-principle study to test its applicability in the clinical setting, analysing copy number alterations (CNAs) in plasma and tumour tissue from 44 patients with gastro-oesophageal cancer. Methods DNA was isolated from blood plasma and a tissue sample from each patient. Array-CGH was applied to the tissue DNA. The cell-free plasma DNA was sequenced by low-coverage whole-genome sequencing using a clinical pipeline for non-invasive prenatal testing. Two bioinformatic tools, WISECONDOR and ichorCNA, were used to process the output data and were compared with each other. Results Cancer-associated CNAs could be seen in 59% (26/44) of the tissue biopsies. In the plasma samples, a targeted approach analysing 61 regions of special interest in gastro-oesophageal cancer detected cancer-associated CNAs with a z-score >5 in 11 patients. Broadening the analysis to a whole-genome view, 17/44 patients (39%) had cancer-associated CNAs using WISECONDOR and 13 (30%) using ichorCNA. Of the 26 patients with tissue-verified cancer-associated CNAs, 14 (54%) had corresponding CNAs in plasma. Potentially clinically actionable amplifications overlapping the genes VEGFA, EGFR and FGFR2 were detected in the plasma from three patients. Conclusions We conclude that low-coverage whole-genome sequencing without prior knowledge of the tumour alterations could become a useful tool for cell-free tumour DNA analysis of total CNAs in plasma from patients with gastro-oesophageal cancer.


Introduction
The stomach and the oesophagus are the fifth and seventh most common cancer locations worldwide, but cancers in these organs are the third and sixth most common causes of cancer death. Symptoms of gastro-oesophageal cancer are often diffuse and develop slowly, which is why the diagnosis is frequently made at a late stage [1]. No reliable diagnostic clinical biomarker is available [2].
Previous classifications of gastro-oesophageal cancer have been based mainly on morphology, but molecular classifications are now gaining ground. The Cancer Genome Atlas Network has suggested four gastric cancer subtypes: Epstein-Barr virus positive tumours, microsatellite instable tumours, genomically stable tumours (with a high proportion of diffuse histological subtype), and tumours with chromosomal instability (CIN) [3]. Other classifications have also been suggested [4,5]. In the oesophagus, both squamous cell carcinoma and adenocarcinoma can occur. Oesophageal adenocarcinoma shares many clinical and epidemiological characteristics with gastric adenocarcinoma [3], and there is also similarity between the genetic aberrations in squamous cell cancer and adenocarcinoma in the oesophagus [6].
Thanks to the increase in molecular tumour analyses, targeted therapy is being introduced in metastatic gastro-oesophageal cancer. For instance, ERBB2-inhibitors have been recommended for use in metastatic gastro-oesophageal cancer [7] and there are possibly many other molecular subgroups that can guide treatment [8][9][10]. Gastro-oesophageal cancers have no hotspot mutations, but copy number alterations (CNAs) are common; CNA analysis could therefore potentially identify at least half of all gastro-oesophageal cancers [11].
An emerging biomarker in the field of cancer is liquid biopsy, including analysis of cell-free tumour DNA (ctDNA) in plasma from patients with cancer. Fragmented DNA is released into the bloodstream, both from cells within the blood system and from solid tissue cells, and the fraction originating from tumour cells consequently harbours the same genetic aberrations as the tumour [12]. It has been shown that analysis of ctDNA can detect genetic aberrations from many different cancer types, including gastric cancer [13]. ctDNA reflects the total mutational burden of the tumour cells and may thus be a valuable complement to tissue biopsies, reducing the problems of heterogeneity in tissue biopsies in gastro-oesophageal cancer and being more accessible [14].
To investigate the potential of ctDNA analysis to detect and characterize gastro-oesophageal cancer CNAs, we have performed a proof of concept study in which we have compared CNAs detected by tumour tissue sample array-CGH (comparative genomic hybridization) to plasma ctDNA copy number analysis. We tested two different bioinformatic solutions for this analysis, WISECONDOR [15] and ichorCNA [16].
June 2016 to May 2018 were asked to participate. From March 2017 to September 2018 newly diagnosed oesophageal cancer patients were also asked for inclusion. This study was approved by the Regional Ethical Review Board in Stockholm (reg number 2016/2-31/1) and all participants provided written informed consent for participation in the study and for publication of the results.
In this pilot substudy, only participants with an available tumour tissue analysis and at least one plasma sample drawn in connection with the time of diagnosis were included. The timing of other treatments, such as chemotherapy, radiotherapy and surgery, was noted for each participant in relation to the time of plasma and tissue sample collection.

Tissue array-CGH analysis
Tumour tissue samples were extracted during either gastroscopy or surgery and frozen within one day. Isolation of DNA was carried out using the EZ1 DNA Tissue Kit (Qiagen) according to the manufacturer's protocol. The microarray analysis was performed using a 180K oligonucleotide array with evenly distributed whole-genome coverage (AMAID 031035, Oxford Gene Technology) as previously described [17].
The array results were analysed using the Cytosure Interpret Software version 4.10.41 (Oxford Gene Technology, Begbroke, UK). CNAs were considered cancer-associated if they did not overlap with previously described CNAs in our internal database (~8000 samples) or published data sets [18][19][20] and had a log2 ratio that did not match a germline variant. We classified a tissue sample as chromosomally instable (CIN) if there were cancer-associated gains or losses on 10 or more chromosomes. There is no clear cut-off defining a general amplification in any gene and we defined the tissue array-CGH gains as amplifications when the log2 ratio indicated 5 or more copies [21].
In one selected case, a finding on array-CGH was verified using a targeted gene panel of 370 genes. This sample was prepared for sequencing using the KAPA library preparation kit (Roche Sequencing, CA, USA) and the Twist hybridization protocol (Twist Bioscience, CA, USA) and was sequenced on a NovaSeq 6000 system (Illumina, CA, USA).

Plasma ctDNA analysis
The samples were processed using the manual 16-plex VeriSeq workflow, which is routinely used for clinical non-invasive prenatal test (NIPT) samples at the Department of Clinical Genetics. Briefly, blood samples were collected in cell-free DNA blood collection tubes (STRECK, La Vista, USA). The samples were centrifuged at 1600g for 10 minutes at room temperature to separate the plasma from the blood cells. Plasma was transferred to microcentrifuge tubes and centrifuged at 16,000g for 10 minutes at 4°C, and the supernatant was stored at -80°C. All plasma samples were separated within 5 days of the blood draw. Cell-free DNA was extracted with the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) from 1 ml of plasma and converted to libraries for sequencing using the TruSeq Nano DNA LT Sample Prep Kit (Illumina, San Diego, USA), with 13 cycles of amplification. Whole-genome low coverage (36 bp single-end) sequencing was performed on an Illumina HiSeq 2500 with an average of 23M reads per sample (range 14-49M).
Sequence reads were aligned to the reference genome (GRCh37/hg19) using BWA aln [22], deduplicated with Picard tools (http://broadinstitute.github.io/picard/), and converted and analysed using the WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR) program [15]. The software was accessed (https://github.com/VUmcCGP/wisecondor) in December 2017. A set of 414 NIPT samples without any known foetal aberrations was used as a reference set. As a first step, the performance of a targeted approach for 61 target regions using WISECONDOR was evaluated. The target regions were selected as recurrently harbouring gastro-oesophageal cancer CNAs, according to the Cancer Genome Atlas Network [3] (S1 Table). The z-score limit was set to 3.0. A 500 kb bin size was used for the genes; for larger chromosomal regions, we combined the z-scores of all bins into a median z-score for the region.
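The region-level summary described above can be illustrated with a minimal sketch. It assumes that per-bin z-scores for a region are already available as a plain list; the function names are hypothetical, and applying the same limit symmetrically to losses is an assumption made for illustration, not a description of the WISECONDOR code.

```python
from statistics import median

Z_LIMIT = 3.0  # z-score limit stated in the text for the targeted analysis


def region_z_score(bin_z_scores):
    """Summarise a target region by the median z-score of its bins."""
    if not bin_z_scores:
        raise ValueError("region contains no bins")
    return median(bin_z_scores)


def call_region(bin_z_scores, z_limit=Z_LIMIT):
    """Return 'gain', 'loss' or None for one target region.

    Assumption: the limit is applied symmetrically, so a median z-score
    at or below -z_limit is reported as a loss.
    """
    z = region_z_score(bin_z_scores)
    if z >= z_limit:
        return "gain"
    if z <= -z_limit:
        return "loss"
    return None
```

Using the median rather than the mean keeps a single outlying bin from driving the call for a large region.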
Thereafter, a whole-genome analysis was performed using two different software tools. With WISECONDOR, a sliding window method was used to identify the most significant sequence of bins (Stouffer's z-score). A bin size of 500 kb and a minimum of 25 reference bins (all mapping on other chromosomes than the target bin) were used. CNA calls were made if they had a z-score of at least 4.95 and a minimal effect size of 1.5% (i.e. approximately a 1.5% difference in target bin sequencing coverage). We tested larger bin sizes in WISECONDOR (1,5 and 15 Mb), to reduce the number of tests and therefore allow a lower z-score threshold. However, this did not result in any additional verified cancer CNAs being detected.
Sequencing data was also analysed using the ichorCNA algorithm as previously described [16]. The software was accessed at (https://github.com/broadinstitute/ichorCNA) in February 2018. The ctDNA fraction is defined as the ratio of DNA derived from the tumour cells to the total cell-free DNA. The same 414 NIPT samples as in WISECONDOR were used as a reference set ("panel of normals"). A bin size of 500 kb and default settings without subclonal analysis were used in accordance with the instructions for low ctDNA fraction samples (https://github.com/broadinstitute/ichorCNA/wiki/Parameter-tuning-and-settings). Only calls with an effect size of 1.5% (converted from the reported segmental log2 ratio) or higher were included for further analysis, in accordance with the WISECONDOR analyses.
Recurrent calls from segmental duplication regions, variable centromere regions and likely germline variants, together with calls present in the reference set, were filtered out from both the WISECONDOR and ichorCNA data sets. Chromosomes X and Y were not included in the analysis. In addition, calls from chromosome 19, a GC-rich chromosome with known normalization problems, were filtered out [15,16]. After this, the remaining likely cancer-associated CNAs were classified as verified if they were detected by array-CGH in the paired tissue or unverified if they were not. Amplification status was determined in relation to the ctDNA fraction, when ichorCNA provided an estimate, and the ratio effect size/ctDNA fraction was used in those samples. A ratio above 1.5, indicating a copy number status of at least 5 in the tumour cells, was considered an amplification. In the samples where no ctDNA fraction was calculated by ichorCNA, an effect size above 4.5%, corresponding to a copy number status of 5 if the ctDNA fraction was 3%, was considered an indication of an amplification. In both the plasma and the tissue analyses, eight potentially clinically actionable gene amplifications, according to other studies [2,23-27], were noted.
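To make the calling rule concrete, the sketch below combines per-bin z-scores with Stouffer's method (sum of z-scores divided by the square root of their number) and searches for the contiguous run of bins with the largest absolute combined score. It is an illustration only: the exhaustive search and all function names are assumptions and do not reproduce the WISECONDOR implementation. Only the two thresholds (z-score of at least 4.95, effect size of at least 1.5%) are taken from the text.

```python
import math


def stouffer_z(z_scores):
    """Combine z-scores with Stouffer's method: sum(z) / sqrt(n)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def best_window(z_scores, min_len=1):
    """Find the contiguous run of bins with the largest |Stouffer z|.

    Returns (start, end_exclusive, combined_z). The search is a simple
    O(n^2) scan, which is enough for a sketch.
    """
    best = (0, 0, 0.0)
    n = len(z_scores)
    for start in range(n):
        running = 0.0
        for end in range(start + 1, n + 1):
            running += z_scores[end - 1]
            length = end - start
            if length < min_len:
                continue
            combined = running / math.sqrt(length)
            if abs(combined) > abs(best[2]):
                best = (start, end, combined)
    return best


def is_cna_call(combined_z, effect_size, z_min=4.95, effect_min=0.015):
    """Apply the two thresholds from the text; losses have negative values."""
    return abs(combined_z) >= z_min and abs(effect_size) >= effect_min
```

Three adjacent bins with z-scores around 3 are individually below the 4.95 threshold but combine to a Stouffer z-score above 5, which is why a window-based score can detect segments that single-bin testing would miss.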
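The conversion from a segmental log2 ratio to an effect size is not spelled out in the text. One plausible reading, shown here purely as an assumption, is that the effect size is the relative coverage difference, 2^(log2 ratio) - 1. The function names are hypothetical.

```python
def log2_ratio_to_effect_size(log2_ratio):
    """Relative coverage difference implied by a log2 ratio.

    Assumption: effect size = 2**log2_ratio - 1, e.g. a log2 ratio of 0
    means no difference and a log2 ratio of 1 means doubled coverage.
    """
    return 2.0 ** log2_ratio - 1.0


def passes_effect_filter(log2_ratio, min_effect=0.015):
    """Keep calls whose absolute effect size is at least 1.5%."""
    return abs(log2_ratio_to_effect_size(log2_ratio)) >= min_effect
```

Under this reading, a segment with a log2 ratio of 0.05 (about a 3.5% coverage increase) would be kept, while one with a log2 ratio of 0.01 (about 0.7%) would be dropped.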
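The amplification thresholds above follow from a simple mixture model, sketched here under the assumption that non-tumour DNA and the tumour baseline are both diploid, so that effect size = ctDNA fraction x (copy number - 2) / 2. With 5 copies this gives a ratio of 1.5, and 4.5% at a ctDNA fraction of 3%, matching the figures in the text. The function names are hypothetical.

```python
def expected_effect_size(copy_number, ctdna_fraction):
    """Expected relative coverage change for a tumour copy number.

    Assumption: diploid background, so effect = f * (CN - 2) / 2.
    """
    return ctdna_fraction * (copy_number - 2) / 2.0


def estimated_copy_number(effect_size, ctdna_fraction):
    """Invert the model above to estimate the tumour copy number."""
    if ctdna_fraction <= 0:
        raise ValueError("a positive ctDNA fraction is required")
    return 2.0 + 2.0 * effect_size / ctdna_fraction


def is_amplification(effect_size, ctdna_fraction=None):
    """Apply the two rules from the text.

    With a ctDNA fraction estimate: effect size / fraction above 1.5.
    Without one (None or 0): effect size above 4.5%.
    """
    if ctdna_fraction:
        return effect_size / ctdna_fraction > 1.5
    return effect_size > 0.045
```

Under the same assumption, the values reported for P08 in the Results (ctDNA fraction 3.5%, effect size 88%) give `estimated_copy_number(0.88, 0.035)` of roughly 52, in line with the approximately 50 copies quoted there.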

Reference set and positive control samples
De-identified data from a set of 414 NIPT samples without any known foetal aneuploidies were used as reference samples for the WISECONDOR and ichorCNA analysis as described above. All the WISECONDOR and ichorCNA calls in the reference set are listed in S2 Table. In addition, sequencing data from 15 verified foetal aneuploidies were used as positive controls in the analysis set-up. The foetal fraction in these samples was estimated using SeqFF [28]. All of the foetal aneuploidies could be detected by WISECONDOR and ichorCNA (S3 Table).
We also evaluated the reference set for samples with high variation that could potentially decrease the sensitivity. Comparing the bin coverage difference (mean absolute error) to the number of calls for the samples we saw no clear correlation for the WISECONDOR data (S1a Fig). However, the ichorCNA data showed a tendency towards an increased number of calls (>10) if the mean absolute error was >2.5% for the sample (S1b Fig). Therefore, the 69 reference samples with a mean absolute error >2.5% were excluded from the reference set. The use of the adjusted reference set did not affect the number of verified cancer CNAs or the ctDNA fraction estimations.
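The reference-set filter can be sketched as follows. The exact definition of the mean absolute error is not given in the text; this example assumes it is the mean absolute deviation of each sample's normalised bin coverage ratios from the expected value of 1.0. The input layout and function names are hypothetical.

```python
def mean_absolute_error(normalised_bin_ratios):
    """Mean absolute deviation of normalised bin coverage from 1.0.

    Assumption: each value is the sample's bin coverage divided by the
    expected coverage, so an ideal sample has all ratios equal to 1.0.
    """
    ratios = list(normalised_bin_ratios)
    return sum(abs(r - 1.0) for r in ratios) / len(ratios)


def filter_reference_set(samples, max_mae=0.025):
    """Keep reference samples whose mean absolute error is at most 2.5%.

    `samples` maps a sample name to its list of normalised bin ratios.
    """
    return {
        name: ratios
        for name, ratios in samples.items()
        if mean_absolute_error(ratios) <= max_mae
    }
```

Applied to the 414 reference samples, excluding the 69 noisy ones would leave 345 samples in the adjusted reference set.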

Results
81 patients were included in the study during a two-year period. Out of those 81, fresh frozen tumour tissue biopsies and plasma samples taken around the time of diagnosis were available in 44 (Fig 1). The demographic characteristics of the included patients are shown in Table 1. The median age was 70 years and 30% were women. The tumour stages ranged between 0 and IV, with the most common stage being IIIC.

CNAs detected in genomic DNA from tumour tissue
Cancer-associated CNAs (listed in S4 Table) could be seen in 59% (26/44) of the fresh frozen tumour biopsies (Fig 1 and Table 2). A total of 60 amplifications in 14 patients were detected in the tumour tissue array-CGH analyses. In patients P03 and P35, potentially clinically actionable amplifications were detected (Fig 2). In sample P03, there was an amplification of the EGFR gene and also an amplification on chromosome 17 (genome position 38-40 Mb) adjacent to, but not including, the ERBB2 gene. Analysis using a specific targeted gene panel confirmed that the amplification did not encompass ERBB2 (S6 Table). In P35 there were amplifications of the genes VEGFA and ERBB2.

CNAs detected in ctDNA from plasma
Targeted analysis of 61 regions known to harbour recurrent gains or losses in gastro-oesophageal cancer using WISECONDOR detected 18 CNAs with a z-score >5.0 in 11 patients, and 23 cancer-associated CNAs in 15 individuals if a cut-off of z-score 3.0 was used (Table 3).
In the four additional samples just passing the lower z-score threshold, there were four calls with a borderline z-score between 3.0 and 3.7; none of these could be verified in the tissue analysis, suggesting that they were most likely false-positive calls. In one patient, P03, a gain in the MYC gene with a z-score of 4.7 was found. This gain was confirmed in the tissue analysis and was interpreted to be a true cancer CNA. Eight of the patients had a CIN-profile in the corresponding tumour sample. In all, the detected CNAs included five amplifications of the MYC gene. In the tissue array-CGH analysis, three of the MYC gene amplifications were visible as amplifications and one as a gain. The fifth patient (P13) had tissue sampled after neoadjuvant chemotherapy and no CNAs were detected by array-CGH.
Initially in the whole-genome analysis, the output from WISECONDOR and ichorCNA comprised 449 and 234 CNAs, respectively, with a minimum effect size of 1.5% and excluding the CNAs from chromosomes 19, X and Y. In WISECONDOR, 166 CNAs in 17 patients (39% of the patients) were classified as cancer-associated (S5 Table), of which 133 CNAs in 14 patients were verified in the tissue array-CGH. In ichorCNA, 156 CNAs in 13 patients (30% of the patients) were classified as cancer-associated (S5 Table), of which 125 CNAs in 10 patients were verified in the tissue array-CGH. Among the 26 individuals with tumour tissue CNAs, the WISECONDOR plasma cancer detection rate in the whole-genome analysis of our cohort was 54% (14/26). S2 Fig shows the results in P04 from both whole-genome analysis programs and the tissue array-CGH, as an example.
The whole-genome analysis detected plasma aberrations in all 11 patients that were previously detected by the targeted analysis using a z-score of 5.0 as a threshold. In addition, six other patients had CNAs detected by WISECONDOR using whole-genome analysis. ichorCNA detected 10 of the 11 found using the targeted analysis as well as an additional three patients in the whole-genome assay. All five of the MYC gene hits on chromosome 8q in the targeted analysis (patients P03, P05, P08, P09 and P13) were detected in the whole-genome analysis in plasma using WISECONDOR. However, one of them, in P03, was classified as a gain and not an amplification. In fact, chromosome 8 showed the largest number of CNAs: 21 in WISECONDOR (whereof 19 gains), and 15 in ichorCNA (whereof 13 gains) occurring in 10 patients. The highest MYC gene amplification was seen in P08, with a ctDNA fraction of 3.5% and an effect size of 88% corresponding to an estimated copy number of approximately 50. There were in total 48 and 14 amplifications detected in WISECONDOR and ichorCNA, respectively. Potentially clinically actionable amplifications were detected in patients P01, P03 and P44 in the genes VEGFA, EGFR, and FGFR2.
Table 3. Targeted analysis in plasma. Targeted analysis in WISECONDOR data of 61 selected genomic regions and genes (reported in S2 Table). All regions with a z-score of 3.0 or higher and their corresponding findings in the whole-genome targeted plasma analyses and tissue array-CGH are presented. The "Ratio" column contains the normalized coverage difference in the targeted analysis, corresponding to effect size in the whole-genome analysis.
In the total cohort of 44 patients, 15 CIN cases were detected. 14 of these were defined by their profile in the tissue array-CGH but patient P44 had cancer-associated CNAs present on more than 10 chromosomes only visible in plasma and not in the tissue array-CGH, which was sampled after neoadjuvant therapy. Out of the 15 patients with a CIN profile in either the plasma or tissue analysis, 11 (73%) had at least one cancer-associated CNA in plasma. In the non-CIN group, the corresponding number was 6/29 (21%).
Three patients had tumour-associated CNAs in plasma that were not found in the analysis of tumour tissue. In all three, the tumour tissue was sampled after neoadjuvant chemotherapy. A clear MYC amplification was seen in individual P13 using both software tools; large WISECONDOR and ichorCNA CNAs with high effect sizes on chromosomes 3, 5, 7 and 8 were seen in individual P07; and both tools detected gains, amplifications and losses on several chromosomes in individual P44.
In the twelve samples where ichorCNA could estimate a ctDNA fraction, it was reported as 2.5-8% (Table 2). In two of the samples with an estimated ctDNA fraction of 0%, ichorCNA did, however, find a tissue-verified MYC gene amplification (P09 and P13). There was no clear correlation between either ctDNA fraction or effect size and tumour size, tumour type, stage or distant metastasis in our small cohort. There was, however, a non-significant tendency towards larger tumour sizes in the group with cancer-associated CNAs in plasma (S3 Fig).
In an effort to elucidate what might be a relevant threshold for cancer-associated CNAs even if tissue samples were not available, all CNAs from patients with tissue samples taken before neoadjuvant chemotherapy were analysed. The length of the aberration was plotted against the effect size (S4 Fig). In WISECONDOR, all CNAs >30 Mb long and with an effect size >3% were verified cancer-associated CNAs. The same applies to all CNAs between 5 and 30 Mb with an effect size >5%. For ichorCNA, a size threshold of 30 Mb with an effect size >3%, or a size between 2 and 30 Mb with an effect size of 5%, identified verified cancer-associated CNAs with no unverified CNAs. In this group of patients, there were three CNAs in total that were classified as cancer-associated even though they were not initially verified in the tissue array-CGH. Two of them, both gains on chromosome 20 in P04, were detected in WISECONDOR and ichorCNA. Upon re-evaluation of the array-CGH, a small deviation of the log2 ratio, suggesting a subclonal gain of a large part of chromosome 20, was detected, but it was considered too low in effect size in comparison to the rest of the tissue analysis to confidently mark as cancer-associated. The third non-verified but cancer-associated plasma call was a loss detected on chromosome 7 in P14. The array-CGH showed a variable level of gains and losses in this region, including one gain spanning the first half of the plasma CNA and a loss in the second part, and therefore it could not be classified as a clear loss in the tissue analysis.
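The empirical length and effect-size thresholds described above can be written as a small rule, sketched below. The function name and the dictionary are hypothetical; treating the ichorCNA 5% limit as a strict "greater than" is an assumption, since the text does not say whether the boundary is inclusive.

```python
# Lower bound (in Mb) of the shorter size class for each tool, from the text.
MIN_FOCAL_MB = {"WISECONDOR": 5.0, "ichorCNA": 2.0}


def above_empirical_threshold(length_mb, effect_size, tool="WISECONDOR"):
    """Return True if a plasma CNA lies in the zone where all calls were
    tissue-verified in this cohort.

    Rules from the text: longer than 30 Mb with effect size > 3%, or
    between the tool-specific lower bound and 30 Mb with effect size > 5%.
    A False result does not mean the CNA is benign, only that it falls
    outside the zone in which every call was verified.
    """
    effect = abs(effect_size)  # losses have negative effect sizes
    if length_mb > 30.0:
        return effect > 0.03
    if length_mb >= MIN_FOCAL_MB[tool]:
        return effect > 0.05
    return False
```

For example, a 10 Mb gain with a 4% effect size falls outside the zone for both tools, whereas a 3 Mb gain with a 6% effect size is inside it for ichorCNA but not for WISECONDOR.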

Discussion
In this pilot study of 44 individuals with gastric and oesophageal cancer, we have shown that low-coverage whole genome sequencing of cell free DNA can detect CNAs in 39% (17/44) of all patients. Of the patients with known CNAs in tumour tissue 54% (14/26) had cancer-associated CNAs in plasma.
An advantage with this study is the use of a clinically validated workflow for non-invasive prenatal tests. Owing to this approach, the potential step from research to implementation in clinical routine is short. The use of a large reference set, handled in the same way as the cancer patient samples, and analysis of paired tumour tissue and plasma samples, enabled verification to reduce false positive CNAs in plasma from our cancer cohort. Since we used two different bioinformatic software programs in the same setting, we could compare their performance and evaluate their applicability to our clinical pipeline.
We started by using a targeted approach, including 61 regions in the plasma analyses. By using a z-score cut-off of 3.0 we would expect 2-3 false positives (~1/1000 x 61 x 44), and we suspect that four of the 15 patients that had detectable CNAs using this approach were in fact false positives: their CNAs were not verified in tissue or whole-genome analysis, and three of them did not have any cancer-associated CNAs detectable in any of the subsequent analyses of plasma and tissue. Using a more stringent z-score cut-off of 5.0 enabled detection of CNAs in 11 patients. By using a z-score limit of 4.0 we could identify one additional CNA in P03 (who already had another hit in the targeted analysis): a MYC gene amplification verified in both whole-genome plasma and tissue analysis.
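The false-positive estimate above is a multiple-testing calculation: number of regions times number of samples times the per-test false-positive rate. The sketch below reproduces the rounded figure from the text and, for comparison, computes the rate from the standard normal tail. It assumes that z-scores are standard normal under the null hypothesis and that the tests are independent; the function name is hypothetical.

```python
from statistics import NormalDist


def expected_false_positives(n_regions, n_samples, z_cutoff, two_sided=False):
    """Expected number of null tests exceeding the z-score cut-off.

    Assumption: z-scores follow a standard normal distribution when no
    CNA is present, and all region-by-sample tests are independent.
    """
    tail = 1.0 - NormalDist().cdf(z_cutoff)
    if two_sided:
        tail *= 2.0
    return n_regions * n_samples * tail


# Rounded per-test rate of about 1/1000 used in the text: 61 x 44 / 1000.
rough_estimate = 61 * 44 / 1000
# One-sided normal tail at z = 3.0 (about 0.00135 per test).
tail_estimate = expected_false_positives(61, 44, 3.0)
```

The rounded rate gives about 2.7 expected false positives, and the one-sided normal tail gives a slightly higher figure of about 3.6; both are of the same order as the four suspected false-positive patients. At a cut-off of 5.0 the expected number falls well below one.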
Out of the 61 regions, only 23 contained CNAs with a z-score of 3.0 or higher, possibly suggesting that our region selection was not optimal. We based the panel of target regions primarily on the results from the Cancer Genome Atlas Research Network, including 295 gastric cancer samples [3]. 60% of these regions were replicated in the whole genome sequencing study on 168 treatment naïve gastric cancer samples in a Chinese study, suggesting that they are biologically relevant [21] and 75% of the peak regions of focal amplification presented by Schumacher et al [11] are included in our targeted region list. A subset are known recurrent amplifications of e.g. the MYC, ERBB2 and EGFR genes [29]. Tumours are often aneuploid with multiple subclonal events and our analysis methods (array-CGH and whole-genome sequencing) can only measure the total ratio or coverage over each region (i.e. the sum of all the gains and losses) in relation to the mean in controls. In addition, gastro-oesophageal cancer is highly heterogeneous regarding CNAs with most recurrent gains and losses present in a minority of cases [11]. This complicates the interpretation of some regions.
To increase the detection rate, we expanded the plasma analysis in a whole-genome approach. This does not require prior knowledge of specific cancer aberrations. The whole-genome approach increased the number of individuals with cancer-associated CNAs detected in plasma to 17, as compared to 11 in the targeted analysis. In late stage cancers, whole-genome CNA analysis using low-coverage MPS on ctDNA may be a cost-effective measurement of disease burden and it is applicable across different cancer types when the ctDNA fractions are high enough [30,31]. Since gastro-oesophageal cancer is a genetically complex disease but gains and losses are seen in more than half of the patients [11,23,32], our approach is suitable for ctDNA analysis even when a tumour tissue sample is not available.
Recurrent arm-level chromosome gains have previously been reported in more than 50% of gastric cancers, on chromosome 20q, 20p and 8q, and losses on chromosome 18q and 21q are seen in about 40% of the cases [11]. In the gastric cancer patients in our study, gains in tissue and/or plasma on chromosome 20q, 20p and/or 8q could be seen in 18 patients, corresponding to 78% (18 out of the 23 patients with gastric cancer and any detected CNA in plasma or tissue). Losses on chromosome 18q and/or 21q could be seen in 10 patients (43%). In oesophageal adenocarcinomas, gains frequently occur on chromosomes 6q, 7p, 7q, 8q, 11q, 15q, and 17q and losses are common on chromosomes 1p, 3p, 4q, 8p, 9p, 17p, and 18q [33]. All three patients with oesophageal cancer in our study had CNAs overlapping these regions. Oesophageal squamous cell carcinomas have recurrent gains on chromosomes 7, 8 and 11 and losses on chromosomes 2, 3 and 7, for instance [32]. The two patients in our study with squamous cell carcinoma had gains overlapping those regions.
Clonal haematopoiesis is a common source of CNAs in plasma [34]. Clonal mosaic copy number gains in blood cells in healthy controls are recurrent at chromosomes 8, 12 and 15 [35,36] and can be misinterpreted as cancer-related. Of note, ~80% of the patients with cancer-associated plasma CNAs, as well as ~80% of those CNAs, were also verified in the tissue biopsies. Therefore, we are confident that we have avoided most false positives due to clonal haematopoiesis in the plasma samples. However, if plasma analysis is to be done without tumour sample analysis, it is recommended to compare the ctDNA results with those of DNA from peripheral blood where needed, in order to identify CNAs originating from leukocyte clonal haematopoiesis.
WISECONDOR provided a higher CNA detection rate and more amplifications compared to ichorCNA. This is in accordance with a previous report, arguing that the normalization process is a key step in the bioinformatic analysis [37]. WISECONDOR has an optimized normalization procedure that is based on a principal component analysis for the selection of a set of reference bins, usually ~100 bins across many chromosomes for each target bin. In the version of WISECONDOR applied in this study, gains are not always correctly segmented and smaller amplifications may therefore be masked by larger overlapping or nearby gains. For example, in the tissue array-CGH and the whole-genome analysis with ichorCNA from P03, an amplification of the MYC gene was detected. In WISECONDOR, only a gain was detected in the region. In a further development of the original version (WisecondorX), segmentation has been altered to address this potential problem [37]. Also, in contrast to WISECONDOR, ichorCNA does provide a calculation of the ctDNA fraction, which can be an advantage, for instance in the interpretation of gains versus amplifications.
Amplifications in a few genes are considered potentially actionable in gastric cancer [2,23-27]. In total, four patients had such amplifications and all four of them had a CIN tumour profile. In P01, with both plasma and tissue sampled before chemotherapy, amplifications (in VEGFA and CD44) detected in plasma were called as gains in the tissue by array-CGH. One possible explanation for this discrepancy is tumour heterogeneity. Another example of potential tumour heterogeneity can be seen in P03 (Fig 2), who also had an amplification of EGFR detected in both plasma and tissue. P35 had amplifications of the VEGFA and ERBB2 genes in tissue only, but the plasma sample was taken after chemotherapy. P44 had an amplification in the FGFR2 gene in plasma only and the post-treatment tissue biopsy was negative despite a clinical stage IV, showing the difficulties in capturing relevant cells in small tissue biopsies. ERBB2 amplification analysis, mostly by immunohistochemistry for ERBB2 expression, is currently being introduced as a standard clinical test in gastro-oesophageal cancer patients, in particular in a metastatic situation, but was not yet standard procedure at the time of the tissue sample collection for this study. Therefore, only two patients (P21 and P35) had a clinically detected ERBB2 overexpression. In P21, no ERBB2 amplification could be seen in plasma or the tissue array-CGH (which did contain other detectable CNAs), but in P35, the ERBB2 amplification was detected in tissue (Fig 2).
According to the ichorCNA developers, their algorithm can reliably detect a ctDNA fraction of at least 3%. In addition, at least one amplification and one deletion event, both larger than 100 Mb, are needed in order to provide an accurate estimation of ctDNA fraction [16,38].
In all, six samples with detectable CNAs in plasma had no estimation of ctDNA fraction. Five harboured only 1 or 2 small CNAs, making an estimation of ctDNA fraction by ichorCNA impossible (P09, P13, P14, P16 and P17). Only one sample (P05) had several larger CNAs called by WISECONDOR, but these were not detected by ichorCNA; thus the lack of a ctDNA fraction estimation is likely due to a low ctDNA fraction.
The sensitivity of low coverage whole genome sequencing for CNAs in cell free DNA is dependent on both the technical sensitivity of the method and the biological variation in the samples. The biological sensitivity depends on whether or not the tumour had CNAs at all, as well as on its propensity to shed cell free DNA into the circulation and thus yield a higher ctDNA fraction in plasma. In addition, subclonal events will be more difficult to detect than early CNAs that are present in all or a large majority of tumour cells. In many cases, the ctDNA fraction will be the determining factor for the technical sensitivity. For prenatal screening, a minimum fraction of foetal DNA in the total cell free DNA of 2.0% is required in order to detect trisomy 13, 18 or 21 [39] but there are differences between different platforms and some require at least 4%. Of note, most studies that use low-coverage whole genome sequencing require a ctDNA fraction of at least 5-10% [16,40]. Among our positive prenatal samples, all foetal trisomies were detected by both WISECONDOR and ichorCNA and the lowest foetal fraction was 5% (S3 Table). Among the cfDNA samples from cancer patients, WISECONDOR and ichorCNA could detect CNAs in samples with a ctDNA fraction of 0-8.4%, and 7/11 (64%) had a ctDNA fraction of less than 5% (Fig 1). This is in line with other studies; the ctDNA fraction in advanced gastric cancer was reported to be 0.3-8% with a median of approximately 1.6% [41] and in early cancer stages of other gastrointestinal cancers the fractions are even lower (0-0.8%) [42].
Our analysis of tumour biopsies taken before neoadjuvant chemotherapy showed that 23/30 (77%) tumours harboured CNAs, thus we would expect at most 23 samples to have positive ctDNA results. Of these, 13 had positive ctDNA results and ten were negative. Four of the negative samples (P23, P28, P29, P31) had CNAs less than 50 Mb in size with a small effect size on array-CGH analysis, which were likely below the technical detection limit of the algorithms. Six samples had large CNAs with at least one trisomy (P15, P20, P22, P26, P38, P42) (Fig 2). The CNAs found in tissue in P15 had a low effect size (S4 Table) and were likely subclonal events that were under the threshold of detection in plasma. The other five samples had a ctDNA fraction of 0 (i.e. were tumours that did not shed much ctDNA) and thus their ctDNA was negative. It is not yet established which factors are the most important for determining the level of ctDNA fraction, although localization, size, stage, kidney clearance, age and invasiveness have all been suggested [43]. Our cohort is too small to robustly analyse these parameters, but we did see a tendency towards a correlation between larger tumour size and higher ctDNA levels. We used 1 ml of plasma from each individual, according to the standard NIPT protocol. Increasing the sample input volume to 3 ml plasma does not increase the detection of samples with low ctDNA fraction [44], but will only increase the cost [16].
To date, there are very few studies on CNAs in ctDNA in gastro-oesophageal cancer cohorts, making this study an important contribution. Most reports on plasma CNA detection are small proof-of-concept studies [45] and the majority of the studies investigating ctDNA in gastro-oesophageal cancer include predominantly individuals of Asian ancestry [31]. Both WISECONDOR and ichorCNA have previously been used for plasma analyses of CNAs in ctDNA in cancer. Cohen et al reported on the application of WISECONDOR in a cohort of 32 women with ovarian cancer and 32 benign controls [46]. Adalsteinsson et al used ichorCNA in a cohort of 520 patients with metastatic prostate and breast cancer, comparing to tissue analyses in 41 patients [16].
One study by Davidson et al used ichorCNA on a cohort of 30 individuals with gastro-oesophageal adenocarcinoma with no reference set [44]. CNAs were detected in 23 (77%) of the individuals using a whole-genome approach and ichorCNA bioinformatics, with recurrent gains on, for instance, 8p. Approximately the same region on 8p also showed gains in 6 of the patients in our cohort. In a targeted approach with 50 kb bins, they found tissue-verified amplifications. Their study cohort included only advanced inoperable (10%) or metastatic (90%) cancers, while only 16% of our cohort comprised patients with verified metastases. Also, 80% of the patients included in the study by Davidson et al had cancers in the oesophagus, while only 5 (11%) in our cohort did, which might explain the differences in the detection rate. In addition, all CNAs in our cohort were filtered against a reference set, with exclusion of 63% of all CNA calls from WISECONDOR and 34% of all CNA calls from ichorCNA. Therefore, our data are not immediately comparable to the data presented by Davidson et al. Another study, using whole-genome low-coverage analysis with a focus on chromosomal instability scores in gastro-oesophageal cancer, identified 27/55 (49%) of patients with CNAs in plasma after comparing to DNA from peripheral blood cells [47], and Maron et al analysed a targeted panel on ctDNA including amplifications, with a detection of multiple amplifications in 40% of the gastro-oesophageal cancer patients [29]; both are in line with our results.
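The effect of filtering calls against a reference set can be sketched as below. This is a generic overlap filter for illustration only: the overlap rule, the 50% threshold and all names are assumptions made for this example and do not reproduce the filtering criteria applied in this study, which are described in Methods.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "gain" or "loss"


def overlap_fraction(call: Call, ref: Call) -> float:
    """Fraction of `call` covered by `ref` (same chromosome and same kind only)."""
    if call.chrom != ref.chrom or call.kind != ref.kind:
        return 0.0
    shared = min(call.end, ref.end) - max(call.start, ref.start)
    return max(shared, 0) / (call.end - call.start)


def filter_calls(calls: List[Call], reference_calls: List[Call],
                 min_overlap: float = 0.5) -> List[Call]:
    """Keep calls that no single reference call covers by >= min_overlap.

    Assumption: min_overlap = 0.5 is a hypothetical threshold for this sketch.
    """
    return [
        call for call in calls
        if all(overlap_fraction(call, ref) < min_overlap
               for ref in reference_calls)
    ]


if __name__ == "__main__":
    # Made-up coordinates, purely to exercise the functions.
    patient_calls = [
        Call("8", 0, 100, "gain"),
        Call("7", 50, 150, "gain"),
    ]
    reference = [Call("8", 0, 80, "gain")]
    kept = filter_calls(patient_calls, reference)
    excluded = 1 - len(kept) / len(patient_calls)
    print(f"Kept {len(kept)} of {len(patient_calls)} calls "
          f"({excluded:.0%} excluded)")
```

A filter of this kind removes calls that recur in non-cancer samples and are therefore more likely to be technical or constitutional than tumour-derived, which is one reason why detection rates from studies with and without a reference set are not directly comparable.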
Different approaches to plasma DNA examination in patients with gastro-oesophageal cancer have been reported, initially mostly analyses of total cell-free DNA concentration, which is, however, a non-specific test for cancer [12]. Another approach for ctDNA analysis in gastro-oesophageal cancer is single nucleotide variant analysis. Most of the cohorts are small and diverse when it comes to tumour subtype and stage, with a detection rate spanning 20-80%. The technical approach also differs between studies, using either a personalised panel adapted to known tumour tissue genetic aberrations from the same individual [48][49][50][51][52] or a pre-set panel [29,[53][54][55][56][57][58][59].
It is well known that tumour tissue biopsies have limitations. Sometimes the tumour location makes a tissue biopsy procedure difficult. In addition, due to tumour heterogeneity, a single tumour biopsy may only represent a small clone that is not representative of the major tumour burden. In fact, before using ERBB2-inhibitor therapy in gastro-oesophageal cancer, at least five biopsies are recommended in order to ensure reliable results [7]. Thus, ctDNA analysis might provide more comprehensive, or additional, CNA information [60]. An example is the amplification of the VEGFA gene in P01, which is visible only as a gain in the tissue array-CGH but as a clear amplification in the plasma analyses.
All newly diagnosed patients with gastro-oesophageal cancer who are potentially operable are referred to the Department of Upper Abdominal Diseases at Karolinska University Hospital. Most of the patients who were eligible for participation in the study accepted. The gender (30% women) and age (median 70 years) of the patients included in this study are comparable to the 40% women and median age 72 years reported for gastric cancer in Sweden in 2018 [61]. The most common tumour stage in Stockholm 2016-2018 was III [62], and that was also the most common stage of the patients included in this study. The study cohort is thus representative of the population with gastro-oesophageal cancer in Stockholm county.
Our study reflects the clinical situation, where patients often undergo their diagnostic biopsy at another medical centre before being referred to the university hospital for treatment, and fresh frozen tumour tissue is not always available before initiation of treatment. In these cases, drawing a blood sample and analysing plasma DNA with rigorous filtering might provide important information without the need for a second gastroscopy. Of note, two of the participants had squamous cell cancer in the oesophagus. It is known that the genetics of squamous cell tumours and adenocarcinomas differ, and more studies on ctDNA in both these groups are needed to determine whether the same liquid biopsy approach can be used for both.
The clinical stage of the cancer in our study was estimated in a multidisciplinary tumour board consisting of experienced gastro-oesophageal cancer surgeons, oncologists, radiologists, pathologists and endoscopists and was based on gastroscopy, biopsy and CT/PET-CT (computed tomography, positron-emission tomography) scans. The complete histopathological report on the resected tumour specimen, received after surgery, can be more accurate, but has the disadvantage of often being made after neoadjuvant chemo- or chemoradiation treatment, and in many patients no surgery is performed. Therefore, although there was no correlation between the clinical stage of the patients and the ctDNA fraction estimate or the effect size in our study, such a correlation cannot be ruled out and should be addressed in a larger cohort.

Conclusions
In summary, low-coverage whole-genome sequencing without prior knowledge of the tumour aberrations is a useful tool for ctDNA analysis of total copy number alterations in plasma from patients with gastro-oesophageal cancer. It can detect chromosomal instability as well as clinically actionable amplifications in genes important for therapy such as ERBB2 and EGFR and is thus an important complement to more traditional gene panel analyses that target single nucleotide variants. In addition, liquid biopsies are minimally invasive and provide overall information on the genetic aberrations regardless of tumour heterogeneity. Further studies are needed on longitudinal liquid biopsy samples from more gastro-oesophageal cancer patients in order to follow tumour dynamics and further investigate the sensitivity of the method.

S1 Fig. Sample bin coverage variation in relation to number of calls.
Mean absolute error of the normalized coverage difference between all bins and the number of calls for all of the samples in the reference set (n = 414). The mean absolute error is plotted on the cropped X axis and the number of copy number alterations called by WISECONDOR (S1a) and ichorCNA (S1b) are plotted on the Y axis.

In the tissue array-CGH (S2a), copy number alterations (CNAs) after filtering are indicated by blue boxes with gains above the zero line and losses below. The upper panel gives a general overview of the chromosomal positions and, on the Y axis, the log2 ratio of each probe is shown as dots. The moving average is indicated by a blue line. In the low-coverage whole-genome analysis in plasma using WISECONDOR (S2b), the blue line indicates the bin Z-score. Called regions (before filtering) are indicated by yellow/green boxes depending on the effect size together with the Z-score for the region. In the low-coverage whole-genome analysis in plasma using ichorCNA (S2c), dots represent bins with their log2 ratio shown on the Y axis. Regions with gains, including amplifications, are indicated by brown/red colour and losses are indicated by green.

Plot of all cancer-associated CNAs in plasma from individuals (n = 13) with tissue samples taken before any chemotherapy and CNAs called by WISECONDOR (S4a) and ichorCNA (S4b). On the log10 Y axis, effect size of the CNA and on the X axis, size in megabase pairs (Mbp) of the CNA. CNAs verified in the tissue array-CGH are coloured orange and the "unverified" are coloured blue. The CNAs coloured grey (n = 3 in total) were manually classified as cancer-associated even though they were not visible in the tissue array-CGH (see Methods for details). (TIF)

S1 Table. Regions and genes in the targeted plasma analysis.
All