Multisite verification of the accuracy of a multi-gene next generation sequencing panel for detection of mutations and copy number alterations in solid tumours

Molecular variants including single nucleotide variants (SNVs), copy number variants (CNVs) and fusions can be detected in the clinical setting using deep targeted sequencing. These assays support low limits of detection using little genomic input material. They are gaining in popularity in clinical laboratories, where sample volumes are limited, and low variant allele fractions may be present. However, data on reproducibility between laboratories are limited. Using a ring study, we evaluated the performance of 7 Ontario laboratories using targeted sequencing panels. All laboratories analysed a series of control and clinical samples for SNVs/CNVs and gene fusions. High concordance was observed across laboratories for measured CNVs and SNVs. Over 97% of SNV calls in clinical samples were detected by all laboratories. Whilst only a single CNV was detected in the clinical samples tested, all laboratories were able to reproducibly report both the variant and copy number. Concordance for information derived from RNA was lower than observed for DNA, due largely to decreased quality metrics associated with the RNA components of the assay, suggesting that the RNA portions of comprehensive NGS assays may be more vulnerable to variations in approach and workflow. Overall, the results of this study support the use of the Oncomine Focus Assay (OFA) for targeted sequencing of clinical samples and suggest specific internal quality metrics that can be reliable indicators of assay failure. While we believe this evidence can be interpreted to support deep targeted sequencing in general, additional studies should be performed to confirm this.


Introduction
The value of specific gene alterations to match individuals with cancer to molecularly targeted agents is now clear in many tumour types [1-18]. While current requirements for detection of somatic variants remain limited in specific jurisdictions such as Ontario, this is likely to change significantly as increased numbers of targeted therapies for both solid and hematologic tumours are approved by the FDA and Health Canada. Thus, it is expected that the need for broader mutation profiling for tumours will expand rapidly over the next several years.
Increasingly, broad genomic profiling, using whole genome or exome sequencing and whole transcriptome sequencing, is used in clinical trials of metastatic cancer to attempt to match patients to targeted therapeutics based on genomic features of their tumours [19-23]. Many of these trials highlight the complexity of delivering broad genomic analyses from small diagnostic samples in a timely fashion. To address this challenge, a number of small and large targeted sequencing panels have been developed which cover key molecular alterations across multiple cancer types (e.g. MSK-IMPACT, Foundation One, TruSight series, Oncomine series). Among these, the Oncomine Focus (OFA) and Oncomine Comprehensive v3 (OCAv3) assays have been adopted in a number of clinical diagnostic laboratories globally. Notably, the OCAv3 is the assay of choice in the ongoing Adult and Pediatric NCI-Molecular Analysis for Therapy Choice (MATCH) programs (NCT02465060/NCT03155620) [24,25], to allocate metastatic patients to novel targeted therapeutics. In Canada, OCAv3 is also being used by the CAPTUR, OCTANE [26,27] and the Exactis Innovation [28] programs, to stratify patients with advanced cancer for treatment based on molecular alterations. The OFA is a smaller and more economical version of the OCAv3, and, like OCAv3, has the advantage of being appropriate for small amounts of DNA/RNA extracted from formalin fixed paraffin embedded (FFPE) diagnostic samples. Both OCAv3 and OFA can be readily adopted in routine clinical settings, combining deep coverage of key genes using next generation sequencing methodology and a curated informatics reporting pipeline. Critical, however, to the implementation and subsequent clinical validation of any assay in the routine clinical setting is the ability to rely on accurate and reproducible results.
In order to assess the accuracy and reproducibility of the OFA for detection of mutations, copy number alterations and fusions, we performed a ring study that included two research and five clinical molecular diagnostic laboratories in Ontario and compared results using a small series of solid tumour and control specimens.

Study design and samples
The study was designed to assess the performance of the OFA across research and clinical laboratories in Ontario. Ethics approval was obtained from the Queen's University Research Ethics Board (Study PATH-161-16). The study was divided into two phases. In phase 1, participating centers were provided with extracted nucleic acids from 3 control DNA (Horizon Quantitative Multiplex Reference Standard, Horizon KRAS Gene-Specific Multiplex Reference Standard and Horizon EGFR Gene-Specific Multiplex Reference Standard) and 1 control RNA (Horizon HD784) cell lines, as well as 9 DNA samples and 11 RNA samples extracted from anonymized solid tumours (Table 1). Nucleic acids for use in phase 1 were extracted at the Kingston Health Sciences Center. For phase 2, 6-μm sections from 8 of the tumours used in phase 1 were placed onto unbaked glass slides. A parallel H&E slide was examined by a pathologist and marked to identify the tumour area. Images of the marked slides for each tumour were made available to each center. Two slides were sent to each participating center. Each center received non-consecutive slides (i.e. slide 1 and slide 8, slide 2 and slide 9, etc.) to minimize the chance that tumour heterogeneity would confound the results. For phase 2, each center performed RNA and DNA extraction individually and used the extracted material as input to the OFA.

Participating centers
The following centers participated in the study: Kingston Health Sciences Center (Feilotter), Sunnybrook Health Sciences Centre (Seth), Ontario Institute for Cancer Research (Bartlett), Princess Margaret Hospital/UHN (Kamel Reid), Ottawa Hospital (Lo), London Health Sciences Centre (Sadikovic), Health Science North (McClure). Each center was provided a code letter (A-G) to anonymize results. All centers except Laboratory F used the OFA assay; Laboratory F used the OCAv3 assay. The centers running the OFA assay used either an Ion Torrent PGM sequencer and a 318 chip, or the Ion Torrent S5XL sequencer and a 520 chip. Laboratory F used an Ion Torrent S5XL sequencer and a 540 chip (Thermo Fisher Scientific, Waltham, MA, USA).

Data generation and analysis
Sequencing runs were originally analyzed using Torrent Suite Software (TSS) versions 5.0-5.4 (depending on when the runs were completed) to generate BAM files, which were uploaded to the Ion Reporter (IR) server. After completion of the study, all runs were re-analyzed using default IR Oncomine workflows in IR v5.10. DNA quality metrics were analyzed using a TSS coverage analysis plugin, and RNA quality metrics were investigated using attributes from the BAM file as well as the IR fusion workflow metrics. Minimum standards were specified according to standards provided with the OFA and OCAv3 assays. For OFA, at the time of the study, a minimum of 300,000 mapped reads for DNA and 5,000 mapped reads for RNA was required to pass quality control. More recently, the minimum number of mapped reads for RNA was changed to 50,000, and both of these metrics were used in analysis. For OCAv3, a minimum of 3,000,000 reads for DNA and 500,000 reads for RNA was required. At the time of the study, the minimum fragment length for OFA was 100 bases for DNA and 75 bases for RNA. More recently, this has been changed to 75 for DNA and 60 for RNA, and both metrics were considered in analysis. For OCAv3, the minimum lengths were set at 80 and 60 bases for DNA and RNA respectively. The minimum percent uniformity was set at 80% for both OFA and OCAv3. Variant summary files were created as VCF files. Three different types of variants (SNV/indel, copy number variant and fusion genes) were assessed using default calling settings.
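The minimum standards listed above can be expressed as a small lookup table and a pass/fail check. The following is a minimal sketch, not the Torrent Suite or Ion Reporter implementation: the names `Thresholds`, `THRESHOLDS` and `qc_failures` are hypothetical, the labels "original" and "updated" are our shorthand for the two sets of OFA criteria, and we assume that a value exactly equal to a minimum passes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Thresholds:
    min_mapped_reads: int
    min_read_length: int                        # bases
    min_uniformity_pct: Optional[float] = None  # stated for DNA only


# Values transcribed from the text; key = (assay, analyte, criteria).
THRESHOLDS = {
    ("OFA", "DNA", "original"): Thresholds(300_000, 100, 80.0),
    ("OFA", "DNA", "updated"): Thresholds(300_000, 75, 80.0),
    ("OFA", "RNA", "original"): Thresholds(5_000, 75),
    ("OFA", "RNA", "updated"): Thresholds(50_000, 60),
}
# The text gives a single set of OCAv3 criteria, so both labels map to it.
for criteria in ("original", "updated"):
    THRESHOLDS[("OCAv3", "DNA", criteria)] = Thresholds(3_000_000, 80, 80.0)
    THRESHOLDS[("OCAv3", "RNA", criteria)] = Thresholds(500_000, 60)


def qc_failures(assay: str, analyte: str, criteria: str,
                mapped_reads: int, read_length: float,
                uniformity_pct: Optional[float] = None) -> List[str]:
    """Return the names of the metrics that fall below their minimum."""
    t = THRESHOLDS[(assay, analyte, criteria)]
    failed = []
    if mapped_reads < t.min_mapped_reads:
        failed.append("mapped_reads")
    if read_length < t.min_read_length:
        failed.append("read_length")
    if t.min_uniformity_pct is not None:
        # Assumption: a missing uniformity value for DNA counts as a failure.
        if uniformity_pct is None or uniformity_pct < t.min_uniformity_pct:
            failed.append("uniformity")
    return failed
```

For example, a hypothetical OFA RNA library with 20,000 mapped reads and a mean read length of 70 bases fails only on read length under the original criteria, and only on mapped reads under the updated criteria.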

Quality metrics
Quality metrics, including read length, mapped reads (DNA & RNA) and uniformity of base coverage (DNA), are reported in Tables 2-6 for phase 1 and Tables 7-11 for phase 2. Samples not meeting the minimum standards set out in the Methods are flagged.

Phase 1
One sample (the DNA standard, Laboratory E) failed for technical reasons (not sequencing related) and results for this sample were not reported. No sample failed either the original or the updated read length metric (Table 2). Two samples (Samples 2 and 5) failed both the minimum mapped read requirement (Table 3) and the uniformity percentage for DNA (Table 4) in Laboratory G. Twelve samples (including 2 reference samples) failed the original RNA read length metric (Table 5), including 6 from Laboratory B, although this was reduced to 5 failed samples using the newer minimum read length requirement of 60 bases. Three samples failed the OCAv3 mapped read value (Table 6). For OFA, the RNA reference failed the original mapped read requirement in Laboratory E. Using the updated metric of a minimum of 50,000 mapped RNA reads, substantially more failures were seen, in particular from laboratories B and D.

Phase 2
In phase 2, no samples failed for the DNA read length values (Table 7). One sample failed for number of DNA mapped reads for OFA, and 3 samples failed OCAv3 for this metric (Table 8). The same samples, plus one additional OCAv3 sample, failed the uniformity metric (Table 9). For RNA, laboratories B, D and F saw most samples fail using the original read length metric, with fewer failures (mostly in laboratory B) using the new metric (Table 10). However, using the newer metric for required minimum number of RNA mapped reads, laboratories B and D saw multiple failures across samples, while Laboratory F was unsuccessful with the RNA runs for all samples using OCAv3. Data from flagged samples were not considered in the following analyses unless otherwise specified.

Table 13. Five of the laboratories detected the majority of fusions, although laboratories E and F did not detect any fusions.

Variant calls from clinical specimens
Hotspot SNV/indel calls. The 8 solid tumour samples were assessed for the presence of variant calls. Each sample had previously been assayed using a clinically validated Ion Ampli-Seq Cancer HotSpot Panel v2 assay (Thermo Fisher Scientific, Waltham, MA, USA). Results from the clinically validated assay were accepted as the "true" results for each sample, and results from the ring study were measured against these expected calls.
In phases 1 and 2, a total of 11 different gain of function hotspot variants were expected (Table 14) across 11 samples run in the 7 sequencing laboratories. For this analysis, data from all runs were included to allow investigation of which quality metrics might be critical for variant calling. Overall, 143/147 potential variants were successfully called (97.3%). VAFs for all of the positively called variants were highly concordant between all laboratories for both phases (Table 14). Three calls were missed by Laboratory G due to two phase 1 samples that failed to [...]
Table 15 shows the MAPD values for each sample across each laboratory for phases 1 and 2. Generally, copy number can be assessed if the MAPD value remains below 0.5. In phase 1, Samples 1 and 9 had values >0.5 from Laboratory F. In phase 2, Sample 4 had an MAPD value greater than 0.5 for Laboratory B, and all samples from Laboratory F with the exception of Sample 7 had MAPD values >0.5. From the clinical samples, copy number gain of the MYC locus was consistently identified in Sample 4, with copy gain estimates ranging from 6.76 to 11.6 across both phases (Table 16).
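The MAPD gate described above can be sketched in a few lines. This is an illustration only: we assume the commonly used definition of MAPD as the median of the absolute differences between the log2 ratios of adjacent amplicons, which may differ in detail from the computation in Ion Reporter, and the function names `mapd` and `cnv_assessable` are hypothetical. Only the 0.5 limit is taken from the text.

```python
from statistics import median
from typing import Sequence

MAPD_LIMIT = 0.5  # limit quoted in the text


def mapd(log2_ratios: Sequence[float]) -> float:
    """Median of absolute differences between adjacent log2 ratios.

    Assumes the ratios are ordered by genomic position of the amplicons.
    """
    if len(log2_ratios) < 2:
        raise ValueError("at least two amplicons are required")
    diffs = [abs(b - a) for a, b in zip(log2_ratios, log2_ratios[1:])]
    return median(diffs)


def cnv_assessable(log2_ratios: Sequence[float],
                   limit: float = MAPD_LIMIT) -> bool:
    """True when the MAPD value remains below the limit."""
    return mapd(log2_ratios) < limit
```

A sample with small amplicon-to-amplicon scatter passes the gate, whereas one whose adjacent amplicons swing by more than one log2 unit does not; the input values in such a check would be made-up illustrations, not study data.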
RNA fusions. A MET exon 14 skipping call was made in 3 samples in 2 laboratories (Samples 2, 6 and 9 in both phases 1 and 2 in Laboratory A, and Sample 9 in both phases 1 and 2 in Laboratory G). All fusion calls were below 1% of total mapped RNA reads. No materials remain for orthogonal validation of the calls.

Discussion
Targeted next generation sequencing assays, also referred to as massively parallel sequencing assays, to identify variants in tumours have become standard practice in many clinical laboratories. In Ontario, laboratories currently offer a variety of such panels, including the commonly used Ion AmpliSeq Cancer HotSpot Panel v2 assay (Thermo Fisher Scientific) as well as the TruSight panels (Illumina). In the early days of next generation sequencing, most laboratories relied on panels that detected single nucleotide variants and small insertions or deletions. However, currently more laboratories are assessing panels designed to interrogate high-level copy number changes, as well as the expression of fusion events in an effort to conserve precious sample volumes and curb the costs and the time associated with sequential multiple testing.
We engaged in a ring study including five clinical and two research laboratories across Ontario to assess the parameters of one such assay, the Oncomine Focus Assay (Thermo Fisher Scientific). The rationale was to determine the potential strengths and weaknesses of the assay and to provide important data that could subsequently assist laboratories to validate the assay in-house. Ultimately, one of the clinical laboratories involved opted to utilize a larger panel (OCAv3), which limited some of the comparisons that could be made. It was clear that the assays in all 7 laboratories were performing at a level sufficient to reliably detect all DNA variants from control cell lines at 5% VAF or greater. Although we did not do a formal study of limit of detection, we can determine that with the mean depths achieved in these studies (>1200 reads per amplicon), both assays can reliably detect SNVs and indels present in at least 5% of the molecules interrogated. Likewise, using control materials to investigate the ability of the assays to identify fusions by way of input total RNA, laboratories generally were able to detect the fusions. Laboratories unable to detect fusions in control materials were either using the OCAv3 assay and likely were limited by the amount of RNA provided, or showed failed quality metrics for both mean read length and total mapped RNA reads for the control sample. The relationship of the low quality metrics with the lack of fusion detection confirms that the quality metrics are critical components of the assay that must be tracked and used to guide interpretation. Among the laboratories that successfully detected the fusions, we noted variability in the relative number of fusion molecules detected between laboratories. This variability persisted even when visualizing the fusion results as a proportion of total mapped reads. The reason for this is not clear, but it does suggest that the RNA metrics for these assays require careful scrutiny, and that clinical laboratories should ensure that they independently determine minimum standards for fusion calling, which could well be somewhat different for different fusion molecules of interest.
Using clinical formalin fixed paraffin embedded specimens, clinically relevant hotspot calls were consistent (97% of clinically relevant calls were made using orthogonal data as a standard) across all assays and on all samples tested, with highly similar VAFs, with 3 exceptions. The missed calls highlighted the importance of the quality metrics associated with each sample, as all calls that were missed were from samples where the quality indicators of mapped read numbers and uniformity would have flagged the sample as substandard quality. Clearly, the assay quality metrics highlight samples where clinically relevant calls might be missed, and should be appropriately tracked and used as indicators to repeat assays, where possible. Barring that, the presence of out of range quality metrics, in particular mapped reads and uniformity, should flag a sample to be reported as inconclusive rather than negative. However, given the extremely high accuracy of detecting actionable mutations in repeat analyses across 7 laboratories using either previously extracted or locally extracted DNA, we conclude that this assay satisfies important criteria relating to accuracy and reproducibility across multiple testing laboratories for mutation detection.
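The concordance figure and the reporting rule discussed above can be written out explicitly. This is a minimal sketch under stated assumptions: the function names `concordance_pct` and `negative_result_status` are hypothetical, and we assume that any out-of-range quality metric is enough to downgrade a negative result to inconclusive. The counts 143 and 147 are those reported in the Results.

```python
from typing import Sequence


def concordance_pct(called: int, expected: int) -> float:
    """Percentage of expected variant calls that were actually made."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * called / expected


def negative_result_status(failed_metrics: Sequence[str]) -> str:
    """Report a sample with no variant call as negative or inconclusive.

    Assumption: any failed quality metric makes a negative result
    unreliable, so the sample is reported as inconclusive.
    """
    return "inconclusive" if failed_metrics else "negative"


# 143 of 147 expected hotspot calls were made across all runs.
print(round(concordance_pct(143, 147), 1))  # 97.3
```

With this rule, a sample that fails mapped reads and uniformity and shows no variant would be reported as inconclusive and queued for repeat testing where material allows, rather than being signed out as negative.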
Identification of copy number variants is becoming an increasingly useful technique, as large-scale or gene-level genomic amplifications or deletions are being associated with drug response or prognosis. In this study, only a single sample was shown to have an amplification of the MYC locus, and the calls from all of the laboratories were consistent and reproducible. Perhaps more strikingly, the numerical copy number estimates for all of the known gain of function CNV areas represented on the OFA assay were highly concordant between the participating centres. Again, given the high consistency and reproducibility of both calls (gain/loss/no change) and copy number estimates, we conclude this to be a highly reproducible CNV assessment platform. We are limited in our ability to comment broadly on accuracy for CNV detection because a) the gains observed were not orthogonally validated (being non-actionable at this time) and b) there was only 1 gain in the samples assessed. The RNA findings from the clinical specimens were less robust. The smaller number of assays performed and the wider quality metrics make interpretation of this part of the assay more challenging. However, although we did not have available material to confirm the presence of MET exon 14 skipping events in the 4 samples where that event was detected, we were able to determine that the samples in which the skipping event was called were unlikely to carry such a biomarker. Of the 4 clinical specimens with a suggested MET exon 14 skipping event, two were melanoma samples, one of which also carried a driver NRAS mutation, one was a colorectal cancer and the last was a lung cancer sample with a KRAS driver mutation. These tumours would be unlikely to harbour a MET exon 14 skipping driver mutation, suggesting that these calls could be false positives.
Indeed, more recent developments with the OFA/OCAv3 assays suggest that calling this particular RNA-based biomarker requires careful calibration of the assay, ensuring that the skipping event is called with a minimum of 1000 reads [29]. The remaining fusion calls that were identified in three additional phase 2 samples are also likely to be false positives, given they were not detected by most laboratories, and never in the matched phase 1 samples. The variability seen in the RNA portion of the assays highlights again that this part of the assay is more vulnerable to laboratory handling and workflow. Of interest, the quality metrics for the RNA results were not markedly different between phase 1 (where pre-extracted mRNA was shipped to laboratories) and phase 2 (where laboratories extracted RNA locally). This suggests that the quality issues are not related to degradation of samples during shipping. Clearly, the metrics that accompany the assays are relevant for all clinical laboratories to ensure high quality results, but the RNA aspects of the assays likely require independent assessment of lab-specific metrics to ensure consistency and to limit false calls.
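A read-support filter of the kind described above might look like the following sketch. It is illustrative only: the names `MIN_READS`, `DEFAULT_MIN_READS`, `fusion_fraction_pct` and `passes_read_support` are hypothetical, as is the string used to label the event. The 1000-read minimum for MET exon 14 skipping is the value cited in the text [29]; the default for other fusions is a placeholder that each laboratory would need to establish for itself.

```python
from typing import Dict, Optional

# Minimum supporting reads per event. Only the MET value comes from the text.
MIN_READS: Dict[str, int] = {"MET exon 14 skipping": 1000}
DEFAULT_MIN_READS = 20  # hypothetical placeholder, not a validated cut-off


def fusion_fraction_pct(fusion_reads: int, total_mapped_rna_reads: int) -> float:
    """Fusion-supporting reads as a percentage of total mapped RNA reads."""
    if total_mapped_rna_reads <= 0:
        raise ValueError("total mapped RNA reads must be positive")
    return 100.0 * fusion_reads / total_mapped_rna_reads


def passes_read_support(event: str, fusion_reads: int,
                        min_reads: Optional[Dict[str, int]] = None) -> bool:
    """True when the event has at least the required number of reads."""
    table = MIN_READS if min_reads is None else min_reads
    return fusion_reads >= table.get(event, DEFAULT_MIN_READS)
```

Keeping the thresholds in a per-event table reflects the point made in the Discussion that minimum standards may differ between fusion molecules of interest; a laboratory could pass its own validated table through the `min_reads` argument.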
Overall, we demonstrate that the OFA is a highly accurate and reproducible platform, in both the clinical diagnostic and research setting, for the detection of SNVs and CNVs using low input FFPE-derived DNA. We have limited data about OCAv3, given that only a single laboratory used this assay, and results from this laboratory were likely compromised by the limited amount of material that could be shared and the 2X higher input amounts needed for that assay. Since only one participating laboratory used the OCAv3 assay, we cannot extend our broader findings relating to OFA to OCAv3, or indeed other assays. The major limitation of this study is the small number of samples used, which was due to the difficulty of identifying clinical specimens with sufficient material to be shared across multiple laboratories. Despite the limited number of samples, however, the data do provide important guidance about quality metrics that should be monitored for use of both DNA and RNA as input materials for next generation sequencing assays. Overall, this study further strengthens the case for using panel-based testing for small samples with limited amounts of available diagnostic material and provides some insights into the use of quality metrics to flag compromised samples with a high risk of false negative results. It also provides important insights into the importance of standardized protocols, training and robust clinical validation prior to the use of these assays in the clinical setting. Specifically, important insights from this study suggest that a) the quality metrics tracked in the study for both DNA and RNA components are critical elements of the analytic process and should be part of any standardized approach to assessing the assays, and b) the RNA component appears more variable, suggesting that clinical validation for each fusion requiring detection might be considered independently.
As panel-based targeted sequencing becomes more widely available, studies like the one presented here may form an invaluable source of data to inform quality assurance and assessment approaches to diagnostic targeted sequencing assays.
The use of panels such as OFA or OCA in the clinical setting for patients means that even patients with very small amounts of tumour material may be able to access molecular testing to guide their clinical management. In Ontario, testing for predictive biomarkers including EGFR for lung adenocarcinoma, BRAF in malignant melanoma and KRAS and BRAF in metastatic colorectal cancer supports the use of targeted therapies for these patients. Assays such as those investigated in this study will continue to provide this critical information to patients as these indications continue to expand. The onus on the laboratories continues to be to ensure that the metrics guiding the use of these assays are well understood.