Rapid and highly-specific generation of targeted DNA sequencing libraries enabled by linking capture probes with universal primers

Targeted Next Generation Sequencing (NGS) is being adopted increasingly broadly in many research, commercial and clinical settings. Currently used target capture methods, however, typically require complex and lengthy (sometimes multi-day) workflows that complicate their use in certain applications. In addition, small panels for high sequencing depth applications such as liquid biopsy typically have low on-target rates, resulting in unnecessarily high sequencing cost. We have developed a novel targeted sequencing library preparation method, named Linked Target Capture (LTC), which replaces typical multi-day target capture workflows with a single-day, combined ‘target-capture-PCR’ workflow. This approach uses physically linked capture probes and PCR primers and is expected to work with panel sizes from 100 bp to >10 Mbp. It reduces the time and complexity of the capture workflow, eliminates long hybridization and wash steps and enables rapid library construction and target capture. High on-target read fractions are achievable due to repeated sequence selection in the target-capture-PCR step, thus lowering sequencing cost. We have demonstrated this technology on sample types including cell-free DNA (cfDNA) and formalin-fixed, paraffin-embedded (FFPE) derived DNA, capturing a 35-gene pan-cancer panel, and therein detecting single nucleotide variants, copy number variants, insertions, deletions and gene fusions. With the integration of unique molecular identifiers (UMIs), variants as low as 0.25% abundance were detected, limited by input mass and sequencing depth. Additionally, sequencing libraries were prepared in less than eight hours from extracted DNA to loaded sequencer, demonstrating that LTC holds promise as a broadly applicable tool for rapid, cost-effective and high performance targeted sequencing.


Introduction
Targeted Next Generation Sequencing (NGS), as a faster and cheaper alternative to equivalent depth whole-genome or whole-exome sequencing, is common practice in many research, commercial and clinical applications. As sequencing technologies continue to become more accessible, the adoption of targeted NGS into more labs and markets is likely to follow. We have developed Linked Target Capture (LTC), a novel target capture method with broad application, designed to reduce hybridization workflows to less than eight hours while retaining high performance over a wide range of panel sizes. LTC replaces existing hybridization methods with a combined 'target-capture-PCR' workflow using linked capture probes and universal amplification primers. Here we describe the LTC method and demonstrate its ability to rapidly deliver enriched sequencing libraries from multiple sample types, including formalin-fixed, paraffin-embedded (FFPE) tissues, plasma-derived cell-free DNA (cfDNA) and cell line DNA. Additionally, with the integration of UMIs, we demonstrate LTC's ability to detect low-level single nucleotide variants, copy number variants, insertions/deletions and gene fusions.

Linked target capture concept
The LTC method is illustrated in Fig 1 for Illumina sequencers, though it is expected to be compatible with most sequencing platforms. The workflow begins with ligation of short Y-adapters that contain truncated portions of the Illumina P5 and P7 flow cell binding sequences, such that ligated molecules will not bind to the flow cell and be sequenced without further processing. Following ligation and pre-amplification using universal primers, two sequential target-capture-PCR (tcPCR) steps are performed with Probe-Dependent Primers (PDPs). PDPs consist of non-extendable DNA capture probes linked 5' to 5' with a low melting-temperature universal primer complementary to a portion of the ligated adapter (Fig 1(ii) and 1(iii)). When bound to their targets, the probes bring the universal primer into close proximity with the universal priming site on the template, increasing the reaction rate of primer binding and initiating polymerase extension. The polymerase displaces or digests the probe portion of the PDP to make a copy of the entire target template, and the reaction proceeds to the next tcPCR cycle. To create sequencer-compatible libraries, the second tcPCR integrates the full Illumina P5 and P7 sequences into the universal primer portion of the PDPs. Both tcPCR reactions are performed above the melting temperature of the universal primers so that amplification is heavily biased towards target-bound PDPs.
As described in the Materials and Methods, PDPs are made by reacting separately synthesized probes and primers. PDP panels are made by linking probe sets to the universal primers, making panel generation, expansion, and combination straightforward.

Assay validation
To validate the LTC workflow, PDPs were designed to cover relevant portions of 35 cancer-related genes, as described in Materials and Methods and listed in S1 Table. PDPs were chosen to capture four major variant types, including single nucleotide variants (SNVs), insertions/deletions (indels), copy number variants (CNVs) and structural rearrangements (e.g., gene fusions). Libraries were created and sequenced in duplicate from four sample types, as outlined in Table 1: mechanically sheared cell line DNA, enzymatically sheared cell line DNA, cfDNA, and FFPE-derived DNA. Additionally, to test lower input mass, duplicate libraries were created and sequenced from 5 ng of mechanically sheared cell line DNA. The total time from extracted DNA to loaded sequencer was eight hours, with about three hours of hands-on time.
All libraries were analyzed through the same pipeline (see Materials and Methods) and down-sampled to a fixed number of sequencing clusters (or read pairs) for a given input mass (2 M read pairs for 40 or 50 ng, 0.2 M for 5 ng). Fixing the number of read pairs is important when comparing results, as the same sequencing data analyzed with different numbers of read pairs produces different results (especially in coverage). This is attributed to several factors, including insufficient reads for a given input mass (or a given number of input genomes), and Poisson variation. Fixed-read results are shown in Table 1. On-target fraction, mean target coverage and uniformity were calculated using Picard CollectHSMetrics (broadinstitute.github.io/picard/), as described in Materials and Methods.
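The fixed-read down-sampling described above can be illustrated with a minimal sketch. It assumes read pairs are identified by name and that a simple seeded random subset is acceptable; the function name, the `seed` parameter and the budget table are illustrative and are not taken from the analysis pipeline used in this study.

```python
import random

# Read-pair budgets stated in the text, keyed by input mass in ng.
READ_PAIR_BUDGET = {50: 2_000_000, 40: 2_000_000, 5: 200_000}


def downsample_read_pairs(pair_ids, target_pairs, seed=0):
    """Return a reproducible random subset of read-pair IDs.

    pair_ids     : iterable of read-pair identifiers (hypothetical input format)
    target_pairs : number of read pairs to keep
    seed         : fixed seed so that replicates are down-sampled reproducibly
    """
    pair_ids = list(pair_ids)
    if target_pairs >= len(pair_ids):
        # Library has fewer pairs than the budget: keep everything.
        return pair_ids
    rng = random.Random(seed)
    return rng.sample(pair_ids, target_pairs)


if __name__ == "__main__":
    all_pairs = [f"pair_{i}" for i in range(1000)]
    kept = downsample_read_pairs(all_pairs, 100)
    print(len(kept))
```

Keeping whole pairs together (rather than sampling individual reads) preserves the insert start and stop positions that the later UMI analysis relies on.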
These data demonstrate consistently high on-target fraction (86%-97%) and uniformity (90%-96%) across a range of sample types and input mass relevant to clinical applications of targeted sequencing. As a reference, commercially available Roche, Illumina and Agilent methods have been compared using a 110 gene panel, and ranged in performance from 75% to 87% on-target [17]. While not a direct comparison, this reference provides a good indicator of relative performance, as it is typically easier to achieve high on-target fraction on large panels [16] (a direct in-house comparison was not undertaken due to the significant cost of capture panels). To demonstrate the scalability of LTC, we measured enrichment on four of the 35 gene targets (BRAF, EGFR, ERBB2 and TP53), using 50 ng mechanically-sheared cell line DNA. The measured on-target fraction was >97% in both replicates, higher than the same measurement for our 35-gene panel. Similarly sized small panels using conventional single-round target capture reported ~5% on-target reads in [16] and [18].
Fig 1. (a) Custom adapters (i) are ligated to template DNA and the resulting product is amplified with universal primers. (b) Target regions are selectively amplified using custom probe-dependent primers (PDPs) (ii) which contain a recognition sequence (dark grey) with a 3' blocker (black diamond) and are linked to an oligo containing a universal priming sequence for the first target capture PCR reaction (tcPCR1). (c) A second set of PDPs (iii), which contain Illumina adapters (red and black) between the probe and linked universal primer, are then added and a second target capture PCR reaction (tcPCR2) is completed prior to (d), clean up and QC and (e) loading on a sequencer. The inset shows detail of forward and reverse PDPs. https://doi.org/10.1371/journal.pone.0208283.g001
A comparison in uniformity can be made against the use of a SureSelect XT panel (Agilent Technologies) covering 231 SNV targets in 26 genes [19]. For FFPE samples with similar coverage (>1000x) in [19], the authors report uniformity ranging from ~50% to 93%, whereas both FFPE replicates using LTC had a uniformity of 96%.
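To make the uniformity metric concrete, the sketch below applies the definition given in the Table 1 caption (fraction of on-target bases covered between 0.5x and 2x of the mean target coverage) to a list of per-base depths. It is a simplified illustration, not the Picard CollectHSMetrics implementation; the function name and input format are assumptions.

```python
def coverage_metrics(depths):
    """Compute mean target coverage and uniformity from per-base depths.

    depths : list of de-duplicated coverage values, one per target base
             (hypothetical input; the study derived its metrics with Picard).
    Returns (mean_coverage, uniformity), where uniformity is the fraction of
    target bases covered within two-fold of the mean (0.5x to 2x).
    """
    if not depths:
        raise ValueError("depths must contain at least one target base")
    mean = sum(depths) / len(depths)
    within = sum(1 for d in depths if 0.5 * mean <= d <= 2.0 * mean)
    return mean, within / len(depths)


if __name__ == "__main__":
    # Six target bases: four at the mean, one under-covered, one within 2x.
    mean, uniformity = coverage_metrics([100, 100, 100, 100, 10, 190])
    print(mean, uniformity)
```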
Insert length distributions for each sample type were calculated using Picard CollectInsertSizeMetrics and are shown in Fig 2. Mechanically sheared cell lines were created by the manufacturer to produce a majority of inserts in the range of ~100 bp to 250 bp (see Materials and Methods). The recovered insert lengths for these samples represent a good match to the expected size distribution with 89% of targets between 100 bp and 250 bp. Enzymatically sheared DNA samples produced slightly longer inserts, likely a function of the shearing protocol. Additionally, the median insert size for the cfDNA samples was 175 bp in a reasonably narrow distribution, in good concordance with literature [20]. A peak was also visible around ~325 bp, suggesting these long fragments may have been wrapped twice around the histone. Finally, FFPE-derived DNA samples also produced a short insert length distribution, as expected from the degradation associated with FFPE samples combined with enzymatic shearing and repair.

Variant detection
To enable the detection of low level variants with LTC, UMIs consisting of four random bases in series were integrated into Illumina's 'Index 1' read position of the ligation adapter. The UMIs were used in conjunction with the start and stop positions of the inserts to uniquely identify individual starting template molecules and to create consensus sequences (see Materials and Methods). A commercially available reference standard cell line (HD786, Horizon Discovery) was used to assess the ability of LTC to detect variants as it contains SNVs, CNVs, indels and fusions at levels characterized by the manufacturer. The variants covered by the 35-gene panel are listed in Table 2, along with the expected allele percentage as specified by the manufacturer for each of the different samples used in this study.
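The molecule-identification idea described above (UMI combined with insert start and stop positions) can be sketched as follows. This is a simplified majority-vote illustration and not the FGBIO consensus algorithm used in this study; the read dictionary layout and the `min_reads` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_molecule(reads):
    """Group reads that share chromosome, insert start, insert end and UMI.

    reads : iterable of dicts with keys 'chrom', 'start', 'end', 'umi', 'seq'
            (hypothetical format chosen for illustration).
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["umi"])
        families[key].append(read["seq"])
    return families


def consensus(seqs, min_reads=2):
    """Return a per-position majority-vote consensus, or None if too few reads."""
    if len(seqs) < min_reads:
        return None
    length = min(len(s) for s in seqs)
    bases = []
    for i in range(length):
        base, _count = Counter(s[i] for s in seqs).most_common(1)[0]
        bases.append(base)
    return "".join(bases)


if __name__ == "__main__":
    reads = [
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "ACGT", "seq": "ACGT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "ACGT", "seq": "ACGT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "ACGT", "seq": "ACTT"},
        {"chrom": "chr7", "start": 100, "end": 104, "umi": "TTTT", "seq": "ACGT"},
    ]
    for key, seqs in group_by_molecule(reads).items():
        print(key, consensus(seqs))
```

With only four random bases there are 256 possible UMIs, which is why the insert coordinates are needed as part of the key to separate distinct starting molecules.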
Table 1. LTC 35-gene sequencing performance data for two replicates of each sample type and DNA mass. On-target fraction was defined as the fraction of total bases that aligned to the target regions. Mean target coverage was defined as the mean de-duplicated coverage over all target regions, and uniformity was defined as the fraction of on-target bases that were covered within two fold of the mean target coverage (i.e. between 0.5x and 2x of the mean). Median insert length was measured over all de-duplicated on-target inserts.
Fig 2. https://doi.org/10.1371/journal.pone.0208283.g002
Table 2. Reference standard variants. The expected allele percentage was measured and specified by the manufacturer using digital PCR or next generation sequencing. Expected allele percentages are given for stock samples (Sample 1), samples diluted to 1/10 of the stock concentration (Sample 2), and wild-type samples (Sample 3).
To test variant detection, 50 ng of DNA was used from each cell line. DNA from the reference standard (Sample 1) was analyzed in duplicate, along with duplicate analysis of a cell line from the same manufacturer known to be wild type for the variants of interest (Sample 3). A ten-fold titration (Sample 2) of the reference standard was made with the wild type cell line, and also tested in duplicate. Sequencing analysis and variant calling were performed as outlined in Materials and Methods. The measured variant fractions for detected SNVs, indels and fusions are plotted against the expected fractions in Fig 3. All the variants that were detected were measured at allele frequencies within ~3x of expected values. Expected variants as low as 0.25% were detected (the lowest fraction tested in this study), which corresponds to ~38 mutant fragments present in the initial 50 ng sample (assuming 3.3 pg of DNA per human haploid genome). Since ligation yield in general is much lower than 100% [21], the actual number of mutants entering capture could be considerably less than 38, and perhaps near sampling error for some loci.
Discrepancy between measured and expected values may be attributed to a number of factors including the differences in variant calling methods, titration of the reference standard, and the relative sequencing coverage of each variant, all of which could potentially lead to sampling error. All twelve (100%) of the SNV, indel and fusion variants were correctly identified at 0.00% variant fraction in the wild-type cell line (Sample 3 replicates). Eight of the twelve (67%) SNV, indel and fusion variants expected between 0.25% and 0.56% were detected in the diluted reference standard (Sample 2 replicates), while all twelve (100%) of the same variants expected between 2.5% and 5.6% were detected in the reference standard (Sample 1 replicates). With further improvements to LTC (see Discussion), we expect even higher sensitivity and lower detection limits to be possible.
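The estimate of ~38 mutant fragments in a 50 ng input at 0.25% allele fraction follows directly from the stated 3.3 pg per haploid genome, and can be reproduced with the short sketch below. The `recovery` parameter is a hypothetical knob added here to illustrate the effect of incomplete ligation yield; no recovery value was measured in this study.

```python
import math

PG_PER_HAPLOID_GENOME = 3.3  # value stated in the text


def haploid_genome_copies(input_ng):
    """Number of haploid genome copies in a given DNA mass (ng)."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_mutant_fragments(input_ng, allele_fraction, recovery=1.0):
    """Expected mutant fragments per locus entering capture.

    recovery : assumed fraction of input molecules that survive library
               preparation (1.0 = no loss; hypothetical, for illustration).
    """
    return haploid_genome_copies(input_ng) * allele_fraction * recovery


def prob_no_mutant_sampled(expected):
    """Poisson probability that zero mutant fragments are present at a locus."""
    return math.exp(-expected)


if __name__ == "__main__":
    n = expected_mutant_fragments(50, 0.0025)
    print(round(n))                      # about 38 fragments with no loss
    n_half = expected_mutant_fragments(50, 0.0025, recovery=0.5)
    print(round(n_half, 1), prob_no_mutant_sampled(n_half))
```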

Copy number variation was assessed for the MET gene in all six samples (replicates of Samples 1, 2 and 3) by our analysis pipeline, which was designed to identify samples as "amplified", "deleted" or "copy-number neutral" (see Materials and Methods). The MET gene was identified as "amplified" in both replicates of Sample 1, and "copy-number neutral" for both replicates of Samples 2 and 3. These results were consistent with expectations, as the 4.5 copies of the MET gene present in each Sample 1 replicate should be easily detectable above background, even when compared against only two wild-type samples (Sample 3 replicates). On the other hand, the 1.35 copies of the MET gene in each of the Sample 2 replicates would likely require many more measurements to confidently detect a copy number variation above the wild-type background.

Discussion
In general, target capture performance and workflow improvements have the potential to increase NGS and target capture usage in existing applications as well as enable new opportunities if workflow time, complexity or costs are reduced.
Arguably the most significant improvement of LTC over existing methods is a dramatic decrease in workflow time. To the best of our knowledge, the IDT xGen workflow is the fastest commercial hybridization capture method, completed in nine hours. This does not include library preparation, which usually adds several hours to the workflow, and requires the assay to be run over two work days. In contrast, the LTC workflow was completed in eight hours, including library preparation and loading of the sequencer.
Typical capture workflows are limited by the length and performance of the hybridization step, which on its own can extend to 72 hours. Shorter hybridization steps typically compromise performance resulting in either lower de-duplicated target coverage or higher off-target fraction. LTC avoids this tradeoff and shortens this rate limiting step by employing a combined target-capture-PCR (tcPCR) step. High de-duplicated target coverage is achieved by specifically capturing both senses (see Materials and Methods) and by operating at a relatively low temperature compared to the probe binding temperature, while the on-target fraction is increased through many effective capture cycles performed in each tcPCR reaction. An additional benefit of tcPCR is workflow simplicity. In conventional target capture workflows, biotinylated probes typically require binding to streptavidin coated beads to enrich for the target DNA. The subsequent bead capture and wash steps are generally complicated, labor intensive and can be difficult to automate [22], potentially limiting deployment of target capture workflows in some cases. On the other hand, the LTC tcPCR setup and operation are analogous to a standard PCR reaction, and thus are more familiar to a larger number of technicians, and also easier to automate. Additionally, it should be noted that LTC could be paired with any library preparation method that introduces the correct adapter sequences, such as single stranded library prep [23] or transposition [24].
A related advantage of the combined PCR-capture step is the ability to produce consistent sequencing performance from low input mass samples. Sequencing parameters, including coverage, scaled as expected from the 50 ng samples down to 5 ng, suggesting that LTC is able to recover molecules efficiently across a wide mass range. This is especially important in applications where sample is limiting, and could be tested to even lower limits in a future study.
It should be noted that the workflow time and complexity of LTC are comparable to multiplexed PCR (such as AmpliSeq by ThermoFisher) and hybridization extension methods (such as [11]). LTC holds a number of significant advantages over these methods, however. First, since the LTC primers are universal, it does not require sample splitting before amplification to prevent unwanted amplicon formation. This avoids loss of sensitivity and the requirement for large DNA input mass. Second, since LTC probes are displaced by the extended universal primer, sequence information at probe binding sites is retained on the amplified molecules to be sequenced, thus capturing all of the sequence information available from a single fragment. This is in contrast to PCR and hybridization extension methods where any variants contained within a PCR primer binding site are lost after the primer has bound and extended. LTC also retains fragment start and stop positions, which are lost in PCR and hybridization extension methods, and have been shown to provide useful biological information [25]. Additionally, it is generally much easier to integrate low-level variant detection in hybridization capture methods like LTC compared to PCR methods. When UMIs are integrated in ligation as they are in LTC, it is easier to avoid labelling a single molecule with multiple UMIs, which can occur in PCR methods. Also, to our knowledge, it is not possible to integrate duplex sequencing in a PCR-based UMI method, but this has been demonstrated with LTC. Finally, since the challenges associated with multiplex PCR are reduced through the use of universal primers, the LTC workflow can be applied to a wide range of panel sizes, including large panels for which multiplex PCR methods would not work. Small panels have been demonstrated in this study, and initial work towards larger panels indicates that exome-scale LTC panels may be possible. This is advantageous, as a single workflow could be implemented for multiple assays or applications.
LTC has several other unique properties. Primers and probes can be oriented to capture a specific strand of the target duplex DNA (e.g., the sense strand, see Materials and Methods), providing an advantage in rare variant detection, or in applications where it is desirable to sequence only one strand of the starting template such as transcriptome sequencing [26]. In addition, LTC has been demonstrated in droplets, providing multiplexing capabilities to droplet-based assays not achievable with standard capture methods.
The sequencing statistics achieved using Linked Target Capture were greater than 91% average on-target and 94% average uniformity, providing cost-effective sequencer usage and leaving little room to improve these metrics. Measuring how these factors scale to much larger panels would be an important part of a future study. Mean target coverage was lower than initially expected, by about two to three fold compared to hybridization capture with similar analysis [27]; we suspect this to be due to the lack of LTC probe tiling. The 35-gene panel used in this work consisted of fairly sparse probe placement to reduce panel cost, such that the probes covered less than 100% of bases in the targets. Initial data from tiling two targets in the 35-gene panel to nearly 200% demonstrated a more than 3-fold increase in mean target coverage, which agrees with previously reported tiling improvements of at least two-fold [28]. It is expected that tiling will significantly improve mean target coverage as well as variant detection when applied across the whole 35-gene panel.
Variant detection may be further improved through the use of lower error UMI designs. Like hybridization and capture methods, the error rate of LTC is expected to be linked to the UMI design used in a given assay. For example, integrating duplex UMIs into the LTC ligation adapters is expected to further reduce the detection limit, similarly to the reduction observed for duplex UMIs applied to hybridization and capture methods [27]. Increasing the input mass and sequencing depth are also expected to lower the detection limit of LTC.
Finally, it should be noted that for certain applications, capturing targets larger than demonstrated in this study (Fig 2) would be advantageous [29,30]. While capture of targets several hundred bases in length has been demonstrated with LTC, understanding the upper limit of capture size would be another important part of a future study.
In summary, we have developed a novel target capture method with a rapid workflow and efficient sequencer usage. With continued improvements in tiling and panel expansion, we expect LTC to be a high performance target capture method applicable in many settings.

PDP design and conjugation
In order to enable panel design flexibility, PDPs were made by conjugating target-specific probes and universal primers. The universal primers (forward and reverse) were manufactured by Integrated DNA Technologies (IDT) and contained a 5' Dibenzocyclooctyl (DBCO) modification. The forward and reverse untailed primers for the first target capture step were CACCGAGATCT and TACGAGATCGG respectively. The forward and reverse tailed primers for the second target capture step were AATGATACGGCGACCACCGAGATCT and CAAGCAGAAGACGGCATACGAGATCGG respectively.
Capture probes were designed to cover portions of 35 cancer-related genes, shown in S1 Table. Total sequence coverage was 11,473 bp. Probes were designed with adjacent forward and reverse probes covering the desired regions, with zero gap between forward and reverse probes, a minimum length of 30 bp, maximum length of 70 bp, and a melting temperature of 85˚C calculated using uMELT [31] with default conditions. Probes were synthesized by IDT with a 5' azide modification to conjugate with the DBCO on the primer and a 3' inverted dT base, to inhibit polymerase extension.
Pools of forward and reverse probes were conjugated with both forward and reverse primers separately by mixing 22.5 μM primer with 10 μM total probe concentration, in 0.6x PBS. Each mixture was incubated at 60˚C for 16 hours. After incubation, the conjugates were purified using a modified Agencourt AMPure XP Kit (Beckman Coulter) and eluted in 20 μL 0.1x IDTE (IDT). A 2:1 bead to sample ratio was used according to the manufacturer's instructions, except that prior to use, the bead buffer was extracted and replaced with an equal volume of a custom formulated buffer. The custom buffer consisted of 30% w/v PEG-8000, 1 M NaCl, 0.05% v/v Tween 20, 10 mM Tris-HCl, and 1 mM EDTA (all reagents from Sigma-Aldrich). Following conjugation, PDPs were quantified using the Qubit ssDNA Assay (ThermoFisher Scientific). Conjugates were made and then stored at -20˚C. PDPs consisting of forward probes with forward primers were labelled as FF, reverse probes with forward primers RF, and so on for all four combinations.

Sample sources
Four sample types were used in this study: mechanically sheared cell line DNA, enzymatically sheared cell line DNA, plasma-derived cell-free DNA (cfDNA), and FFPE-derived DNA. Mechanically sheared DNA was obtained from Horizon Discovery in mutant (HD786) and wild-type (HD776) standards (Samples 1 and 3, respectively, from Table 2). Mechanical shearing was performed by the manufacturer such that around 60% of the templates were within 100 bp to 250 bp, with fragments up to 400 bp. Mutation levels were measured by the manufacturer using droplet digital PCR. Enzymatically sheared cell line DNA was generated from genomic DNA (HD753, Horizon Discovery), following the protocol described below. cfDNA was isolated from single donor human plasma samples (IPLAS-K2 EDTA, Innovative Research), as described below. FFPE-derived DNA was obtained from Horizon Discovery, part number HD799.

Cell-free DNA extraction
First, 5 mL of plasma was centrifuged for 10 min at 2,000g. cfDNA was isolated from each sample using the QIAamp Circulating Nucleic Acid Kit (Qiagen) according to the manufacturer's instructions. DNA was eluted from the column in 0.1x IDTE in a two-step process to maximize elution yield: 50 μL of 0.1x IDTE was incubated in the column for 10 min, followed by a 20,000g spin for 3 min; the column was then re-eluted after a 3 min incubation with another 50 μL 0.1x IDTE for a total elution volume of 100 μL. The DNA sample was further purified to remove any potential inhibitors using the Agencourt AMPure XP Kit (Beckman Coulter). A 1.4:1 bead to sample volumetric ratio was used as per manufacturer's instructions, with the sample eluted in 0.1x IDTE. Extracted and purified DNA was then used directly for library preparation, or in cases where library preparation did not proceed within 24 hours, was frozen at -20˚C.
Following DNA extraction, sample concentration was measured using the Qubit dsDNA HS kit (ThermoFisher Scientific) as per the manufacturer's instructions and used to calculate the number of human genome equivalent copies in each sample.

FFPE DNA pre-treatment
FFPE-derived DNA was pre-treated before target capture to reduce the impact of potential DNA damage. 100 ng of DNA, as quantified by the Qubit dsDNA HS kit (ThermoFisher Scientific), was digested with 1 unit of UDG enzyme (New England Biolabs (NEB)) in a 50 μL reaction in 1X of the supplied reaction buffer (NEB). The mixture was incubated at 37˚C for 10 minutes, cooled to 4˚C, and immediately purified with the Agencourt AMPure XP Kit at a 3:1 bead to sample volumetric ratio as per manufacturer's instructions. Samples were eluted in 20 μL of 10 mM Tris-HCl, pH 8. Total amplifiable DNA was quantified using KAPA hgDNA Quantification and QC Kit (KAPA Biosystems) as per the manufacturer's instructions.

Enzymatic DNA shearing
Prior to enzymatic shearing, a buffer exchange was performed with cell line and FFPE-derived DNA samples with the Agencourt AMPure XP Kit at a 3:1 bead to sample volumetric ratio according to the manufacturer's instructions. Samples were eluted in 40 μL of 10 mM Tris-HCl, pH 8. Cell line and FFPE-derived DNA samples were then enzymatically sheared immediately before ligation using the KAPA HyperPlus kit (KAPA Biosystems) according to the manufacturer's instructions. 50 ng of cell line DNA or 40 ng of FFPE-derived DNA in 35 μL volume was added to 10 μL KAPA fragmentase (KAPA Biosystems) and topped up to a final volume of 50 μL in 1x supplied reaction buffer. Samples were incubated at 37˚C for 30 minutes, afterwards proceeding immediately to the A-tailing step of adapter ligation (described below). Shearing conditions were chosen as per manufacturer's instructions to achieve a mode fragment length of 150 bp.

Adapter ligation
The KAPA Hyper Prep kit (KAPA Biosystems) was used as per the manufacturer's instructions, with a 15 minute ligation incubation and a 200:1 adapter to insert ratio. Custom ligation adapter sequences were ordered for the LTC workflow (IDT), consisting of AGCACGCACCGAGATCTACACBBBBACACTCTTTCCCTACACGACGCTCTTCCGATCTT annealed to AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC BBBBNNNN ACCGATCTCGTAACTCAGCGG, where BBBB indicates a four base sample-specific barcode for multiplexing samples on the sequencer, and NNNN indicates a four base UMI. The UMI-containing adapter was phosphorylated on its 5' and 3' ends. The non-UMI adapter contained a phosphorothioate bond between the last two bases on the 3' end of the adapter. After ligation, the ligation mixture was purified using the Agencourt AMPure XP Kit (Beckman Coulter) as per manufacturer's specification, with a 0.4:1 bead to sample volumetric ratio, and eluted in 40 μL of 0.1x IDTE. After elution, the sample was topped up to 100 μL with 0.1x IDTE. An additional cleanup with the Zymo Select-a-Size DNA Clean & Concentrator column (Zymo Research) was performed, as per the manufacturer's instructions. A 5:1 binding buffer to ethanol ratio was used to select the desired product size. The final product was eluted in 25 μL of 0.1x IDTE. After cleanup, the entire volume of ligated DNA was amplified with custom primers TTTTTAGCACGCACCGAGATCTACAC and TTTTTCCGCTGAGTTACGAGATCGGT. Amplification proceeded for eight cycles with 0.3 μM of each primer, in 1x KAPA HiFi HotStart ReadyMix (KAPA Biosystems). Annealing was performed at 60˚C for 30 s, extension at 72˚C for 20 s, and denaturing at 98˚C for 20 s. The amplified products were cleaned up using the Agencourt AMPure XP Kit as per manufacturer's specification, with a 1.2:1 bead to sample volumetric ratio, and eluted in 20 μL of 0.1x IDTE. The cleaned up template DNA was then quantified using the Qubit dsDNA HS kit (ThermoFisher Scientific) as per the manufacturer's instructions.
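As a consistency check on the oligo sequences listed in this section and in PDP design and conjugation, the sketch below tests that the pre-amplification primers and the universal primers of the PDPs match the ligation adapters. It assumes that whitespace inside the printed sequences is a typesetting artifact and that the adapters are written 5' to 3'; the variable and function names are illustrative.

```python
# Adapter sequences as printed in the text, with internal whitespace removed.
# B = sample barcode base, N = UMI base (placeholders, not real bases).
ADAPTER_TOP = "AGCACGCACCGAGATCTACACBBBBACACTCTTTCCCTACACGACGCTCTTCCGATCTT"
ADAPTER_BOTTOM = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACBBBBNNNNACCGATCTCGTAACTCAGCGG"

PREAMP_FWD = "TTTTTAGCACGCACCGAGATCTACAC"
PREAMP_REV = "TTTTTCCGCTGAGTTACGAGATCGGT"
UNTAILED_FWD, UNTAILED_REV = "CACCGAGATCT", "TACGAGATCGG"
TAILED_FWD = "AATGATACGGCGACCACCGAGATCT"
TAILED_REV = "CAAGCAGAAGACGGCATACGAGATCGG"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a plain A/C/G/T sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def check_oligos():
    """Return a dict of named boolean checks on primer/adapter agreement."""
    return {
        # Forward pre-amp primer (after its 5' T-tail) matches the adapter 5' end.
        "preamp_fwd": ADAPTER_TOP.startswith(PREAMP_FWD[5:]),
        # Reverse pre-amp primer is complementary to the other adapter's 3' end.
        "preamp_rev": ADAPTER_BOTTOM.endswith(revcomp(PREAMP_REV[5:])),
        # Untailed PDP primers lie within the adapter-derived sequence.
        "untailed_fwd": UNTAILED_FWD in ADAPTER_TOP,
        "untailed_rev": revcomp(UNTAILED_REV) in ADAPTER_BOTTOM,
        # Tailed primers extend the untailed primers at the 5' end only.
        "tailed_fwd": TAILED_FWD.endswith(UNTAILED_FWD),
        "tailed_rev": TAILED_REV.endswith(UNTAILED_REV),
    }


if __name__ == "__main__":
    for name, ok in check_oligos().items():
        print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```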

Target capture
Target-capture-PCR (tcPCR) for the 35-gene panel was performed in two sequential steps, each consisting of two reactions per sample. In the first step, the PDPs with untailed primers were used, split into two 50 μL reactions such that in the first reaction FF and RR PDPs were used to capture one sense of the target, and in the second reaction FR and RF PDPs were used for the other sense. Each reaction consisted of 5 nM of each individual PDP, 15 ng template DNA, 5 units of Platinum Taq polymerase (ThermoFisher Scientific), 4 mM MgCl2, 0.2 mM dNTP (Invitrogen) in 1x Platinum Taq Buffer (ThermoFisher Scientific). 15 tcPCR cycles were performed with a 30 s denaturing step at 95˚C followed by a combined annealing and extension step at 66˚C for 105 s. The ramp rate was 4˚C/s between 95˚C and 85˚C, and then 0.2˚C/s from 85˚C to 66˚C. The second tcPCR was performed using 12.5 μL of the amplified material from the first tcPCR, and was otherwise identical to the first step, with the following exceptions: PDPs with the tailed primers were used, ramp rate was 4˚C/s throughout cycling, 12 cycles were performed, and the combined annealing and extension step was done at 68˚C. Following amplification, libraries were purified using two back-to-back bead cleanups, using the Agencourt AMPure XP Kit as per manufacturer's specification, with a 0.8:1 bead to sample volumetric ratio. Final libraries were eluted in 20 μL of 0.1x IDTE. tcPCR for the 4-gene panel was performed using a similar but earlier version of the protocol, which was identical except for the following: reaction volume was 25 μL for both tcPCRs, 20 cycles were used in the first tcPCR and 18 in the second, 6.25 μL of the first tcPCR was carried over into the second, and the second tcPCR was eluted in 15 μL after cleanup.
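The cycling parameters above allow a rough lower bound on thermocycler time for the two tcPCR steps of the 35-gene protocol. The sketch below counts only the stated holds and cooling ramps; the heating ramp back to 95˚C is not specified in the text and is ignored, and the second tcPCR is assumed to keep the 30 s and 105 s hold times of the first. Function names are illustrative.

```python
def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time to move between two temperatures at a constant ramp rate."""
    return abs(start_c - end_c) / rate_c_per_s


def tcpcr_minutes(cycles, denature_s, anneal_extend_s, cooling_s):
    """Total cycling time in minutes (holds plus cooling ramp, no heating ramp)."""
    return cycles * (denature_s + anneal_extend_s + cooling_s) / 60.0


# First tcPCR: 4 C/s from 95 to 85 C, then 0.2 C/s from 85 to 66 C.
TCPCR1_COOLING_S = ramp_seconds(95, 85, 4.0) + ramp_seconds(85, 66, 0.2)
TCPCR1_MIN = tcpcr_minutes(15, 30, 105, TCPCR1_COOLING_S)

# Second tcPCR: 4 C/s throughout, 12 cycles, annealing/extension at 68 C.
TCPCR2_COOLING_S = ramp_seconds(95, 68, 4.0)
TCPCR2_MIN = tcpcr_minutes(12, 30, 105, TCPCR2_COOLING_S)

if __name__ == "__main__":
    print(f"tcPCR1: about {TCPCR1_MIN:.0f} min")
    print(f"tcPCR2: about {TCPCR2_MIN:.0f} min")
```

Under these assumptions the slow 0.2˚C/s ramp accounts for roughly 95 s of each first-step cycle, which is comparable to the 105 s annealing and extension hold itself.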

Sequencing and data analysis
Targeted libraries were sequenced on an Illumina MiSeq or MiniSeq with paired-end 2 x 150 bp reads as per manufacturer's instructions. Prior to sequencing, samples were quantified using the KAPA Library Quant Kit (KAPA Biosystems) as per manufacturer's instructions. Resulting FASTQ files were demultiplexed by sample barcode using Fulcrum Genomics' FGBIO open source bioinformatics tool suite (https://github.com/fulcrumgenomics/fgbio) and then adapter-trimmed using Trimmomatic V0.36 [32]. Trimmed read pairs were combined and aligned to the GRCh38/hg38 reference sequence using BWA-MEM (https://github.com/lh3/bwa) and output in BAM format. SAMtools [33] was then used for sorting and indexing BAM files. The resulting BAM files were grouped into UMI consensus reads by FGBIO for low level SNV detection. Picard Tools 2.9 (https://github.com/broadinstitute/picard) was then used to collect hybrid selection metrics, including on-target fraction, mean coverage and insert length distributions. SNV, CNV and indel mutation calling was achieved using GATK4 (https://software.broadinstitute.org/gatk/gatk4). CNV detection was not quantified, but CNVs were identified as "amplified", "deleted" or "copy-number neutral" by the GATK4 CallCopyRatioSegments caller. Fusion detection was measured by comparing Picard de-duplicated reads containing alignments to both the CCDC6 and RET genes. Analysis outputs for assay validation and variant detection can be found in S2 and S3 Tables, respectively.
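The fusion detection criterion described above (reads containing alignments to both CCDC6 and RET) can be sketched as a simple grouping step. The input format, in which each alignment has already been annotated with the gene it overlaps, is a hypothetical simplification; the actual analysis operated on de-duplicated BAM files.

```python
from collections import defaultdict


def fusion_supporting_reads(alignments, gene_a="CCDC6", gene_b="RET"):
    """Return names of reads with alignments to both genes.

    alignments : iterable of (read_name, gene) tuples, one per alignment
                 record (hypothetical pre-annotated input).
    """
    genes_by_read = defaultdict(set)
    for read_name, gene in alignments:
        genes_by_read[read_name].add(gene)
    return sorted(
        name
        for name, genes in genes_by_read.items()
        if gene_a in genes and gene_b in genes
    )


if __name__ == "__main__":
    example = [
        ("read1", "CCDC6"), ("read1", "RET"),   # split alignment across both genes
        ("read2", "RET"), ("read2", "RET"),     # both alignments in RET only
        ("read3", "CCDC6"),
    ]
    print(fusion_supporting_reads(example))
```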
Supporting information S1