A research-based gene panel to investigate breast, ovarian and prostate cancer genetic risk

There is a need to investigate and better understand the inherited risk of cancer to ensure that clinical applications provide more accurate assessments and management strategies. Developing research-based next-generation sequencing gene panels that target not only (present-day) clinically actionable susceptibility genes but also genes that currently lack sufficient evidence for risk, as well as candidate genes such as those in DNA repair pathways, can aid this effort. Therefore, gene panel B.O.P. (Breast, Ovarian, and Prostate) was developed to evaluate the genetic risk of breast, ovarian and/or prostate cancer, and this manuscript serves as an introduction to B.O.P. and highlights its initial analytical validity assessment. B.O.P. targets 87 genes that have been suggested, predicted, or clinically proven to be associated with breast, ovarian, and/or prostate cancer risk using Agilent Technologies Haloplex probes. The probes were designed for 100 base pair reads on an Illumina platform and target both coding and non-coding exons as well as 10 intronic base pairs flanking the intron-exon boundaries. The initial B.O.P. screening involved 43 individuals from the Alabama Hereditary Cancer Cohort, and an average sequencing depth of 809X was obtained. Upon variant filtering and validation, true positives had an average sequencing depth of 659X and allele balance of 0.51. The average false positive sequencing depth was 34X and allele balance was 0.33. Although low sequencing depth was not always indicative of a false positive, high sequencing depths (>100X) signified a true positive. Furthermore, sensitivity and specificity of BRCA1/2 were calculated to be 100% and 92.3%, respectively. Overall, this screening enabled the establishment of criteria to alleviate future validation efforts and strongly supports the use of B.O.P. to further elucidate hereditary cancer susceptibility. Ultimately, continued B.O.P. screening will provide insights toward the genetic risk of and overlap between breast, ovarian, and/or prostate cancer.


Introduction
Gene panels enable the simultaneous screening of a number of genes. Panels are typically customized for specific screening purposes; thus, the genes (and/or specific gene regions) on such panels are unique to the screening goals. In recent years, with technological sequencing advances, panel-based screening has become extremely efficient and cost-effective. These advancements involve the targeted enrichment of selected genes followed by massively parallel sequencing, which is also known as next-generation sequencing (NGS) [1,2]. NGS gene panels have been implemented into clinical practice to assess inherited risk of cancer [1][2][3]. These panels include clinically valid genes for which clinical management guidelines have been established, such as genetic risk assessment criteria and mutation-positive management strategies. In the U.S., the National Comprehensive Cancer Network (NCCN) provides such guidelines to maximize clinical utility [4]. The American College of Medical Genetics and Genomics (ACMG) has established clinical laboratory standards for NGS gene panels but, ultimately, these panels are regulated by the Clinical Laboratory Improvement Amendments (CLIA), a federal program that certifies and oversees clinical laboratory testing [5,6]. CLIA primarily assesses analytical validity (the accuracy of mutation detection) in order to maintain quality standards and ensure the effectiveness of each laboratory test [6].
CLIA does not regulate research-based genetic testing, but similar analytical assessments can be carried out to ensure accurate mutation detection in a research setting. We have developed a research-based gene panel, B.O.P. (Breast, Ovarian, and Prostate), to assess inherited risk of hereditary breast cancer (BC) and associated cancers. B.O.P. is an exploratory gene panel. In addition to targeting clinically valid genes that have NCCN management strategies, it also targets genes that have been suggested to be associated with an increased risk but currently lack sufficient evidence, as well as candidate genes, such as those in DNA repair pathways. Therefore, the ultimate goal in utilizing this panel is to better elucidate risk. Regarding hereditary BC, NCCN clinically valid genes only account for a minority of the associated genes reported in the literature [1,7,8]. Furthermore, NCCN risk management strategies have primarily been developed for overtly pathogenic, truncation mutations in clinically valid genes, resulting in the detection of many variants of unknown significance (VUS), and clinically valid mutations explain less than 30% of hereditary BC cases. Additional exploration is critical to fill these knowledge gaps, and B.O.P. can aid in this investigation. However, prior to using B.O.P. as a way to increase knowledge in these areas, it must be evaluated for its ability to accurately detect variants. The purpose of this manuscript is to introduce B.O.P., present the analytical assessment of 10 NCCN regulated genes (in order to ensure the accurate detection of clinically relevant variants), and discuss the future potential of the panel.

Ethical compliance and informed consent
All procedures performed in studies involving human participants were in accordance with the ethical standards of Auburn University and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Specifically, this research was reviewed and approved by the Auburn University Institutional Review Board for the recruitment, enrollment and biobanking of the Alabama Hereditary Cancer Cohort (AHCC; IRB protocols 14-232, 14-335, and 15-111) [9]. Informed consent was obtained in writing from all individual participants included in the study.

Panel design
B.O.P. targets ~500 kilobases (Kb) of DNA including 87 genes that are suggested, predicted, or clinically proven to be associated with BC, ovarian cancer (OvC), and/or prostate cancer (PC) risk (S1 Table). Agilent Technologies Haloplex probes were designed using Agilent Technologies SureDesign software (https://earray.chem.agilent.com/suredesign/). The "Advanced HaloPlex" design allowed for the selection of the desired genes, for which the targeted regions included both coding and non-coding exons as well as 10 intronic base pairs flanking the intron-exon boundaries. The probe set was designed for 100 base pair reads on an Illumina platform. Overall, probes were predicted to cover 98.93% of the targeted genes/regions (Table 1).

Capture and sequencing
The genomic DNA of 43 cancer-affected individuals (23 African American [AA] and 20 European American [EA]) from the AHCC [9] was selected to undergo the first B.O.P. screening (Figs 1 and 2). Two study participants (1CAD-a and 1CAD-f) were knowingly related (first cousins). The HaloPlex HS Target Enrichment System For Illumina Sequencing Protocol (Version C0, December 2015) was followed for the targeted-capture, allowing each of the 43 samples to be uniquely barcoded/indexed, individually captured, and pooled in equimolar amounts for Illumina paired-end sequencing. One pooled sample with a final concentration of 24.13 nanomoles/liter (and DNA fragments ranging from 175 to 625 base pairs) was sent for sequencing on one lane of a flow cell on an Illumina HiSeq 2500 at the Genomic Services Laboratory (GSL) at HudsonAlpha Institute for Biotechnology. The final DNA quality/quantity of the pooled sample was assessed using the High Sensitivity DNA kit using the ABI 2100 Bioanalyzer.

Bioinformatics analyses
The sequencing data generated for each indexed sample (43 forward and 43 reverse FASTQ files) were downloaded using the GSL's wget downloader (Fig 2). Trimmomatic (v.0.35) was used to trim the unique barcodes and Illumina adaptors. After generating trimmed FASTQ files, FastQC (v.0.10.1) was used to ensure that the repeated sequences had been trimmed from the reads. The trimmed, paired forward and reverse FASTQ files were then aligned to the soft-masked human reference genome (GRCh38/hg38) using Burrows-Wheeler Aligner (BWA v.0.7.12), generating SAM (Sequence Alignment/Mapping) files, which were then compressed into BAM (binary SAM) files and sorted using SAMtools (v.1.2). PicardTools (v.1.79) was used to add read groups and then index the sorted, compressed BAM files along with realigning insertions and deletions (indels). As previously suggested for Haloplex, duplicates were not marked or removed [10]. Variants in B.O.P. targeted regions were called from the sorted BAM

Analytical assessment
Ten of the 87 genes were selected for our initial analytical validation efforts (Table 1). The selected genes represented clinically actionable BC susceptibility genes in order to facilitate analytical validity calculations, as well as to begin to determine the landscape of potential clinically significant variants in the AHCC. Using Exome Variant Server (EVS) as a control repository, B.O.P. variants in those 10 genes were filtered for minor allele frequencies (MAFs) of less than or equal to 2% in both EAs and AAs [12]. Variants were further filtered; all coding variants as well as intronic variants that were located within 10 base pairs of an intron-exon boundary were carried through for validation using polymerase chain reactions (PCR) and Sanger sequencing (Fig 1). Primer sequences and amplification conditions are available upon request. P values and odds ratios (ORs) were calculated using Fisher's exact test in R (v 3.5.1) and were not adjusted for multiple testing. Upon consent and enrollment into the AHCC, study participants provided information about previous clinical genetic screening [9]. Thus, for genes with clinical screening results provided by the 43 participants involved in this initial B.O.P. screening, sensitivity and specificity were calculated. Sensitivity was defined as the total number of true positives (TPs) divided by the sum of the total number of TPs and false negatives (FNs; TPs/[TPs + FNs]). TPs were defined as (i) variants that had been previously identified through clinical gene screening, initially confirmed in the research laboratory by PCR and Sanger sequencing, and, subsequently, detected upon B.O.P. screening, or, (ii) in the case of no clinical screening

Results
The number of reads that passed QC assessment per individual averaged 11.3 million (M), with 98.6% of those reads mapping to the human genome. On average, 50.9% of the reads mapped to B.O.P. targeted regions (S2 Table). The average sequencing depth for all targeted base pairs was 809X; however, sequencing depth was not uniform, as shown by a large interquartile range (Table 1). A probe design report provided by Agilent Technologies predicted that 98.9% of the targeted base pairs would be covered at least 1X, which was similar to the actual coverage of 98.2%; thus, 1.8% of the targeted base pairs were not covered at all (Table 1). The 10 assessed genes had average sequencing depths that ranged from 505X-1017X (Table 1). Furthermore, assessment of the 225 different regions that targeted those 10 genes revealed that the majority had average sequencing depths between 800-899X but ranged from 68X-2053X (Table). Although rare, regions with average sequencing depths less than 100X missed on average 24.3% of the targeted base pairs and only covered 52.2%, 34.6%, and 21.3% of targeted base pairs at or greater than 20X, 50X, and 100X, respectively (Table 2; S3 Table). Regions with the highest average sequencing depth, 1500X or greater, had over 99% and 96.5% of the targeted base pairs covered at least 50X and 100X, respectively. However, 28.0% of the 225 regions-of-focus did not, on average, obtain 100% coverage at 1X, which included regions with average sequencing depths ranging from 68X-1354X (Table 2; S3 Table). Upon variant annotation (Fig 2), a total of 24,915 variants (2,858 unique) were called. After filtering for variants in the 10 genes (Table 1), a total of 1960 (287 unique) remained, 74 (56 unique) of which had MAFs less than 2% in both ethnicities (Table 3; Fig 2).
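The coverage figures above combine two different summaries: average depth and the fraction of targeted base pairs covered at or above a given depth. The following is a minimal Python sketch of how both can be computed from per-base depths. The function names and the toy depth values are hypothetical and purely illustrative; they are not B.O.P. data and not the pipeline used in this study.

```python
from typing import Dict, Iterable, Sequence


def coverage_breadth(depths: Sequence[int],
                     thresholds: Iterable[int] = (1, 20, 50, 100)) -> Dict[int, float]:
    """For each threshold, return the fraction of targeted bases with depth >= threshold."""
    if not depths:
        raise ValueError("depths must contain at least one targeted base")
    n = len(depths)
    return {t: sum(d >= t for d in depths) / n for t in thresholds}


def mean_depth(depths: Sequence[int]) -> float:
    """Average sequencing depth across the targeted bases."""
    return sum(depths) / len(depths)


# Hypothetical per-base depths for a 10 bp region (illustrative values only).
toy_region = [0, 0, 15, 30, 60, 120, 150, 200, 400, 900]

avg = mean_depth(toy_region)            # average depth well above 100X
breadth = coverage_breadth(toy_region)  # yet 20% of bases have no coverage
print(f"mean depth: {avg:.1f}X")
for threshold, fraction in breadth.items():
    print(f">= {threshold}X: {fraction:.0%} of targeted bases")
```

The toy region illustrates the pattern described in the text: a region can have a high average depth while some of its base pairs remain uncovered.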
A total of 61 of the 74 variants were validated and classified as TPs, averaging a sequencing depth of 659X and an allele balance of 0.51; this included 100% of the variants in seven out of the 10 genes (ATM, BRCA2, CHEK2, NBN, PALB2, STK11, and TP53), resulting in FDRs of 0 (Table 3). BRCA1 and CDH1 each had one FP, and PTEN had the highest FDR with 11 FPs, all in intron7/exon8 (Table 3). Although the average FP sequencing depth was 34X (ranging from 12X-63X), all three regions harboring FPs had average sequencing depths greater than 427X (Table 3 and S3 Table). The average FP allele balance was 0.33, ranging from 0.13-0.68.
Though not optimal, low sequencing depth was not always indicative of a FP. Of the 20 variants covered less than 100X, 13 were FPs, and 7 were TPs with average sequencing depth of 60X and allele balance of 0.48, ranging from 0.33-0.58 (Table 3). In contrast, higher coverage, such as sequencing depth greater than 100X, was an indicator of a TP; all 54 variants covered over 100X were determined to be TPs. This included two homozygous TPs, which were each covered over 1000X with the alternate allele being the only one detected (Table 3), and 52 heterozygous TPs that had an average sequencing depth of 724X and allele balance of 0.50. To note the importance of allele balance, 95% of the TPs had an allele balance above 0.40 compared to only 23% of the FPs. Eighty-five percent of the TPs had over 100X coverage and an allele balance of at least 0.40 (Table 3). No variants with an allele balance less than 0.20 were TPs, and additional filtering to exclude such variants improved FDRs (Table 3).
Prior to B.O.P. screening, positive and negative BRCA1 and BRCA2 mutation status was known for eight of the 43 study participants; thus, sensitivity and specificity could be calculated for those genes. Seven study participants had previously undergone clinical BRCA1/2 screening; six reported negative results with no pathogenic variants identified. One individual, 1CB-a, received a positive report indicating a pathogenic BRCA2 frame-shifting mutation (c.5611_5615delAGTAA [p.S1871fs], also known as c.5616_5620delAGTAA [p.K1872Nfs]), which was confirmed using PCR and Sanger sequencing prior to B.O.P. screening (Fig 3 and Table 3). The eighth individual, 1CAD-a, had not personally obtained clinical gene screening; however, a deceased family member had undergone clinical BRCA1/2 screening and received a positive report indicating a pathogenic missense mutation (BRCA1 c.5387T>G [p.M1796R]). Thus, this individual was screened for the familial mutation using PCR and Sanger sequencing prior to B.O.P. screening and tested positive (Fig 3 and Table 3). Notably, another BC-affected family member, 1CAD-f, tested negative for BRCA1 p.M1796R in the research laboratory prior to B.O.P. screening but could not be considered a TN for the specificity calculation since full gene screening had not been carried out (Table 3). B.O.P. variant calling reported 12 and 14 variants in BRCA1 and BRCA2, respectively (Table 3). Upon Sanger sequencing confirmation, this included 11 TPs, zero FNs, six TNs and one FP in BRCA1, and 14 TPs, six TNs, and zero FNs and FPs in BRCA2 (Table 3), which corroborated the previously reported BRCA1 and BRCA2 mutation statuses. Therefore, B.O.P. screening of BRCA1/2 resulted in 100% sensitivity and 92.3% specificity. However, specificity became 100% with the elimination of variants with an allele balance of 0.20 or less. Of the 61 TPs, 45 were detected in AAs; this included 34 unique variants, eight of which were detected in multiple individuals (Table 3).
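The BRCA1/2 sensitivity and specificity can be reproduced from the counts reported above. The short Python sketch below pools the BRCA1 and BRCA2 counts and assumes the conventional definition of specificity, TNs/(TNs + FPs); the function and variable names are illustrative only.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TPs / (TPs + FNs)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Specificity = TNs / (TNs + FPs); the conventional definition, assumed here."""
    return tn / (tn + fp)


# Counts reported in the text after Sanger sequencing confirmation.
brca1 = {"tp": 11, "fn": 0, "tn": 6, "fp": 1}
brca2 = {"tp": 14, "fn": 0, "tn": 6, "fp": 0}

# Pool both genes before computing the metrics.
pooled = {key: brca1[key] + brca2[key] for key in brca1}

sens = sensitivity(pooled["tp"], pooled["fn"])  # 25 / 25
spec = specificity(pooled["tn"], pooled["fp"])  # 12 / 13
print(f"sensitivity: {sens:.1%}, specificity: {spec:.1%}")
```

With the single BRCA1 FP removed (for example, by the allele balance filter described above), the pooled specificity becomes 12/12, or 100%.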
According to ClinVar [13,14], the 34 variants were categorized as pathogenic/risk factor (n = 4), VUSs (n = 11), or benign/likely benign (n = 19). A total of five variants were predicted to be deleterious in Polyphen, two of which have been defined as pathogenic non-synonymous variants in ClinVar; the other three are currently classified as VUSs (Table 3). Of the eight variants detected in more than one individual, BRCA2 c.5020A>G; p.S1674G, was identified in two first cousins. The remaining seven were in seemingly unrelated individuals. This includes STK11 c.369G>A;p.Q123Q, a seemingly benign variant, which is reported to have a MAF of 1.5% in the general AA population but was detected in five of the 23 AAs in this study, indicating a MAF of 10.8% (P value 8.50 × 10⁻⁴; odds ratio 7.79, 95% CI [2.32-20.70]). Furthermore, the 45 AA TPs were detected in 96% of the AAs screened, and multiple variants were detected in 70% of the cases. In contrast, 16 TPs were validated in 55% of the EAs, and only 20% had multiple variants. The difference in the number of individuals from each ethnicity with at least one TP was significant (P value 2.71 × 10⁻³). No overtly pathogenic variants were validated in EAs, but 50% of the EA TPs (8/16) were listed as a VUS, three of which were predicted to be deleterious in Polyphen [15].
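The comparison of individuals with at least one TP between ethnicities can be illustrated with a two-sided Fisher's exact test. The study used R for this; the sketch below is a pure-Python stand-in, not the authors' code. The carrier counts are assumptions derived by rounding the reported percentages (96% of 23 AAs gives 22; 55% of 20 EAs gives 11), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (margins fixed) that
    are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Assumed counts, derived from the reported percentages (not taken from Table 3):
# rows = AA, EA; columns = at least one TP, no TP.
p_value = fisher_exact_two_sided(22, 1, 11, 9)
print(f"P value: {p_value:.2e}")
```

Under these assumed counts the result is approximately 2.71 × 10⁻³, in line with the P value reported for this comparison.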

Discussion
Our group has developed B.O.P., a research-based NGS gene panel, which targets 87 genes that have been suggested, predicted, or clinically proven to be associated with risk of BC, OvC, and/or PC. The overall purpose of this new panel is to gain additional insights toward the genetic risk of and overlap between those three cancers. This manuscript served to introduce B.O.P. by reporting its initial screening, which involved 43 cancer-affected individuals from the AHCC [16]. Targeting ~500 Kb of DNA, 98.9% of the base pairs were covered at least 1X, and an average sequencing depth of 809X was obtained. We took a closer look at 10 NCCN regulated genes in order to begin the analytical assessment of the panel and ensure the accurate detection of clinically relevant variants; upon variant filtering and validation, 100% of the variants in seven of the 10 genes were TPs. TPs had an average sequencing depth of 659X and allele balance of 0.51, whereas the average FP sequencing depth and allele balance were 34X and 0.33, respectively. Although FPs had a much lower average sequencing depth compared to TPs, a low sequencing depth was not always indicative of a FP. In contrast, all variants called with high sequencing depths (>100X) were TPs. Furthermore, sensitivity and specificity of BRCA1/2 were calculated to be 100% and 92.3%, respectively.
There are a number of different targeted enrichment options to choose from when designing gene panels, all of which can have different effects on sequencing outcomes [10]. When comparing methods, Samorodnitsky et al. concluded that Haloplex had the highest on-target read alignment and normalized sequencing depth but the least uniformity [10]. Notably, despite reports of Haloplex resulting in a high percentage (>90%) of on-target read alignments [10,17], 50.9% of our QC passed reads mapped to the B.O.P. targeted regions. With other Haloplex gene panel studies not reporting such data [18][19][20], it is difficult to make general conclusions about Haloplex on-target read alignment specificity. However, similar on-target read alignment percentages have been reported; Castera et al. used SureSelect baits in order to target hereditary BC and OvC susceptibility genes, and reported an average of 42% of reads on-target [21]. Ultimately, the percentage of off-target reads is likely dependent on a number of factors, including the specific genes/regions being targeted [22]. Of the reads that mapped on-target, B.O.P.'s overall sequencing depth averaged 809X, and each individually assessed gene obtained average sequencing depths from 505X-1017X. Nevertheless, large interquartile ranges indicated that depth was not uniform. This was expected since no current enrichment and sequencing approach provides complete uniformity, primarily because of complex genomic regions that are very difficult to capture/sequence and result in low sequencing depths or even no coverage at all [10,22].
By focusing on a select set of genes/regions, NGS gene panels target a smaller number of base pairs compared to broader applications such as exome and whole genome sequencing. The smaller target-capture provides the option to achieve a high average sequencing depth, which aids in variant identification [10,23]. Therefore, the overall goal is to obtain 100% coverage as well as the appropriate/desired sequencing depths at all targeted base pairs. Since this goal is not generally achieved, complementary assays can be used to fill in gaps, which is commonly implemented for clinical applications. In such cases, regions of low/no coverage are normally Sanger sequenced [22,23]. B.O.P. was able to cover, on average, 98.2% of its targeted base pairs at 1X. Being a research panel, no gap-filling assays were carried out; however, region-specific coverage analyses provided insight towards the feasibility of gap-filling. Gap-filling criteria have been described in a number of BC NGS gene panel publications, specifically, those that highlighted panel performance and analytical validity [21][24][25][26]. Being clinical panels, regions covered less than 20X [21] or 50X [24][25][26] were checked by conventional methods. Interestingly, only two B.O.P. regions had average sequencing depths less than 100X (68X and 82X), which would not have required complementary assays to fill in gaps according to the criteria set in the referenced studies [21][24][25][26]. This is despite the fact that, on average, those two B.O.P. regions missed 24.3% of the targeted base pairs and only covered 52.2%, 34.6%, and 21.3% of targeted base pairs at or greater than 20X, 50X, and 100X, respectively. Furthermore, 63 of the 225 B.O.P. regions-of-focus did not, on average, obtain 100% coverage at 1X; these regions had average sequencing depths ranging from 68X-1354X. Thus, regions with high sequencing depths still had base pairs with no/low coverage, which happened to be where FPs were detected in this study.
Therefore, gap-filling only those regions with 'low' (20X or 50X) sequencing depths will not guarantee 100% coverage. Establishing mapping criteria to ensure all base pairs are covered at a desired depth is ideal but would likely reveal gaps in too many regions, making gap-filling infeasible. Overall, gaps in B.O.P. as well as other panels, even with gap-filling criteria, can provide less than definitive negative results [23]; that said, zero FNs were identified in BRCA1 and BRCA2, resulting in 100% sensitivity.
In addition to gap-filling, conventional approaches are also used to validate called variants. A total of 1960 variants were detected in the 10 B.O.P. assessed genes and, to reduce the number of variants to validate for this analytical assessment, only variants with MAFs less than 2% in both ethnicities were Sanger sequenced. This included 74 variants, 61 of which were confirmed and defined as TPs, revealing 13 FPs. The validation process ultimately provided insight regarding the likelihood of confirmation based on variant quality, such as sequencing depth, since all 54 variants covered over 100X were TPs. These results corroborated the criteria established by Mu et al., which set high confidence calls as having a minimum sequencing depth of 100X and allele balance of 40%. Additionally, Mu et al. indicated that such calls did not require confirmation. Although all B.O.P. variants covered at or above 100X were TPs regardless of allele balance, the criteria from Mu et al. will be implemented in the future in order to be thorough. This will limit validation efforts to low confidence calls, reducing the cost and time of validation.
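The high confidence criteria described above (minimum sequencing depth of 100X and allele balance of 40%) amount to a simple two-threshold rule. The Python sketch below shows one way such a rule could be applied to decide which calls go to Sanger sequencing. The function name is hypothetical, and treating both thresholds as inclusive lower bounds is an assumption; the example inputs are the group averages reported in this study rather than individual variant calls.

```python
def is_high_confidence(depth: float, allele_balance: float,
                       min_depth: float = 100, min_allele_balance: float = 0.40) -> bool:
    """Return True if a call meets both thresholds (assumed inclusive lower bounds)."""
    return depth >= min_depth and allele_balance >= min_allele_balance


# Group averages reported in the text: (label, sequencing depth, allele balance).
calls = [
    ("TP average", 659, 0.51),
    ("FP average", 34, 0.33),
    ("TPs covered less than 100X, average", 60, 0.48),
]

# Low confidence calls are the ones that would be sent for Sanger validation.
needs_sanger = [label for label, depth, ab in calls
                if not is_high_confidence(depth, ab)]
print(needs_sanger)
```

Under this rule, a TP with low depth is still flagged for validation, which is consistent with the observation that low sequencing depth alone does not indicate a FP.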
In this study, 22 variants had low confidence calls. This included nine TPs, seven of which were covered less than 100X and two that failed to meet the required allele balance. The remaining 13 were FPs with an average sequencing depth of 34X and allele balance of 0.33, reiterating that low sequencing depths are susceptible to sequencing artifacts [10,23]. Interestingly, as mentioned in the previous paragraph, the regions harboring the FPs did not have low sequencing depths, stressing the potential lack of uniformity within a targeted region. On another note, 11 of the 13 FPs were in PTEN. Considering PTEN has a processed pseudogene, PTENP1 on chromosome 9, their homology could have contributed to probe mis-priming as well as read mis-alignments. Encountering problematic regions, such as regions with high homology or GC rich content, is common and referred to in many studies [18,22,23]. Overall, for each assessed gene, FDRs ranged from 0 to 0.92, the latter being PTEN. Of course, FDRs improved as additional filtering was implemented. Initial B.O.P. specificity, which could only be calculated for BRCA1/BRCA2, was 92.3%. Upon filtering out variants with an allele balance equal to or less than 0.20, specificity was 100%. Ultimately, Sanger sequencing all low confidence calls will eliminate FPs and provide 100% specificity; therefore, it is common to complement NGS gene panels with Sanger sequencing validation in order to consider the test complete and optimize specificity [21][24][25][26][27][28].
In addition to enabling B.O.P.'s initial analytical assessment, the first B.O.P. screening, which involved 23 AAs and 20 EAs from the AHCC [9], has provided insight regarding variant contributions and ethnic differences. Overall, compared to EAs, AAs had a significantly higher number of individuals with at least one TP (P value 2.71 × 10⁻³) as well as individuals with multiple TPs (P value 1.95 × 10⁻³). Of course, comparisons to ethnic-specific controls will determine if these differences contribute to an inherited cancer risk. Interestingly, according to ClinVar [13,14], none of the variants identified in EAs were considered pathogenic/risk variants, whereas 17.4% (4/23) of the AAs had a variant with that classification. The majority of the detected variants were classified as VUSs or benign/likely benign; ultimately, elucidating how VUSs and even synonymous variants contribute towards risk is very important. Synonymous variants, though normally ignored and considered benign, can affect splicing, gene expression or translation dynamics, all of which can contribute to a disease phenotype [29]. To further stress their importance, they have been reported to act as driver mutations in human cancers [30] and, through this initial B.O.P. screening, STK11 c.369G>A;p.Q123Q was detected in significantly more AA cases than controls (P value 8.50 × 10⁻⁴). Additionally, despite recognizing that hereditary BC risk is polygenic [31], little effort has been put forth to thoroughly investigate all variants in clinically relevant BC susceptibility genes, not to mention variants in genes currently lacking clinical significance, and determine if different variant combinations increase risk. Altogether, seemingly benign variant combinations could, in fact, be pathogenic, and paired with the striking difference between ethnicities regarding the number of cases with multiple variants, further investigation is warranted.
Of course, a larger number of cases will be required for this effort.
In summary, this effort assesses the analytical validity of the B.O.P. panel and demonstrates the panel's ability to accurately detect mutations in 10 NCCN clinically actionable genes [8]. Despite the potential biases of the B.O.P. capture and NGS, the high depth of coverage, low FDR, and great sensitivity and specificity strongly support the use of this research gene panel to further elucidate hereditary BC/OvC/PC genetics. Although the cohort for this initial assessment is small, B.O.P. has begun to determine the mutation contributions of clinically valid genes in different ethnicities as well as permit the investigation of VUSs and other variant types and their effect towards polygenic risk. Furthermore, continued B.O.P. screening can provide additional evidence to confirm or refute previously suggested susceptibility genes, lessening the number of genes that lack clinical validity on commercially available panels [1]. Additionally, with the incorporation of candidate genes on B.O.P., it has the potential to identify novel genetic risk factors that are contributing towards BC, OvC, and PC. Lastly, as new susceptibility genes are discovered that are not currently on the B.O.P. panel, it is important to stress the ability to edit the targeted genes in order to best reflect clinical screening. As described herein, there are many potential benefits of B.O.P. screening, and we aim to make advances in cancer genetics research through its implementation in our research efforts.
Supporting information S1