FS, JV, and RL are employees of Roche Molecular Systems, the funder of this study. HJL is a former employee of Roche Molecular Systems. DGdC declares the receipt of honoraria from Roche Molecular Systems (Pleasanton, CA) and Roche Products Ltd (UK). FLR declares the receipt of honoraria from Roche Molecular Systems (Pleasanton, CA) and Roche Diagnostics (Spain). Miller Medical was contracted by Roche Molecular Systems and contributed to the drafting of this manuscript. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Conceived and designed the experiments: FLR DGdC HJL FS. Performed the experiments: BA RM BG DM JV EC RL. Analyzed the data: FLR DGdC HJL FS BA RM BG DM EC JV RL. Wrote the paper: FLR DGdC HJL FS BA RM BG DM EC JV RL.
The cobas 4800 BRAF V600 Mutation Test is a CE-marked and FDA-approved in vitro diagnostic assay used to select patients with metastatic melanoma for treatment with the selective BRAF inhibitor vemurafenib. We describe the pre-approval validation of this test in two external laboratories.
Melanoma specimens were tested for
Invalid results were observed in 8/116 specimens (6·9%) with Sanger, 10/116 (8·6%) with ABI BRAF, and 0/232 (0%) with the cobas BRAF test. PPA was 97·7% for V600E mutation for the cobas BRAF test and Sanger, and NPA was 95·3%. For the cobas BRAF test and ABI BRAF, PPA was 71·9% and NPA 83·7%. For 16 cobas BRAF test-negative/ABI BRAF-positive specimens, 454 sequencing detected no codon 600 mutations in 12 and variant codon 600 mutations in four. For eight cobas BRAF test-positive/ABI BRAF-negative specimens, four were V600E and four V600K by 454 sequencing. Detection rates for 5% mutation blends were 100% for the cobas BRAF test, 33% for Sanger, and 21% for the ABI BRAF. Reproducibility of the cobas BRAF test was 111/116 (96%) between the two sites.
It is feasible to evaluate potential companion diagnostic tests in external laboratories in parallel with the pivotal clinical trial validation. The health authority-approved assay had substantially better performance characteristics than the two other methods. The overall success of the cobas BRAF test is a proof of concept for future biomarker development.
The new paradigm of targeted drug development in cancer medicine is to design agents that inhibit specific recurring genetic lesions in tumors. A critical component of this model is the co-development of robust and accurate companion in vitro diagnostic (IVD) assays to detect these specific genetic lesions and thus to identify patients likely to benefit from a given targeted treatment
A successful example of this strategy is the focused and integrated co-development of the novel
Phases of companion diagnostic (CoDx) development in green, drug (Rx) development in blue. IDE = Investigational Device Exemption; IND = Investigational New Drug Application; MAA = Marketing Authorisation Application; NDA = New Drug Application; PMA = Premarket Approval Application; RMS = Roche Molecular Systems, Inc.
This study compared the analytical performance of the CE-IVD marked and FDA-approved RT-PCR test with two other commercially available methods: bidirectional direct Sanger sequencing (“Sanger”) and the Applied Biosystems BRAF Mutation Analysis Reagents kit (“FA test”) for the detection of
The project has been approved by the institutional review board at Grupo Hospital de Madrid.
(“RT-PCR test”, Roche Molecular Systems, Inc., Branchburg, NJ, USA) is an FDA-approved and CE-IVD marked real-time PCR-based assay designed to detect the presence of the
(“FA test”, Applied Biosystems, Foster City, CA, USA) detects and differentiates three mutations in codon V600 of the
(“Sanger”) was performed to detect mutations in exon 15 of the
(GS FLX Titanium, 454 Life Sciences, Branford, CT, USA)
The study was conducted using a blinded panel of FFPE tissue specimens of malignant melanoma as well as artificial DNA blends containing a low percentage of
FFPET = formalin-fixed paraffin-embedded tissue. * Low tumor content (<50%); high levels of necrosis (≥50%); significant pigmentation (≥10%); or non-V600E mutations.
Five 5 µm curls were sectioned from each of the 120 panel specimens and blinded. One section was mounted on a slide, stained with hematoxylin and eosin, coded, and reviewed by two pathologists (FL-R and EC). Each specimen was reviewed to confirm the diagnosis of melanoma and to assess tumor content, degree of pigmentation, and extent of necrosis according to predefined criteria, which were based on laboratory experience in the study of somatic mutations in solid tumors
DNA for the RT-PCR test was isolated from a single 5 µm section per panel member at each site using the cobas DNA Sample Preparation Kit. The DNA eluate was subsequently tested according to the package insert
DNA for each of the other tests (Sanger and FA test) was isolated from a single 5 µm section per panel member using the QIAamp DNA FFPE tissue kit in the automated QIAcube system (Qiagen, Hilden, Germany). The DNA eluate was then tested with Sanger according to a standard laboratory protocol or with the FA test according to the vendor-provided protocol.
Specimen retesting was permitted according to the manufacturer’s or procedure’s instructions as follows:
RT-PCR test: <50% tumor content; insufficient DNA concentration; or invalid initial test result
Sanger: no PCR amplification or difficult sequence interpretation
FA test: fluorescence signal too strong; background noise; extra peaks that did not match any peaks from controls; or small mutation peaks that were difficult to identify as mutation signals.
454 sequencing was performed on all discordant specimens, all invalid specimens, and all specimens for which Sanger sequencing identified a non-V600E mutation.
The number of invalid test results for the 120-member tumor panel was recorded and compared across the three testing methods.
The positive percent agreement (PPA) and negative percent agreement (NPA) of the RT-PCR test for detecting
The reproducibility of the RT-PCR test was evaluated by comparing the results at the two independent clinical laboratory sites for each of the 120 FFPE panel members. Discrepant analysis was performed using 454 on all specimens with discordant results and/or an invalid result.
The extent of pigmentation, necrosis, and tumor content in FFPE samples was graded according to the following criteria, and the impact of each characteristic on the invalid test rate, mutation call rate, and reproducibility of the RT-PCR test was then assessed.
Tumor content: high (≥50%) versus low (<50%)
Tumor necrosis: high (≥50%) versus low (<50%)
Pigmentation: high (≥10%) versus low (<10%).
DNA blends with 5% V600E alleles (as determined by 454) were prepared from FFPE melanoma specimens. Twenty-four replicate blends (comprising 21 mutant-allele and three wild-type blends) were tested by respective pairs of methods at each site to assess the correct call rate at low percentage mutant alleles.
Assay turnaround time from DNA isolation to results reporting was compared for all methods, assuming one 8-hour shift/day, and the following number of samples: 24 for RT-PCR test and Sanger, and 30 for FA test.
For methods correlation, the two-sided 95% Wilson score confidence intervals were calculated for all measures of agreement
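 | Method 1 positive | Method 1 negative | Total
Method 2 positive | a | b | a + b
Method 2 negative | c | d | c + d
Total | a + c | b + d | n
In the table: a = number of specimens positive by both methods; b = number positive by Method 2 only; c = number positive by Method 1 only; d = number negative by both methods; n = a + b + c + d.
The following statistics were calculated:
•Overall percent agreement between methods = 100% × (a + d)/n
•Positive percent agreement between methods = 100% × a/(a + c)
•Negative percent agreement between methods = 100% × d/(b + d)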
The 95% confidence intervals for the above percent agreements were calculated using the methods described in CLSI EP12-A, User Protocol for Evaluation of Qualitative Test Performance; Approved Guideline, 2002.
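The agreement statistics and their Wilson score intervals can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function names `wilson_interval` and `agreement` are hypothetical, and the example counts are taken from the initial RT-PCR test versus Sanger comparison reported in the Results (43 specimens positive by both methods, 3 positive by the RT-PCR test only, 1 positive by Sanger only, 61 negative by both).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Two-sided Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def agreement(both_pos, test_only_pos, ref_only_pos, both_neg):
    """Return (numerator, denominator) pairs for PPA, NPA and OPA.

    The comparator (reference) method defines the denominators of
    PPA and NPA; this orientation is an assumption of the sketch.
    """
    n = both_pos + test_only_pos + ref_only_pos + both_neg
    return {
        "PPA": (both_pos, both_pos + ref_only_pos),
        "NPA": (both_neg, both_neg + test_only_pos),
        "OPA": (both_pos + both_neg, n),
    }


# Counts from the initial RT-PCR test versus Sanger comparison (N = 108).
for name, (k, n) in agreement(43, 3, 1, 61).items():
    low, high = wilson_interval(k, n)
    print(f"{name}: {100 * k / n:.1f}% (95% CI {100 * low:.1f}-{100 * high:.1f})")
```

For these counts, the PPA and NPA intervals come out at 88.2 to 99.6% and 87.1 to 98.4%, which agrees with the values reported for the RT-PCR test versus Sanger comparison.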
False positive (FPR) and false negative (FNR) rates were calculated for both comparator methods, with the RT-PCR test as the reference, using the formulae FPR = FP/(FP+TN) and FNR = FN/(TP+FN). TN and TP are the observed numbers of true negatives and true positives, and FP and FN are the observed numbers of false positives and false negatives, respectively.
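As a worked illustration of these two formulae, the short Python sketch below applies them to the initial FA test versus RT-PCR test comparison (before discrepant resolution by 454). The helper name `error_rates` is hypothetical, and the mapping of counts is an assumption made for the example: with the RT-PCR test as reference, TP = 41, FP = 16 (FA test positive, RT-PCR test negative), FN = 8 (FA test negative or non-V600E, RT-PCR test positive), and TN = 41.

```python
def error_rates(tp, fp, fn, tn):
    """Return (FPR, FNR) of a comparator method relative to a reference."""
    if fp + tn == 0 or tp + fn == 0:
        raise ValueError("reference must contain both positives and negatives")
    fpr = fp / (fp + tn)  # false positives among reference-negative specimens
    fnr = fn / (tp + fn)  # false negatives among reference-positive specimens
    return fpr, fnr


# Illustrative counts: FA test judged against the RT-PCR test (N = 106).
fpr, fnr = error_rates(tp=41, fp=16, fn=8, tn=41)
print(f"FPR = {fpr:.1%}, FNR = {fnr:.1%}")  # FPR = 28.1%, FNR = 16.3%
```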
For invalid rates, exact p-values for the differences between pairs of independent binomial proportions were calculated using commercial software (StatXact® v 9 by Cytel Software Corporation).
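The specific exact procedure run in StatXact is not stated here. As a stand-in, the sketch below uses a two-sided Fisher exact test, implemented with the Python standard library, to compare two invalid rates; this choice of test and the function name `fisher_exact_two_sided` are assumptions for illustration, so the p-values need not match those obtained in the study. The example inputs are the final invalid counts (0/232 for the RT-PCR test, 8/116 for Sanger, 10/116 for the FA test).

```python
from math import comb


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value for x1/n1 versus x2/n2."""
    events = x1 + x2
    denom = comb(n1 + n2, events)

    def prob(k):
        # Hypergeometric probability that k of the events fall in group 1.
        return comb(n1, k) * comb(n2, events - k) / denom

    k_min = max(0, events - n2)
    k_max = min(n1, events)
    p_obs = prob(x1)
    # Sum over all tables that are no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p)


p_sanger = fisher_exact_two_sided(0, 232, 8, 116)
p_fa = fisher_exact_two_sided(0, 232, 10, 116)
print(f"RT-PCR test vs Sanger: p = {p_sanger:.2g}")
print(f"RT-PCR test vs FA test: p = {p_fa:.2g}")
```

Both comparisons fall well below the 0.05 threshold under this test.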
For correct call rate, the proportions of positive test results were compared. Minimum sample sizes were calculated that provide power of 0.8, 0.9, and 0.95 to detect the difference in the proportion of positive test results between testing methods, with a type I error probability of 0.05 (falsely rejecting the hypothesis that the proportions are equal). The asymptotic normal approximation of the distribution of the difference between a pair of proportions was used. Exact p-values for the differences were calculated using commercial software (StatXact® v 9).
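One common normal-approximation formula for the per-group sample size needed to compare two independent proportions is sketched below. The exact formula used in the study is not given, so this is an assumed textbook variant (pooled variance under the null hypothesis, unpooled under the alternative), and `sample_size_two_proportions` is a hypothetical name. The input proportions (1.00 and 0.33) are purely illustrative, borrowed from the reported detection rates at 5% mutant alleles; the normal approximation is known to be rough when a proportion sits at 0 or 1.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, power, alpha=0.05):
    """Per-group n to detect p1 versus p2 with a two-sided test at level alpha."""
    if p1 == p2:
        raise ValueError("proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Standard-error terms under the null (pooled) and alternative (unpooled).
    term_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((term_null + term_alt) ** 2 / (p1 - p2) ** 2)


for power in (0.8, 0.9, 0.95):
    n = sample_size_two_proportions(1.0, 0.33, power)
    print(f"power {power}: at least {n} replicates per method")
```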
In all calculations, a p-value ≤0.05 was considered statistically significant.
Of the 120 FFPE specimens, 116 were included in the method comparison analysis. Four samples were excluded due to insufficient tumor content/melanoma in situ (n = 3) or an invalid result with all three testing methods (n = 1) (
Following initial testing and retesting (when necessary) according to manufacturer’s protocols, final invalid rates of 0%, 8·6%, and 6·9% were obtained for the RT-PCR test, FA test, and Sanger, respectively (
Assay | Initially invalid (n) | Invalid following retesting (n) | Final invalid rate (%)
RT-PCR test (n = 232) | 2 | 0 | 0
Sanger (n = 116) | 15 | 8 | 6·9
FA test (n = 116) | 25 | 10 | 8·6
Retesting was permitted according to the manufacturer’s or procedure’s instructions as follows: RT-PCR test: <50% tumor content, insufficient DNA concentration, or invalid initial test result; Sanger: no PCR amplification or difficult sequence interpretation; FA test: fluorescence signal too strong, background noise, extra peaks that did not match any peaks from controls, or small mutation peaks that were difficult to identify as mutation signals.
The same sample was invalid when tested at the two sites.
Of the 116 specimens tested at Site 1 using the RT-PCR test and Sanger, eight were invalid by Sanger, leaving 108 evaluable specimens for comparison. The initial agreement analysis showed PPA of 97·7%, NPA of 95·3%, and overall percent agreement (OPA) of 96·3% (
N = 108 | Sanger | | | Sanger and 454 | |
RT-PCR test result | V600E | Non-V600E/wild-type | Total | V600E | Non-V600E/wild-type | Total
MD | 43 | 3 | 46 | 44 | 2 | 46
MND | 1 | 61 | 62 | 0 | 62 | 62
Total | 44 | 64 | 108 | 44 | 64 | 108
PPA | 97·7% (95% CI 88·2–99·6) | | | 100% (95% CI 92·0–100) | |
NPA | 95·3% (95% CI 87·1–98·4) | | | 96·9% (95% CI 89·3–99·1) | |
OPA | 96·3% (95% CI 90·9–98·6) | | | 98·1% (95% CI 93·5–99·5) | |
CI = confidence interval; MD = mutation detected; MND = mutation not detected.
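Of the three specimens reported as ‘mutation detected’ by the RT-PCR test and non-V600E/wild-type by Sanger, one was subsequently confirmed as V600E and two as V600K by 454.
The specimen reported as ‘mutation not detected’ by the RT-PCR test and V600E by Sanger was subsequently confirmed as non-V600E/wild-type by 454.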
Of the 116 specimens tested at Site 2 using the RT-PCR test and FA test, ten were invalid by FA test, leaving 106 evaluable specimens for comparison. The initial agreement analysis showed PPA of 71·9%, NPA of 83·7%, and OPA of 77·4% (
N = 106 | FA test | | | FA test and 454 | |
RT-PCR test result | V600E | Non-V600E/wild-type | Total | V600E | Non-V600E/wild-type | Total
MD | 41 | 8 | 49 | 45 | 4 | 49
MND | 16 | 41 | 57 | 0 | 57 | 57
Total | 57 | 49 | 106 | 45 | 61 | 106
PPA | 71·9% (95% CI 59·2–81·9) | | | 100% (95% CI 92·0–100) | |
NPA | 83·7% (95% CI 71·0–91·5) | | | 93·4% (95% CI 84·3–97·4) | |
OPA | 77·4% (95% CI 68·5–84·3) | | | 96·2% (95% CI 90·7–98·5) | |
CI = confidence interval; MD = mutation detected; MND = mutation not detected.
Seven samples were reported as wild type by FA test and ‘mutation detected’ by RT-PCR test of which four were subsequently found to be V600E by 454, and three to be V600K. One sample was reported as V600G by FA test and ‘mutation detected’ by the RT-PCR test and was subsequently found to be V600K by 454.
Of the 16 specimens reported as ‘mutation not detected’ by the RT-PCR test and V600E by FA test, 12 were subsequently reported as wild type, three as V600E2, and one as V600R by 454.
Of the remaining eight discordant specimens, seven specimens were reported as ‘wild type’ by FA test and ‘mutation detected’ by the RT-PCR test. Four of these seven were reported as ‘V600E’ and three as ‘V600K’ by 454. One discordant specimen reported as ‘V600G’ (GTG>GGG) by FA test and ‘mutation detected’ by RT-PCR test was reported as ‘V600K’ by 454. Following discrepant resolution with 454 sequencing, the PPA was 100%, NPA was 93·4%, and OPA was 96·2% (
Consistent with the results observed at Site 1, the RT-PCR test gave a ‘mutation detected’ result for all specimens confirmed to have V600E mutations by Sanger or 454, and a ‘mutation not detected’ result for all specimens confirmed to be wild-type or non-V600E by Sanger or 454.
PPA and NPA between the RT-PCR test and Sanger were statistically significantly higher than those between the RT-PCR test and FA test, based on the non-overlapping 83% confidence intervals for the respective percent agreement estimates
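The overlap check can be illustrated with a short Python sketch. Non-overlap of two 83% intervals is commonly used as an approximate two-sided test at about the 5% level for two independent estimates. The interval method behind the 83% intervals is not stated here, so the Wilson score interval is assumed, and the helper names are hypothetical. The counts are those of the initial comparisons: PPA 43/44 (Sanger) versus 41/57 (FA test), and NPA 61/64 (Sanger) versus 41/49 (FA test).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence):
    """Two-sided Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 83% intervals for agreement of the RT-PCR test with each comparator.
ppa_sanger = wilson_interval(43, 44, 0.83)
ppa_fa = wilson_interval(41, 57, 0.83)
npa_sanger = wilson_interval(61, 64, 0.83)
npa_fa = wilson_interval(41, 49, 0.83)

print("PPA intervals overlap:", intervals_overlap(ppa_sanger, ppa_fa))
print("NPA intervals overlap:", intervals_overlap(npa_sanger, npa_fa))
```

Under these assumptions neither pair of intervals overlaps, although the NPA intervals are separated by less than one percentage point.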
Of the 116 specimens evaluable at each site using the RT-PCR test, 95·7% (111/116) produced concordant results. The agreement for V600E mutation-positive specimens was 100%, and the agreement for wild-type specimens was also 100%, as determined by Sanger or 454. The five specimens with discordant results between sites were V600K mutation-positive by 454 sequencing.
Pathological assessment of the 116 FFPE specimens revealed varying degrees of pigmentation, necrotic tissue, and tumor content (
Characteristic | Unclassifiable | Low | High
Pigmentation | 34 | 27 | 55
Necrosis | 75 | 23 | 18
Tumor content | NA | 36 | 80
NA = Not applicable.
Pigmentation: low = <10%; high = ≥10%.
Necrosis and tumor content: low = <50%; high = ≥50%.
Correct call rate at low mutant alleles for V600E was assessed for each method using 24 replicates of a 5% mutant allele DNA blend. The correct call rate for the RT-PCR test was 100% (48/48 samples)
Turnaround time was ∼1 day for the RT-PCR test, ∼5 days for Sanger, and ∼2 days for FA test.
Although the RT-PCR test is CE-IVD marked and is currently the only FDA-approved test for the identification of patients with the V600E mutation, a number of assays are widely used in clinical practice
The RT-PCR test detected 100% of V600E mutations and 100% of wild-type specimens in the FFPE panel and had the lowest invalid rate of the three methods. Although pigmentation initially affected one of the FFPE specimens tested, the RT-PCR test result for this specimen was valid after retesting according to the manufacturer’s recommendations, resulting in a 100% valid test rate overall. Invalid test results have unfavorable implications for patients, as the need to repeat tests, and potentially to re-biopsy, can significantly delay treatment.
Test reproducibility between different laboratories and different users is another key attribute of an IVD. In a previous study at three external sites (two in the United States and one in Australia), we observed an overall reproducibility of 98·8% for the RT-PCR test, but based only on an 8-member panel of melanoma samples
Although designed to detect the V600E mutation, the RT-PCR test also has cross-reactivity with non-V600E mutations. Preclinical studies show that cell lines harboring V600K, V600D, and V600E2 mutations are sensitive to vemurafenib, and limited clinical data suggest that patients with V600K-mutant melanomas may respond to targeted therapy
The differences in the published LOD for the three test methodologies were clearly highlighted when testing artificial tumor blends with 5% mutant
Finally, one practical comment is important as more such companion diagnostic assays are developed. Low levels of reimbursement may influence the choice of methodology (Sanger being the cheapest option and the RT-PCR assay the most expensive of the three assays presented herein). However, it must be emphasized that costs vary tremendously between molecular diagnostic laboratories for a number of reasons: fully automated DNA extraction
Nevertheless, the cost of a specific test should take into account its performance (i.e., invalid rate, LOD and concordance with the gold standard).
In summary, we have presented a comparison study of three different methods for the detection of V600E mutations in the
We would like to acknowledge Kelli DeMartin, Harry Halait, Shannon Walter, Mari Christensen, and Malathy Nagarajan for their contributions to this study and manuscript. We are most grateful to the Sequencing Service at the Spanish National Cancer Centre (CNIO) for performing the direct sequencing assays. Miller Medical was contracted by Roche Molecular Systems and contributed to the drafting of this manuscript.