Clinical correlates of circulating cell-free DNA tumor fraction

Background
Oncology applications of cell-free DNA analysis are often limited by the amount of circulating tumor DNA and the fraction of cell-free DNA derived from tumor cells in a blood sample. This circulating tumor fraction varies widely between individuals and cancer types. Clinical factors that influence tumor fraction have not been completely elucidated.

Methods and findings
Circulating tumor fraction was determined for breast, lung, and colorectal cancer participant samples in the first substudy of the Circulating Cell-free Genome Atlas study (CCGA; NCT02889978; multi-cancer early detection test development) and was related to tumor and patient characteristics. Linear models were created to determine the influence of tumor size combined with mitotic or metabolic activity (as tumor mitotic volume or excessive lesion glycolysis, respectively), histologic type, histologic grade, and lymph node status on tumor fraction. For breast and lung cancer, tumor mitotic volume and excessive lesion glycolysis (primary lesion volume scaled by percentage positive for Ki-67 or PET standardized uptake value minus 1.0, respectively) were the only statistically significant covariates. For colorectal cancer, the surface area of tumors invading beyond the subserosa was the only significant covariate. The models were validated with cases from the second CCGA substudy and show that these clinical correlates of circulating tumor fraction can predict and explain the performance of a multi-cancer early detection test.

Conclusions
Prognostic clinical variables, including mitotic or metabolic activity and depth of invasion, were identified as correlates of circulating tumor DNA by linear models that relate clinical covariates to tumor fraction. The identified correlates indicate that faster growing tumors have higher tumor fractions. Early cancer detection from assays that analyze cell-free DNA is determined by circulating tumor fraction. Results support that early detection is particularly sensitive for faster growing, aggressive tumors with high mortality, many of which have no available screening today.

Introduction
thyroid cancer) [5]. These observations suggest that ctDNA-based cancer screening tests may be more likely to detect lethal cancers, which could help reduce overdiagnosis and unnecessary interventions [24].
To better understand how cTF varies across tumor stages and affects blood-based cancer detection, we developed models of ctDNA shedding using clinical features of tumor biology beyond clinical stage (e.g., tumor size and mitotic or metabolic activity). Using breast, lung, and colorectal cancers, which have the highest incidence and mortality among all cancers in men and women in the United States [25], we identified several key correlates of ctDNA levels and biological tumor properties that can be generalized to other solid cancers.

Analysis overview
The Circulating Cell-free Genome Atlas (CCGA; NCT02889978) study is a prospective, multicenter, observational, case-control study with longitudinal follow-up to support the development of a plasma circulating cfDNA-based multi-cancer early detection test. The CCGA protocol and consent were reviewed and approved by the Institutional Review Board (IRB) or Independent Ethics Committee (IEC). IRBs provide oversight of the study throughout its duration. All participants were consented per regulatory requirements prior to participating in study-related activities and sample collection. In CCGA, blood samples were prospectively collected from participants with newly diagnosed untreated cancer and from participants without a diagnosis of cancer. Samples from the first [26] and second [9] CCGA substudies were used to develop and validate biophysical models (separately for breast, lung, and colorectal cancers) that identify clinical correlates of cTF. Fig 2 depicts the analysis process. Briefly, paired plasma, white blood cell (WBC), and tissue samples from participants in the first CCGA substudy (for which whole-genome sequencing [WGS], whole-genome bisulfite sequencing [WGBS], and targeted sequencing data were available) [26] were used to obtain the estimated circulating tumor fraction (cTF; described below, Fig 2, step A). Independent of cTF estimation, candidate input clinical variables and covariates were identified for model development if they had been linked to ctDNA levels in previous publications [27][28][29][30], or if they were motivated by their involvement in the pathophysiological processes of ctDNA generation shown in Fig 1 [14-23]. Linear models were created that relate cTF to the candidate clinical variables for participants from the first CCGA substudy (Fig 2, step B). Variables were selected for inclusion in a prediction model if they contributed significantly to cTF (Fig 2, step C, insert shown enlarged at the bottom of the figure). 
These models then predicted cTF for participants in the second CCGA substudy using only selected clinical variables (Fig 2, step D). For validation, model-predicted cTF was compared to tumor detection from a plasma sample using the targeted methylation (TM) assay of the second CCGA substudy (Fig 2, step E). CCGA study design, WGBS, TM panel design and sequencing, sample collection, storage, accessioning and processing, determination of cTF, and creation of a WGBS and a TM classifier for cancer detection have been previously presented [5,9,26].

Fig 2. Flow diagram of data analysis. cTF was obtained from plasma, WBC, and tissue assay results (step A) and candidate clinical variables were input for model development (step B). During model development, candidate variables that contributed significantly to cTF were selected as correlates of tumor fraction (step C, insert shown enlarged). Filled and hollow dots depict variables selected or not selected, respectively. cTF was predicted using only selected clinical variables (step D) and was validated by comparison to plasma TM assay results (step E). CCGA: Circulating Cell-free Genome Atlas, cfDNA: cell-free DNA, CRC: colorectal cancer, cTF: circulating tumor fraction, WBC: white blood cells, TM: targeted methylation. https://doi.org/10.1371/journal.pone.0256436.g002

Circulating tumor fraction
cTF was determined from blood samples prospectively collected from participants with newly diagnosed untreated cancer from the first CCGA substudy. Genomic variants detected in a targeted sequencing assay of plasma cfDNA from these participants were compared to variants from a WGS assay of matched, macro-dissected formalin-fixed, paraffin-embedded tumor samples while using a white blood cell WGS assay to control for germline variants [5] (Fig 2, step A). For cases with no available tumor tissue, cTF was imputed using scores from a prototype classifier that was trained to detect cancer signals in WGBS data from plasma cfDNA [26]. 
This WGBS classifier score is computed from abnormally methylated cfDNA fragments in separate genomic regions and has previously been shown to track cTF measurements [5,30] with an approximately sigmoid relationship of WGBS classifier score to log(cTF). Parameters of the sigmoid function and their 95% confidence intervals were estimated separately per tumor type. cTF was imputed by drawing parameters of this fitted curve randomly for each case from their estimated distributions and performing a lookup from classifier score to cTF.
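The imputation step described above can be illustrated with a minimal sketch. It assumes a four-parameter logistic curve in log10(cTF) and independent normal distributions for the fitted parameters; the functional form, the log base, and all names (`lower`, `upper`, `midpoint`, `slope`, `impute_ctf`) are hypothetical choices for illustration and are not taken from the study.

```python
import math
import random


def score_from_log_ctf(log_ctf, lower, upper, midpoint, slope):
    """Assumed sigmoid: classifier score as a function of log10(cTF)."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (log_ctf - midpoint)))


def log_ctf_from_score(score, lower, upper, midpoint, slope):
    """Inverse lookup from classifier score to log10(cTF)."""
    if not lower < score < upper:
        # The curve can only be inverted strictly between its asymptotes.
        raise ValueError("score outside the invertible range of the fitted curve")
    return midpoint - math.log((upper - lower) / (score - lower) - 1.0) / slope


def impute_ctf(score, param_means, param_sds, rng):
    """Draw curve parameters once per case (assumed independent normals), then invert."""
    params = {name: rng.gauss(param_means[name], param_sds[name]) for name in param_means}
    return 10.0 ** log_ctf_from_score(score, **params)


# Hypothetical parameter estimates for one tumor type.
means = {"lower": 0.0, "upper": 1.0, "midpoint": -2.5, "slope": 2.0}
sds = {"lower": 0.0, "upper": 0.0, "midpoint": 0.1, "slope": 0.1}
imputed = impute_ctf(0.3, means, sds, random.Random(0))
```

Drawing the parameters per case, rather than using the point estimates, propagates the uncertainty of the fitted curve into the imputed cTF values.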
For additional quantitative interpretation, the concentration of cfDNA in plasma was quantified using a High Sensitivity Large Fragment Analysis Kit (DNF-493, Fragment Analyzer, Agilent). The concentration was scaled by tumor fraction and an estimated total volume of whole blood from participant weight and height [31] assuming plasma volume as 55% of whole blood volume. This yielded the total mass of ctDNA in participants' circulation and was scaled to genome equivalents (GE) using 6.5 pg / GE.
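As a rough sketch of this scaling, the following computes the total number of tumor-derived genome equivalents from a plasma cfDNA concentration. Two assumptions are not from the text: the concentration is taken to be in ng per mL of plasma, and blood volume is estimated with Nadler-type coefficients, which may differ from the formula in reference [31]. The 55% plasma fraction and 6.5 pg per GE are the values stated above.

```python
def blood_volume_ml(weight_kg, height_m, sex):
    """Nadler-type blood volume estimate (assumed; reference [31] may differ)."""
    if sex == "male":
        liters = 0.3669 * height_m ** 3 + 0.03219 * weight_kg + 0.6041
    elif sex == "female":
        liters = 0.3561 * height_m ** 3 + 0.03308 * weight_kg + 0.1833
    else:
        raise ValueError("sex must be 'male' or 'female'")
    return liters * 1000.0


def total_ctdna_genome_equivalents(cfdna_ng_per_ml_plasma, ctf, weight_kg, height_m, sex):
    """Total tumor-derived genome equivalents (GE) in the circulation."""
    plasma_ml = 0.55 * blood_volume_ml(weight_kg, height_m, sex)  # plasma as 55% of whole blood
    ctdna_pg = cfdna_ng_per_ml_plasma * 1000.0 * plasma_ml * ctf  # ng -> pg, scaled by cTF
    return ctdna_pg / 6.5  # 6.5 pg per GE, as stated in the text


# Hypothetical participant: 10 ng/mL cfDNA, cTF of 0.1%.
example_ge = total_ctdna_genome_equivalents(10.0, 0.001, 70.0, 1.75, "male")
```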

Candidate clinical variable selection
Clinical data for this analysis were obtained from CCGA electronic case report forms and cancer-type specific data from pathology and radiology reports. The goal of each biophysical model is to quantitatively explain circulating tumor DNA using only a small subset of routinely available clinical features.
Analyses were limited to clinical stages I, II, and III because ctDNA levels increase strongly in the presence of distant metastases, especially when a highly vascularized organ like the liver is affected [13,32] and because tumor fraction in stage IV cancers is higher [5] and more variable compared to stages I-III, which would significantly complicate the modeling.
Similarly, clinical stage was not selected as a covariate for the purpose of modeling, given that ctDNA levels increase with clinical stage with overlap between stages I, II, III [5,27,29,30] and that the definition of clinical stage depends on the cancer type and takes multiple clinical variables and covariates into account [33] that modeling can test separately.
The selection of candidate clinical correlates and covariates was motivated by the goal to capture an absolute rate of cancer cell deaths taking the total number of tumor cells and their apoptotic and necrotic rates into account. Even when not accounting for cellularity, total tumor volume can serve as a measure for the total number of tumor cells. Total volume computations should further reflect tumor laterality and focality, and the size of each lesion, as all lesions individually contribute ctDNA. Abstracted clinical data provided 1D maximum size per lesion and information on multifocal or bilateral disease. Total volume of all primary lesions was computed assuming spherical lesions. Missing size information was only imputed for non-index tumor lesions when the index lesion size was reported. A ratio of index lesion to non-index lesion size was drawn randomly from the second and third quartile of this ratio for cases with the same cancer type and complete size information. This ratio was multiplied with the reported index lesion size to impute size of non-index lesions.
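The volume computation and size imputation can be sketched as follows. The sketch treats the reported 1D maximum size as a sphere diameter and defines the size ratio as non-index over index lesion size so that multiplying by the index size yields the non-index size; both are readings of the description above rather than confirmed details, and the function names are hypothetical.

```python
import math
import random
import statistics


def sphere_volume_mm3(max_diameter_mm):
    """Volume of a sphere whose diameter is the reported 1D maximum size."""
    return math.pi / 6.0 * max_diameter_mm ** 3


def total_primary_volume_mm3(lesion_diameters_mm):
    """Sum over all primary lesions, since each lesion contributes ctDNA."""
    return sum(sphere_volume_mm3(d) for d in lesion_diameters_mm)


def impute_non_index_diameter_mm(index_diameter_mm, reference_ratios, rng):
    """Draw a ratio from the second and third quartile of the reference ratios."""
    q1, _, q3 = statistics.quantiles(reference_ratios, n=4)
    central = [r for r in reference_ratios if q1 <= r <= q3]
    return index_diameter_mm * rng.choice(central)


# Hypothetical ratios from cases with complete size information.
ratios = [0.2, 0.4, 0.5, 0.6, 0.9]
imputed_mm = impute_non_index_diameter_mm(20.0, ratios, random.Random(0))
volume = total_primary_volume_mm3([20.0, imputed_mm])
```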
The presence of tumor-involved lymph nodes was confirmed either by pathology report or clinical node (N)-stage N1, N2, or N3. The absence was confirmed by N-stage N0 and confirmation that all examined lymph nodes in the pathology report were negative. In the absence of pathology report information and clinical N-stage, presence of tumor-involved lymph nodes was imputed from the clinical stage and assumed present for clinical stages III and IV.
Cell death rates (e.g., using % positive of the immunohistochemistry marker cleaved caspase 3) are not routinely reported. Available clinical variables for mitotic or metabolic tumor activity were chosen for model creation assuming that tumor cells constantly outgrow tumor resources [18], with hypoxic apoptosis, mitotic catastrophe [34][35][36], or other forms of cell death closely following cell division [21,37].

Stepwise creation of biophysical models
Following selection of the candidate clinical variables, a linear analytical model was created to identify statistically significant (p-value < 0.05) clinical correlates of tumor fraction (Fig 2, step C). Additionally, their relative importance to explain cTF was determined using R-squared (R²) partitioned by averaging over orders [38,39]. Next, only the clinical variables identified to significantly contribute to cTF were selected as input to a linear prediction model (Fig 2, step C). Finally, a quantitative linear model for each cancer type was created to explain the total number of tumor-derived GEs in the patient body using the same selected clinical variables.
Analyses were performed in the statistical software program R version 3.6.0 with linear model fitting taken from the stats package and relative importance metrics computed with the relaimpo package in version 2.2.3. For validation, receiver operating characteristic (ROC) curves were created and analyzed using the pROC package in version 1.16.2 and the Wilcoxon rank-sum test was taken from the stats package.
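To illustrate what "R² partitioned by averaging over orders" means, the following is a small, dependency-free sketch. It is not the relaimpo implementation used in the study; it only reproduces the idea of adding each predictor to a linear model in every possible order and averaging the resulting gain in R². All function names are hypothetical.

```python
from itertools import permutations


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def importance_by_order_averaging(predictors, y):
    """Average each predictor's sequential R^2 gain over all orderings."""
    names = list(predictors)
    shares = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        included, previous = [], 0.0
        for name in order:
            included.append(predictors[name])
            current = r_squared(included, y)
            shares[name] += current - previous
            previous = current
    return {name: total / len(orders) for name, total in shares.items()}


# Synthetic toy data, not study data.
toy = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]}
toy_y = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8]
toy_shares = importance_by_order_averaging(toy, toy_y)
```

The shares sum to the R² of the full model, so dividing each share by that total gives percentages of explained variability. Enumerating all orderings grows factorially with the number of predictors, which is acceptable only for the handful of covariates considered here.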

Breast cancer model motivation
The impact of different clinical features of breast cancer on cTF has been previously presented [27]. cTF increases with tumor (T) stage, N stage, hormone-receptor (HR) status, and percent of tumor nuclei positive for Ki-67 (%Ki-67), but not with human epidermal growth factor receptor 2 (HER2) status or histologic type. It has been shown for breast cancer that %Ki-67 positive for mitotic rate and % positive of cleaved caspase 3 for apoptotic rate are correlated [40], and %Ki-67 positive is frequently reported in breast cancer.

Breast cancer model implementation
The basis of a biophysical model to predict cTF in breast cancer is the tumor mitotic volume (TMitV), which is the tumor volume multiplied by %Ki-67 positive. In CCGA, %Ki-67 positive was reported for breast cancer from participating sites. Breast cancer tumor tissue submitted for study purposes was additionally sent for Ki-67 staining and read-out to a CAP/CLIA (College of American Pathologists/Clinical Laboratory Improvement Amendments)-certified laboratory for cases from sites that allowed such an additional read-out per protocol.
TMitV is intended to capture ctDNA from all primary tumor foci. More ctDNA is expected from tumor-involved lymph nodes and also from distant metastasis which are not considered here. Tumor-involved lymph nodes widely vary in number, tumor content and location, which is challenging to capture in clinical variables reported in a multi-site study. As a modeling variable, presence of tumor-involved lymph nodes (yes/no) was used due to less complete data for the number of tumor-involved lymph nodes. Histologic type (ductal or lobular carcinoma), tumor grade, and hormone receptor status (positive if a case was positive for progesterone or estrogen receptor overexpression) were tested as additional, independent predictors of cTF in the linear analysis model.
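As a minimal sketch of the definition above, TMitV can be computed from the total primary tumor volume and the reported %Ki-67. The sketch assumes that the percentage is applied as a fraction, so that TMitV has units of mm³; the function name is hypothetical.

```python
def tumor_mitotic_volume_mm3(total_volume_mm3, ki67_percent_positive):
    """TMitV: total primary tumor volume scaled by the Ki-67 positive fraction."""
    if not 0.0 <= ki67_percent_positive <= 100.0:
        raise ValueError("%Ki-67 positive must be between 0 and 100")
    # Assumption: the percentage is converted to a fraction before scaling.
    return total_volume_mm3 * ki67_percent_positive / 100.0


# Hypothetical case: 1000 mm^3 total volume with 20% of nuclei Ki-67 positive.
tmitv = tumor_mitotic_volume_mm3(1000.0, 20.0)
```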

Lung cancer model motivation
Different clinical features of lung cancer that impact cTF have been presented [28,29]. For non-small cell lung cancer (NSCLC), in univariate analysis the presence of necrosis, lymph node involvement, lymphovascular invasion, tumor size, %Ki-67 positive, and a histologic type other than adenocarcinoma increased cTF. In a multivariable analysis, non-adenocarcinoma subtype, high Ki-67, and lymphovascular invasion were individual predictors for detection of ctDNA in plasma [28]. Furthermore, cTF increases with (18)F fluorodeoxyglucose (FDG) uptake on Positron Emission Tomography-Computed Tomography (PET/CT) for NSCLC [28,29] and small cell lung cancer [11]. While %Ki-67 positive is not routinely reported for lung cancer, FDG PET/CT is often available. The standardized uptake value (SUV) from FDG PET is correlated with tumor growth [41,42] and with %Ki-67 positive in breast [43,44] and lung cancers [45].

Lung cancer model implementation
FDG PET SUV was obtained by abstraction of PET/CT radiology reports. For lung cancers, prediction of ctDNA from all primary tumor lesions was based on the newly defined excessive lesion glycolysis (ELG), which is the volume integral of FDG PET SUV over all lesions after subtracting 1.0 from the SUV value. Normal tissue is expected to have an SUV close to 1.0 and to not create many cfDNA fragments, while tumor tissue has SUV > 1.0. ELG is closely related to total lesion glycolysis (TLG), which is the volume integral of SUV. Due to data availability in the CCGA study, ELG computation is simplified to volume scaled by a single value of (SUVmax - 1.0). In previous studies, TLG was shown to correlate with ctDNA [28], but not with overall cfDNA levels [46]. Histologic type (adenocarcinoma, squamous cell carcinoma, small cell carcinoma), presence of tumor-involved lymph nodes, and histologic grade were tested as additional independent predictors of cTF.
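The simplified ELG computation can be written as a short sketch. Clamping at zero for SUVmax values below 1.0 is an assumption added here for illustration and is not stated in the text; the function names are hypothetical.

```python
def total_lesion_glycolysis(total_volume_mm3, suv):
    """TLG with a single SUV value: volume scaled by SUV."""
    return total_volume_mm3 * suv


def excessive_lesion_glycolysis(total_volume_mm3, suv_max):
    """Simplified ELG: volume scaled by a single (SUVmax - 1.0)."""
    # Assumption: uptake at or below the normal-tissue level of 1.0 contributes nothing.
    return total_volume_mm3 * max(suv_max - 1.0, 0.0)


# Hypothetical lesion: 1000 mm^3 with SUVmax of 5.0.
elg = excessive_lesion_glycolysis(1000.0, 5.0)
```

With a single SUV value, ELG equals TLG minus the lesion volume, which makes the relationship between the two measures explicit.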

Colorectal cancer (CRC) model motivation
Previously, tumor surface area (TSA) and depth of microinvasion have been presented as correlates of cTF for colorectal adenocarcinomas [30]. In contrast to lung, breast, and many other solid cancers, colon cancer size is often measured after resection with the specimen spread from the original curved colon wall onto a flat surface. A surrogate measure for the total number of tumor cells is therefore the tumor surface area of all primary tumor foci together. Markers of cell death or proliferation like cleaved caspase 3, Ki-67, or FDG PET SUV are not frequently reported in CRC. Instead, while fewer cases are available for model creation and validation than for breast and lung cancers, we use CRC to show a different clinical correlate of cTF. A candidate covariate captures how tumor vascularization and trafficking of DNA fragments released by tumor cells affect ctDNA levels. The depth of microinvasion (invaded tissue layers beyond the epithelial lining at the inside of the colon lumen) is routinely reported as it contributes to T stage in CRC [33]. Tumor DNA from apoptotic tumor cells can enter the colon lumen or the circulation [47]. Depth of microinvasion or T stage therefore informs whether tumor-derived DNA can enter the circulation.

CRC model implementation
The biophysical model for CRC therefore uses the TSA individually scaled by a shedding factor that depends on depth of microinvasion. Levels of ctDNA are again expected to increase with tumor size and the number of tumor cells, even though tumor size does not contribute to clinical staging in CRC. The analysis for CRC was limited to the dominant histologic type adenocarcinoma and only presence of tumor-involved lymph nodes and histologic grade were tested as additional independent correlates of tumor fraction.

Model validation
For model validation, model predictions for cTF were generated for plasma cfDNA samples of participants in the second CCGA substudy (Fig 2, Step D). These samples were subjected to a TM assay with an independently trained machine-learning classifier for cancer detection and prediction of signal of origin for samples with detected cancer signal [9]. The clinical validation of a further refined TM assay and classifiers optimized for screening has recently been completed for a case-control study [48]. Preliminary results from a prospective cohort study evaluating clinical implementation of the MCED test have also been reported [49]. For the validation of the models presented in this paper, cTF predicted by a biophysical model was compared to cancer detected or not detected on an independent patient cohort and assay (Fig 2, Step E) to further relate cTF to early cancer detection test performance. Receiver operating characteristic (ROC) curves were created to test if model-predicted cTF can explain the behavior of the cancer detection test. Separately for each cancer type, a one-sided Wilcoxon test determined if our hypothesis that detected cancers have higher cTF and that cTF can be predicted from few clinical covariates held in the validation cohort.
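The link between the ROC analysis and the rank-sum test can be sketched without any statistics library: the area under the ROC curve equals the fraction of (detected, undetected) pairs in which the detected case has the higher predicted cTF, with ties counted as one half. The study used the pROC and stats packages in R; the function below is only an illustration with a hypothetical name and made-up numbers.

```python
def roc_auc(predicted_detected, predicted_undetected):
    """AUC as the normalized Mann-Whitney U statistic over all case pairs."""
    wins = 0.0
    for d in predicted_detected:
        for u in predicted_undetected:
            if d > u:
                wins += 1.0
            elif d == u:
                wins += 0.5
    return wins / (len(predicted_detected) * len(predicted_undetected))


# Made-up model-predicted cTF values, not study data.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to no separation between detected and undetected cases, which is the null hypothesis that the one-sided rank-sum test evaluates.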

Results
For breast, lung, and colorectal cancer types, biophysical models were generated using cases from the first CCGA substudy and validated using cases from the second CCGA substudy (Fig 3A). For breast cancer, the numbers of cases used to generate and validate the model were 221 and 146, respectively (Fig 3B and 3C, left); for lung, the breakdown was 35 and 154 (Fig 3B and 3C, center); and for colorectal it was 21 and 51 (Fig 3B and 3C, right). Fig 3B and 3C also demonstrate, per cancer type, the breakdown of cases by available cTF measurements, available clinical covariates, and stage for model development and validation, respectively.

Breast cancer
A total of 221 breast cancer cases (115 stage I, 77 stage II, 29 stage III) had cTF and sufficient clinical information to develop the biophysical model. 40 cases had cTF imputed from a WGBS classifier score, 17 cases had an imputed size for a non-index lesion, and lymph-node status was derived from clinical stage alone for 5 cases. The distributions of measured and imputed cTF for breast cancer cases were plotted by stage (Fig 4A) and confirm that, while cTF generally increases with stage, the distributions overlap strongly [27]. Fig 4B shows that the WGBS prototype classifier scores [26] increased with tumor fraction, together with the fitted sigmoid function used for cTF imputation. Additional cTF estimates (triangles) were imputed for cases with valid WGBS assay results but without a matched tissue sample. The cTF distribution is shown for stage IV because these cases were used for imputation of cTF; however, only stage I-III cases were used in subsequent modeling (as described in the Methods section).
Table 1 shows the analytical model for breast cancer that determines the clinical variables that have a statistically significant correlation with ctDNA. Only TMitV was identified to significantly contribute to cTF with a p-value <0.05, accounting for 45% of explained variability. Lymph node status, histologic grade, and HR status accounted for 20%, 19%, and 13%, respectively, while not statistically significant together with TMitV. Lobular or ductal histology did not appear to have a strong effect on cTF beyond TMitV. The linear prediction model was created using TMitV (Fig 5). 146 breast cancer cases from the CCGA2 substudy with TM results were available for model validation (1/65 detected in stage I, 28/67 detected in stage II, 13/14 detected in stage III). 9 cases had an imputed size for a non-index lesion. 
The prediction model separated detected and undetected cases beyond clinical stage (Fig 6A, Fig A in S1 Text, Wilcoxon rank-sum p-value <0.001) and can explain breast cancer detection with a TM assay with an area under curve (AUC) of 0.853 (95% CI 0.788-0.919) (Fig 6B).
The corresponding quantitative model estimated that each mm³ of mitotically active primary tumor volume adds 9.8 genome equivalents to the circulation of a breast cancer patient (p = 5.1 × 10⁻⁶).

Lung cancer
For lung cancer, 35 cases (13 stage I, 5 stage II, 17 stage III) had cTF and sufficient clinical information to generate analysis, prediction, and quantitative models for ctDNA. 13 cases had cTF imputed from a WGBS classifier score and lymph-node status was derived from clinical stage alone for 1 case. Fig 7A shows the distribution of cTF for lung cancer cases for clinical stages I-IV and overlap especially between stages I and II. Fig 7B shows how WGBS prototype classifier scores increase with tumor fraction and again the matched sigmoid function and imputed cTF estimates.
In the analytical model for lung cancer, only ELG was identified to significantly contribute to cTF, accounting for 81% of explained variability (Table 2). While not statistically significant, histologic grade and presence of tumor-involved lymph nodes accounted for 14% and 3%, respectively. The histologic types adenocarcinoma, squamous cell carcinoma, and small cell lung cancer did not affect cTF beyond ELG. Tumor volume and metabolic activity measured by glycolysis account for previously published differences between lung cancer types [28,29,50]. A linear prediction model was created using ELG (Fig 8). 154 lung cancer cases from the CCGA2 substudy with TM results were available for model validation (13/59 detected in stage I, 17/26 detected in stage II, 57/69 detected in stage III). 3 cases had an imputed size for a non-index lesion, and lymph-node status was derived from clinical stage alone for 2 cases. This model separated detected and undetected cases beyond clinical stage (Fig 9A, Fig B in S1 Text, Wilcoxon rank-sum p-value < 0.001) and can explain lung cancer detection with a TM assay with an AUC of 0.784 (95% CI 0.711-0.857) (Fig 9B).
The corresponding quantitative model estimated that each mm³ of additional FDG SUV glycolysis adds 0.81 genome equivalents to the circulation of a cancer patient (p = 0.0004). Assuming a correlation between %Ki-67 positive and FDG PET SUV, the breast and lung cancer models both predict that the total amount of ctDNA increases with tumor volume and mitotic or metabolic tumor activity.

Colorectal cancer
Fig 10A shows the distribution of cTF for colorectal cancer cases for clinical stages I-IV, with overlap especially between stages II and III, while Fig 10B shows that depth of microinvasion separates cases with high and low cTF depending on a deep microinvasion beyond the subserosa or a shallow microinvasion below the subserosa. Fig 10C again shows WGBS classifier scores and cTF estimates for this cancer type.

PLOS ONE
Cell-free DNA tumor fraction correlates

21 colorectal adenocarcinomas (6 stage I, 9 stage II, 6 stage III) had cTF and sufficient clinical information to generate analysis, prediction, and quantitative models for ctDNA. 1 case had cTF imputed from a WGBS classifier score and lymph-node status was derived from clinical stage alone for 1 case. Table 3 shows the analytical model for colorectal cancer. Only TSA of deeply invading tumors was identified to contribute significantly to cTF, accounting for 75% of explained variability. While not statistically significant, histologic grade and TSA of shallowly invading tumors accounted for 12% and 11%, respectively.

Prediction models where cTF increases linearly with TSA were created separately for deep and shallow tumors (Fig 11). 51 colorectal adenocarcinomas from the CCGA2 substudy with TM results were available for model validation (7/19 detected in stage I, 12/18 detected in stage II, 10/14 detected in stage III). 2 cases had an imputed size for a non-index lesion. The predictive model is able to separate detected and undetected cases beyond clinical stage (Fig 12A, Fig C in S1 Text, Wilcoxon rank-sum p-value < 0.001) and explains colorectal cancer detection with a TM assay with an AUC of 0.881 (95% CI 0.787-0.975) (Fig 12B).
The corresponding quantitative model estimated that each mm² of TSA of a deeply invading colorectal cancer adds 13.6 genome equivalents to the circulation of a cancer patient (p = 0.0005), while each mm² of TSA of a shallowly invading tumor adds only 1.8 genome equivalents (p = 0.014). While ctDNA again increases with the number of tumor cells, in colorectal cancer the trafficking of tumor-derived cfDNA, either into the circulation or into the colon lumen, is a major correlate of ctDNA.
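The reported slopes of the three quantitative models can be collected in a short sketch that converts a clinical measurement into the genome equivalents it is estimated to add to the circulation. Only the slopes are taken from the text; omitting any intercept is an assumption, so the result should be read as an added contribution rather than a predicted total. The dictionary keys and the function name are hypothetical.

```python
# Slopes reported in the text, in genome equivalents (GE) per unit of the model variable.
GE_PER_UNIT = {
    "breast_tmitv_mm3": 9.8,      # per mm^3 of tumor mitotic volume
    "lung_elg_mm3": 0.81,         # per mm^3 of excessive lesion glycolysis
    "crc_tsa_deep_mm2": 13.6,     # per mm^2 TSA, invasion beyond the subserosa
    "crc_tsa_shallow_mm2": 1.8,   # per mm^2 TSA, shallower invasion
}


def added_genome_equivalents(model, value):
    """GE added to the circulation by the given amount of the model variable."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # Assumption: no intercept term, since none is reported in the text.
    return GE_PER_UNIT[model] * value


# Hypothetical comparison: the same 100 mm^2 surface area at two invasion depths.
deep = added_genome_equivalents("crc_tsa_deep_mm2", 100.0)
shallow = added_genome_equivalents("crc_tsa_shallow_mm2", 100.0)
```

For equal surface area, the deeply invading tumor is estimated to contribute about 7.6 times as many genome equivalents as the shallow one (13.6 / 1.8).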

Discussion
This study analyzes the nature and pathophysiological correlates of ctDNA. Prediction models were created using tumor-informed cTF measurements and validated on TM assay results from a different participant population. The ability to transfer the models from a training to a separate validation cohort and between different cfDNA assays used for cTF measurements and an application of early cancer detection indicates that the predictive models might represent the cancer biology underlying ctDNA levels.
The identified surrogate biomarkers that determine ctDNA in this post hoc analysis are tumor size, %Ki-67 positive in breast cancer, PET FDG SUV in lung cancer, and depth of microinvasion in colorectal cancer. These are all established prognostic markers of their respective cancer types and assess tumor aggressiveness [51][52][53]. Here, the biophysical markers TMitV and ELG appeared to capture the essential characteristics of subtypes like HR-negative breast cancer and squamous cell carcinoma of the lung, which have previously been shown to have higher tumor fraction [27][28][29]50]. The main link is an apparent correlation between mitotic activity and cell death rates in growing cancers. As explanation, we offer that tumor cells frequently outgrow the tumor mass currently supported by the TME. While the growth of the tumor mass is limited by angiogenesis and scaffolding by growing stroma, tumor cell division in excess of these resource limitations can lead to many forms of division death and release cfDNA into the TME, from where it can get trafficked into the circulation. The models imply a direct link between cell division and cell death in growing tumors. This link is further supported by the possibility of mitotic catastrophes [34], which explain division death even for p53-deficient cells [35]. Mitotically active tumor cells are observed even in hypoxic tumor regions [54]. These pathophysiological mechanisms that link tumor cell growth, tumor cell death, and shedding of ctDNA contradict recently published assumptions of independent tumor growth and cell death rates which have been used to claim that an indolent tumor might result in higher levels of ctDNA than a fast growing tumor [55,56].
While the identification of clinically reported surrogate biomarkers that correlate with ctDNA levels allows using our models in various applications, the use of surrogate biomarkers is also a study limitation. While the number of tumor cells, the rate of tumor cell death, tumor blood flow, and vascular permeability would be physiologically most plausible to quantitatively explain ctDNA, this study aimed to find routinely clinically available markers, identifying tumor size, mitotic or metabolic activity, and depth of invasion as surrogates. We acknowledge that a multi-center observational study like CCGA introduces inter-site variability to clinical data and its completeness for modeling purposes, while a dedicated single-site study might identify additional clinical correlates of ctDNA from more strictly controlled quantitative assessments of tumor characteristics.
The CCGA study enrolled participants prior to cancer treatment, and the developed models identify ctDNA correlates in untreated patients. cfDNA-based applications in cancer treatment settings might additionally represent treatment response and resulting tumor cell death [57][58][59] that are not in scope for the analysis presented here. Furthermore, the developed models are cancer-type-specific and do not yet explain strong variations observed between ctDNA levels for different cancer types [5]. The colorectal cancer model identifies DNA fragment trafficking (i.e., the transport from the TME into the circulation) as a relevant correlate of ctDNA. Tumor blood flow, perfusion, and vascular permeability are therefore candidates to explain systematic cancer-type-specific variations in ctDNA that have not been assessed in this study. Increased levels of ctDNA for growing tumors can be further explained by tumor perfusion and the dual nature of the vascular endothelial growth factor (VEGF) that is expressed in tumors to drive neovascularization. VEGF has previously been known as vascular permeability factor (VPF) [60], and increased vascular permeability with increased interstitial filtration flow can contribute to increased trafficking of cfDNA from the tumor mass into the circulation via lymphatic drainage or direct intravasation on the venous side of the vascular perfusion bed. Future work could include relating direct measures of cell death (e.g., using immunohistochemistry assessment of cleaved caspase 3), tumor blood flow, and vascular permeability to ctDNA.
The biophysical models presented here explain levels of ctDNA using only a few physiologically plausible clinical parameters. The identified clinical correlates of ctDNA are at the same time biomarkers of tumor aggressiveness, suggesting that early cancer detection from cfDNA-based applications is determined by tumor growth, i.e., the total number of new tumor cells created in a patient body, instead of tumor size (currently used to characterize image-based screening methods like mammography or low-dose CT [61][62][63]). This difference in sensitivity either to static tumor size or to tumor growth makes it difficult to compare a cfDNA-based MCED test to existing screening methods, especially as it is meant to complement existing screening methodologies such as mammography. For example, the MCED test used for the validation of the biophysical models in this paper [9,48] did not detect all early-stage breast cancers that were detected by mammography, but preferentially detected more aggressive subtypes [27]. The results in this paper show that cancer detection in cfDNA depends on cTF and TMitV and can be used to explain this behavior of a cfDNA-based MCED test. Together these findings suggest that a cfDNA-based MCED test can complement mammography given that mammography does not detect all fast-growing interval cancers [64,65] or cancers in women with dense breasts [66,67]. A recent post-analysis of the second CCGA substudy showed that cancers not detected by a cfDNA-based MCED test had better prognosis than cancers detected by the test, suggesting that the test detected more aggressive cancers [68].
Another systematic difference between imaging- and cfDNA-based early cancer detection is specificity. While consecutive single-cancer screening methods (possibly with comparatively higher sensitivity especially for less aggressive cancers) accumulate false-positive rates [69], one cfDNA MCED test detects all covered cancer types with one single, low false-positive rate. Consequently, it seems more appropriate to characterize a cfDNA-based MCED test not in competition with, but instead as a complement to, imaging-based, single-cancer screening methods. Large studies in a target screening population to assess outcome have recently been announced, and a potential positive impact on patient survival has recently been published using modeled data with the cancer-type and stage-dependent sensitivity of the same test that is used to capture cancer detection in this manuscript [70].
In conclusion, this study supports that tumor fraction plays a pivotal role for cfDNA applications in oncology. It drives the performance of MCED tests, its clinical correlates are indicators of aggressive tumors, and it is ultimately prognostic and identifies potentially lethal cancers [5,68,71]. Taken together, these data support that early cancer detection from cfDNA is particularly sensitive to aggressive, fast-growing tumors.