An approach for developing a blood-based screening panel for lung cancer based on clonal hematopoietic mutations

Ramu Anandakrishnan; Ryan Shahidi; Andrew Dai; Veneeth Antony; Ian J. Zyvoloski

doi:10.1371/journal.pone.0307232

Abstract

Early detection can significantly reduce mortality due to lung cancer. Presented here is an approach for developing a blood-based screening panel based on clonal hematopoietic mutations. Animal model studies suggest that clonal hematopoietic mutations in tumor infiltrating immune cells can modulate cancer progression, representing potential predictive biomarkers. The goal of this study was to determine if the clonal expansion of these mutations in blood samples could predict the occurrence of lung cancer. A set of 98 potentially pathogenic clonal hematopoietic mutations in tumor infiltrating immune cells were identified using sequencing data from lung cancer samples. These mutations were used as predictors to develop a logistic regression machine learning model. The model was tested on sequencing data from a separate set of 578 lung cancer and 545 non-cancer samples from 18 different cohorts. The logistic regression model correctly classified lung cancer and non-cancer blood samples with 94.12% sensitivity (95% Confidence Interval: 92.20–96.04%) and 85.96% specificity (95% Confidence Interval: 82.98–88.95%). Our results suggest that it may be possible to develop an accurate blood-based lung cancer screening panel using this approach. Unlike most other “liquid biopsies” currently under development, the approach presented here is based on standard sequencing protocols and uses a relatively small number of rationally selected mutations as predictors.

Citation: Anandakrishnan R, Shahidi R, Dai A, Antony V, Zyvoloski IJ (2024) An approach for developing a blood-based screening panel for lung cancer based on clonal hematopoietic mutations. PLoS ONE 19(8): e0307232. https://doi.org/10.1371/journal.pone.0307232

Editor: Francesco Bertolini, European Institute of Oncology, ITALY

Received: May 1, 2024; Accepted: July 1, 2024; Published: August 22, 2024

Copyright: © 2024 Anandakrishnan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The datasets used in the current study are available in the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/) and the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) repositories. The accession numbers or identifiers for the datasets are listed in SI Table S1. Source code is available at https://sourceforge.net/projects/lung-cancer-screening-panel/.

Funding: This work was funded by the VCOM 2023 REAP grant #1038624(RA). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Cancer is a leading cause of death in the US, second only to heart disease, with lung cancer accounting for an estimated 21% of these deaths [1]. Early detection of lung cancer has been shown to reduce mortality by 20% [2]. However, the recommended screening for lung cancer, low dose computed tomography [3], is restricted to high-risk individuals, with an estimated uptake of only 17.54% even within this group due to financial and other barriers [4]. Clearly, there is a need for more accessible lung cancer screening protocols. Several blood-based screening panels (“liquid biopsies”) are currently in various stages of validation, though none have yet achieved clinical utility [5], other than as companion diagnostics [6]. These include screens for antibodies [7], circulating microRNA [8], and cell-free DNA fragments [9]. For example, Galleri, a multi-cancer blood test based on the methylation state of cell-free DNA, is currently in clinical trials; in a clinical validation study it showed a poor sensitivity of 51.5% (95% confidence interval (CI): 49.6–53.3%) despite a high specificity of 99.5% (95% CI: 99.0–99.8%) [10].

Here we present an approach for developing a blood-based lung cancer screening panel based on a set of clonal hematopoietic (CH) mutations. Clonal hematopoiesis is the clonal expansion of a single hematopoietic stem or progenitor cell (HSPC) due to specific mutations in these cells. While mutations in tumor cells that inhibit anti-tumor immune function have been extensively studied [11, 12], the potential impact of immune cell mutations on anti-tumor activity remains relatively unexplored [13]. Recent studies have shown that CH mutations can lead to the clonal expansion of potentially pathogenic mutations in a relatively large population of tumor infiltrating immune (TII) cells, affecting their anti-tumor activity. A loss-of-function TET2 mutation in tumor infiltrating myeloid cells was shown to increase angiogenesis in a lung cancer animal model, exacerbating tumor progression [14]. An ASXL1 mutation in tumor infiltrating T-cells was shown to perturb their development and function, promoting tumor growth in syngeneic animal models [15]. On the other hand, TET2 inactivation in tumor infiltrating lymphocytes and myeloid cells has also been shown to inhibit tumor growth [16, 17], indicating a dependence on other factors. Studies have also shown that mutations affecting the expression of specific genes in immune cells can modulate their anti-tumor activity [18, 19]. Although these mutations do not directly “drive” tumor growth, they may modulate it by inhibiting or enhancing anti-tumor immune response.

Clonal hematopoiesis occurs in over 10% of adults over 70 years of age [20] and has been implicated in cardiovascular and hematologic disorders, including leukemias and lymphomas [21, 22]. Over 20 different mutations have been associated with clonal hematopoiesis, with mutations in DNMT3A, TET2 and ASXL1 accounting for over 90% of age-related cases [23]. Although the associated mechanism is poorly understood, studies suggest three possible mechanisms: increased self-renewal, increased number of self-renewal cycles to become a committed progenitor, and/or increased epigenetic or transcriptional heterogeneity leading to a highly proliferative state [24–27]. However, clonal hematopoiesis by itself may not affect immune function. Instead, secondary mutations in clonally expanding cells may be required for immune dysfunction [23, 28, 29], which could affect anti-tumor immune function.

We hypothesize that the presence of pathogenic CH mutations in TII cells could impact anti-tumor immune function and therefore, could serve as predictive markers for cancer risk. In the following, we develop a machine learning model to predict the probability of cancer based on the detection of these mutations in blood samples.

Materials and methods

Exome sequencing data and variant calling

For data from the Sequence Read Archive (SRA) [30], fastq files from the repository were aligned to the GRCh38 reference genome using the Burrows-Wheeler Aligner (BWA) v0.7.15 [31]. For data from the Genomic Data Commons (GDC), we downloaded aligned read (bam) files, which had already been aligned to the GRCh38 reference genome using BWA. The SRA data were therefore processed with the same aligner (BWA) and reference genome (GRCh38) that were used to produce the GDC bam files. Variants were then called using GATK Mutect2 v3.7.0–1 [32]. The same read alignment and variant calling protocols were used for all datasets. The Clara Parabricks v4.0.0–1 [33, 34] implementation of these programs was used for parallel execution on NVIDIA GPUs. Default values were used for all parameters, including minimum variant calling accuracy (Log Odds > 3.0). Solely for the purpose of identifying potentially pathogenic mutations, we also downloaded Mutect2 mutation annotation format (MAF) files from the GDC, which include variant allele fraction (VAF) values for matched tumor and blood samples. The MAF files were not used to train or test the ML model.

Single-cell RNA sequencing (scRNA-seq) data and cell type analysis

Cellranger v6.0.2 [35] was used to identify genes expressed in each cell, from scRNA-seq fastq files. SCSA v1.1 [36] was used to identify cell types from k-means clustering of cells based on the genes expressed by each cell. Default values were used for all parameters.

Training, validation, and test sample sets

Ten of the 18 datasets were randomly split into Training, Validation, and Test sets using the train_test_split() function in the Python scikit-learn v1.2.2 library [37]. The other eight datasets were used as the independent test set.
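The split described above can be sketched as follows. This is a minimal, dependency-free illustration that uses Python's random module as a stand-in for scikit-learn's train_test_split(); the cohort names, sample IDs, the validation fraction, and the seed are hypothetical and are not the study's values.

```python
import random

def split_cohort(sample_ids, test_fraction=0.25, val_fraction=0.25, seed=0):
    """Shuffle one cohort, then carve off Test and Validation subsets.

    val_fraction is applied to what remains after the Test split.
    Both fractions and the seed are illustrative assumptions.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test, rest = ids[:n_test], ids[n_test:]
    n_val = round(len(rest) * val_fraction)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Hypothetical cohorts: the first is split, the second is held out entirely.
split_cohorts = {"cohort_A": [f"A{i}" for i in range(40)]}
independent_cohorts = {"cohort_B": [f"B{i}" for i in range(10)]}

train, val, test = [], [], []
for ids in split_cohorts.values():
    tr, va, te = split_cohort(ids)
    train += tr
    val += va
    test += te
for ids in independent_cohorts.values():
    test += ids  # independent cohorts contribute only to the Test set
```

Splitting within each cohort, rather than pooling all samples first, keeps every split cohort represented in the Training, Validation, and Test sets.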

Machine learning (ML) algorithms

The ML algorithms implemented in the scikit-learn v1.2.2 library were used to build the ML models [37]. See S3 Table in S1 File for hyperparameter values.

Statistical analysis

The binomial proportion confidence interval (CI) with a normal approximation was used to calculate the CI for accuracy, using the proportion_confint() function in the Python statsmodels v0.13.15 library [38]. Standard errors for Logistic Regression (LR) coefficients were estimated using the bootstrap method, with 1000 iterations of sampling with replacement [39]. The estimated standard error was used to calculate the Z-statistic, CI, and p-value for the LR coefficients.
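The two calculations above can be sketched with the standard library alone. The function names below are hypothetical, and the code is a plain re-implementation of the normal approximation rather than the statsmodels call itself. The worked example uses the 22-sample cohort reported in the Test results, where 27.27% accuracy corresponds to 6 correct predictions (an inference from the reported percentage). For wald_test(), std_err would be the bootstrap estimate of the standard error.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def normal_approx_ci(successes, n, z=Z_95):
    """Normal-approximation CI for a binomial proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

def wald_test(coef, std_err, z=Z_95):
    """Z-statistic, two-sided p-value, and CI for a coefficient and its standard error."""
    z_stat = coef / std_err
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal tail area
    return z_stat, p_value, (coef - z * std_err, coef + z * std_err)

low, high = normal_approx_ci(6, 22)  # 6 of 22 correct = 27.27% accuracy
print(f"{100 * low:.2f}-{100 * high:.2f}%")  # prints 8.66-45.88%
```

With very small batches the interval is wide and can reach the [0, 1] boundary, which is why a three-sample batch with two correct predictions yields an upper limit of 100%.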

Results

In this study we first identified a set of 98 potentially pathogenic (non-passenger) CH mutations in TII cells in lung cancer tissue samples, following the approach in Ref [40] and detailed below. The variant allele fraction (VAF) for these mutations in blood samples was then used as the feature set (predictors) for developing a machine learning model for differentiating between lung cancer and non-cancer cases. A logistic regression model based on these mutations in blood samples was able to differentiate between a test set of lung cancer and non-cancer samples with 90.25% accuracy (95% CI: 88.50–92.01%).

Potentially pathogenic CH mutations in TII cells

Our approach for identifying potentially pathogenic CH mutations in TII cells consisted of three stages (Fig 1a). First, we identified protein altering variants in the TCGA dataset for lung adenocarcinoma (LUAD) [41]. These variants consisted of missense, nonsense, indel, and splice site mutations (Fig 1b). Other mutations, such as intergenic, intronic and synonymous mutations are less likely to be pathogenic and were therefore excluded. Of the 2.5 million distinct variants in the 569 TCGA-LUAD samples, 428,015 were protein-altering (Fig 1a).

Fig 1. Identifying potentially pathogenic clonal hematopoietic (CH) mutations in tumor infiltrating immune (TII) cells.

(a) Mutations were limited to protein-altering variants that were clonally expanded in TII cells and were considered to be potentially pathogenic non-passenger mutations. (b) Protein altering mutations excluded intergenic, synonymous, and intronic variants. (c)-(e) Example of criteria used to select the S1506L variant in VWF. (c) CH mutations in TII were identified by variant allele fraction (VAF) in tumor and normal blood samples. (d) One of the three criteria for identifying potentially pathogenic mutations was that the mutation occurred in > 5% of tumor samples, but rarely (< 0.01%) in the genome aggregation database (gnomAD). (e) Another criterion used to identify potentially pathogenic mutations in TII was that the gene was expressed in tumor infiltrating immune cells based on lung cancer single-cell RNA sequencing data.

https://doi.org/10.1371/journal.pone.0307232.g001

From the above set of protein-altering mutations, we selected CH mutations in TII cells based on variant allele fraction (VAF) in matched tumor and normal blood samples. Clonal hematopoiesis is conventionally defined as somatic mutations with a VAF > 2% [20]. This lower limit excludes mutations in circulating tumor cells or cell-free DNA, which have a VAF < 1% [42, 43]. To exclude potential germline mutations, we selected variants with a VAF < 25%. To identify mutations in TII cells, we further selected variants that occurred in both tumor and matched blood samples. It is highly improbable that the same somatic variant would originate in tumor and blood cells simultaneously. It is more likely that these mutations originated in blood cells and subsequently infiltrated the tumor microenvironment, leading to their detection in tumor infiltrating blood cells. For example, the S1506L mutation in the gene for von Willebrand factor (VWF) occurs with a median VAF of 7.23% in TCGA-LUAD tumor samples containing the mutation, and a median VAF of 4.90% in the matched blood samples (Fig 1c). Of the 428,015 protein-altering variants, 36,293 were CH mutations in TII cells (Fig 1a).
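The VAF-based selection can be sketched as a simple filter. This is an illustration only: the class and function names are hypothetical, and applying the 2% to 25% window to both the tumor and the matched blood sample is an assumption, since the text states only that the variant had to occur in both.

```python
from dataclasses import dataclass

CH_MIN_VAF = 0.02        # conventional lower bound for clonal hematopoiesis
GERMLINE_MAX_VAF = 0.25  # upper bound used to exclude likely germline variants

@dataclass
class PairedVariant:
    name: str
    tumor_vaf: float
    blood_vaf: float

def in_ch_window(vaf):
    """True if the VAF lies strictly between the CH and germline limits."""
    return CH_MIN_VAF < vaf < GERMLINE_MAX_VAF

def is_ch_in_tii(variant):
    """Assumption: the VAF window is applied to both tumor and matched blood."""
    return in_ch_window(variant.tumor_vaf) and in_ch_window(variant.blood_vaf)

variants = [
    # Tumor VAF as quoted in the text; blood VAF assumed for illustration.
    PairedVariant("VWF S1506L", 0.0723, 0.0490),
    # Hypothetical variants that should be excluded.
    PairedVariant("hypothetical germline", 0.48, 0.51),
    PairedVariant("hypothetical low-VAF in blood", 0.05, 0.005),
]
selected = [v.name for v in variants if is_ch_in_tii(v)]
```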

From the above set of CH mutations in TII cells, we selected potentially pathogenic mutations based on the following three criteria. First, the mutation occurred frequently in tumor samples (> 5% of TCGA-LUAD samples), but rarely in large population-based studies (< 0.01% of sequences in the genome aggregation database (gnomAD)), suggesting that the genomic location is highly conserved in the germline, but somatic mutations may be correlated with tumor progression. For example, the S1506L variant in VWF occurred in 6.85% of the 569 TCGA-LUAD samples (Fig 1d), whereas it was not detected in any of the gnomAD germline sequences (S2 Table in S1 File). Second, the gene was expressed in TII cells based on the analysis of single-cell RNA-sequencing data (Methods) for 87,743 cells from 21 lung tumor samples [44–46]. Genes not expressed in TII cells may not be functional in this context and were therefore excluded. For example, VWF was expressed in lung tumor infiltrating B, T, NKT, and monocyte/macrophage cells (Fig 1e). Lastly, the mutation was predicted to be damaging by two mutation-significance prediction tools–SIFT [47] and PolyPhen2 [48] (S2 Table in S1 File). Of the 36,293 protein-altering CH mutations in TII cells, 98 were potentially pathogenic mutations based on the above criteria (Fig 1a). The 98 potentially pathogenic CH mutations in TII cells are listed in S2 Table in S1 File, along with values for the criteria used to identify them. Although the selected mutations are unlikely to be passenger mutations, they are also not “driver” mutations, in the sense that they do not directly cause tumor cell growth. However, by potentially affecting immune cell function, they may modulate tumor growth and may therefore represent potential predictors of cancer risk.
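The three criteria above can be expressed as a single predicate. The sketch below is illustrative: the field and function names are hypothetical, and requiring both SIFT and PolyPhen2 to call the variant damaging is an assumption about how the two tools were combined. The VWF S1506L frequencies are those quoted in the text; its boolean flags are inferred from the variant having been selected.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    tumor_sample_fraction: float  # fraction of tumor samples carrying the variant
    gnomad_fraction: float        # fraction of gnomAD sequences carrying the variant
    expressed_in_tii: bool        # gene expressed in tumor infiltrating immune cells
    sift_damaging: bool
    polyphen2_damaging: bool

def is_potentially_pathogenic(c):
    """Apply the frequency, expression, and predicted-damage criteria."""
    frequent_in_tumor = c.tumor_sample_fraction > 0.05   # > 5% of tumor samples
    rare_in_population = c.gnomad_fraction < 0.0001      # < 0.01% of gnomAD
    # Assumption: both prediction tools must agree.
    predicted_damaging = c.sift_damaging and c.polyphen2_damaging
    return (frequent_in_tumor and rare_in_population
            and c.expressed_in_tii and predicted_damaging)

vwf = Candidate("VWF S1506L", 0.0685, 0.0, True, True, True)
# Hypothetical counter-example: frequent in tumors but common in the population.
common = Candidate("hypothetical common variant", 0.20, 0.03, True, True, True)

print(is_potentially_pathogenic(vwf), is_potentially_pathogenic(common))
```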

Machine learning (ML) study design

To develop an ML model with realistic prediction potential using available sequencing data, a key consideration was mitigating potential confounding or batch effects. In general, large-scale cancer DNA sequencing studies do not include non-cancer controls [41], and large-scale non-cancer studies do not include cancer cases [49]. Although we used the same protocol for aligning sequencing reads to the reference genome and for variant calling, these studies may use different protocols for cohort selection and sequencing. Therefore, ML models may classify cases based on differences in detected mutations that are artifacts of these protocol differences rather than characteristics related to the presence of cancer. To mitigate such confounding (batch) effects, we used data from multiple different studies. For example, two of the non-cancer cohorts, prjna532465 and prjna790003, had average ages of 82 and 77 years, respectively, significantly higher than the 63 years for the lung cancer samples (Table 1). However, by including the non-cancer cohorts prjna421434 and prjna342304, with average ages of 42 and 39 years, respectively, the overall average age for the non-cancer samples was 60 years, comparable to the average age for lung cancer samples, to the extent the data is available. We also evaluated alternative models using the average batch accuracy to avoid bias toward protocols used by the larger studies. As noted in the Discussion section below, these strategies can mitigate the confounding effects but cannot fully eliminate them.

Table 1. Datasets used for model training, validation, and testing.

https://doi.org/10.1371/journal.pone.0307232.t001

A set of whole exome sequencing data for blood samples from 18 different cohorts was identified for this study (Fig 2, Table 1). These included a total of 1,992 lung cancer and non-cancer blood samples. On average, for the cohorts for which demographic data was available, the age and gender compositions of the lung cancer and non-cancer cohorts were comparable. The lung cancer cases had an average age of 63 years and were 56% male, while the non-cancer cases had an average age of 60 years and were 51% male (Table 1). Although the racial compositions were significantly different, with 71% white in the lung cancer cohorts and 42% white in the non-cancer cohorts, racial information was not available for nine of the 18 cohorts and two of the non-cancer cohorts targeted Asian populations. Although the percentage of current or past smokers in lung cancer cases (63%) was comparable to non-cancer cases (66%), smoking status was only available for two of the nine non-cancer cohorts (Table 1). The proportion of samples by stage also varied by cohort, with batch averages of 24.18% (0–54.35%) for stage I, 21.91% (0–40.74%) for stage II, 21.32% (0–64.71%) for stage III, 20.60% (0–100%) for stage IV, and 11.93% (0–46.15%) for pre-cancerous cases (S1 Table in S1 File). Five of the nine lung cancer cohorts were all treatment naïve and two of the cohorts were cases with previous treatments (S1 Table in S1 File). Methods for accounting for batch-effects are an active area of research [50]; however, it is unclear how effective these methods would be for somatic mutation data, or if they would result in the loss of critical information.

Fig 2. Machine learning study design.

Sequencing data from multiple different cohorts were used to mitigate batch-effect artifacts due to differences in cohort selection and sequencing protocols. Training and Validation set: 75% of the sequencing data from 10 of 18 different cohorts were used to train the machine learning models and identify a model that had the highest average batch accuracy. Test set: All data for 8 of the 18 cohorts, in addition to 25% of the data from the other 10 cohorts.

https://doi.org/10.1371/journal.pone.0307232.g002

All data from eight of the 18 cohorts were set aside as an Independent Test set, representing a total of 830 samples (Fig 2). Since there is no single large cohort containing both lung cancer and non-cancer sequencing data, the independent test set represents the best possible test of the predictive potential of the model developed below, short of a blinded clinical case-control study where we can proactively control all possible confounding variables. In addition, 25% of the data (293 samples) from each of the other 10 cohorts were also set aside for testing (Fig 2, Table 1). None of the samples in the Test set were used to develop the model. The remaining 75% of the data (869 samples) were further subdivided into a Training set (648 samples) and a Validation set (221 samples). Studies have shown that this sample size can be sufficient for >90% concordance in prediction accuracy, indicating low risk of overfitting [68–70]. The Training and Validation data sets were used to evaluate six commonly used ML algorithms: Light Gradient Boosted Trees, Random Forest, Support Vector Machine, Neural Network, k Nearest Neighbors, and Logistic Regression classifiers. Combinations of model hyperparameters were evaluated for each of these algorithms using the Training and Validation sets. In addition to model hyperparameters, we also evaluated the effect of dichotomizing the VAF based on cutoff values ranging from 0.00 to 0.30. Each of the resulting 135,811 models was evaluated using the Validation set (Fig 2). Average unweighted batch accuracy was then used to identify the best model for each of the six ML algorithms. Average batch accuracies for all six models were comparable, ranging from 91.93% (66.67–100.00%) for Logistic Regression to 96.66% (83.33–100.00%) for Light Gradient Boosted Trees (S3 Table in S1 File, Fig 3a).
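The difference between average unweighted batch accuracy and pooled accuracy, and the VAF dichotomization step, can be sketched as follows. The function names and the toy batches are hypothetical; the sketch only illustrates why a batch average gives small cohorts the same influence as large ones.

```python
def dichotomize(vaf_values, cutoff):
    """Convert VAF values to 1 (mutation present) or 0 using a cutoff."""
    return [1 if v > cutoff else 0 for v in vaf_values]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def average_batch_accuracy(batches):
    """Unweighted mean of per-batch accuracies: every cohort counts equally."""
    per_batch = [accuracy(y_true, y_pred) for y_true, y_pred in batches.values()]
    return sum(per_batch) / len(per_batch)

def pooled_accuracy(batches):
    """Accuracy over all samples combined: large cohorts dominate."""
    correct = sum(t == p for y_true, y_pred in batches.values()
                  for t, p in zip(y_true, y_pred))
    total = sum(len(y_true) for y_true, _ in batches.values())
    return correct / total

# Toy batches: (true labels, predicted labels). Values are illustrative.
batches = {
    "small_cohort": ([1, 1, 0], [1, 0, 0]),      # 2 of 3 correct
    "large_cohort": ([1] * 10 + [0] * 10,) * 2,  # 20 of 20 correct
}
print(average_batch_accuracy(batches), pooled_accuracy(batches))
print(dichotomize([0.0, 0.005, 0.02, 0.3], cutoff=0.01))
```

In this toy case the single error in the three-sample batch lowers the batch average far more than the pooled accuracy, so model selection on the batch average penalizes models that fit only the largest cohorts.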

Fig 3. Selecting an optimal machine learning classifier.

(a) Batch accuracy for the different machine learning algorithms, with model hyperparameters that produced the highest average batch accuracy for the validation set. Error bars show the range of values, orange lines the median and green dashed lines the mean. (b) A Logistic Regression (LR) model was selected for its interpretability of feature coefficients. Error bars show 95% confidence intervals. (c) Receiver Operating Characteristic (ROC) curve for the LR model showing 0.9472 Area Under the Curve (AUC). The red line represents a random prediction. (d) 91.66% of lung cancer and 83.20% non-cancer samples had predicted probabilities of >0.9 and <0.1 for cancer, respectively. Predicted probability of >0.5 (red line) predicts cancer and <0.5 no cancer.

https://doi.org/10.1371/journal.pone.0307232.g003

Logistic regression (LR) model

Since average batch accuracies for all six ML models were comparable (91.93–96.66%), we selected the LR model for the direct real-world interpretability of its parameters, specifically the log odds-ratio (LOR) of model predictions and feature weights. The LR model can be formulated as

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \qquad (1)$$

where p is the predicted probability of cancer, p/(1 − p) is the odds-ratio for cancer prediction, β₀ is the intercept, β_i are the weights for each of the features (predictors) x_i, and n is the number of features. x_i = 1 or 0, corresponding to the presence or absence of the mutation with VAF > 0.01. All further testing was performed using the LR model. The weights β_i for each of the predictors (mutations) can be interpreted as the LOR for the probability of cancer when mutation = 1 versus 0, controlling for all other predictors.
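As a worked illustration, assume the standard logistic form ln(p / (1 − p)) = β₀ + Σ β_i x_i with binary features x_i. The sketch below converts a sample's mutation profile into a predicted probability. The two weights are the values reported for these mutations in the Test results section; the intercept of 0 and the function name are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def predict_probability(intercept, weights, present):
    """Return (log-odds, probability) for a sample.

    weights: dict mapping mutation name to its coefficient beta_i.
    present: set of mutation names with x_i = 1 (all others are 0).
    """
    log_odds = intercept + sum(b for m, b in weights.items() if m in present)
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return log_odds, probability

weights = {
    "SEC61A2 S177A": -10.45,  # most negative coefficient reported in the text
    "PRAMEF20 T165P": 5.95,   # largest positive coefficient reported in the text
}
intercept = 0.0  # hypothetical; the fitted intercept is not given in the text

for present in [set(), {"PRAMEF20 T165P"}, {"SEC61A2 S177A", "PRAMEF20 T165P"}]:
    log_odds, p = predict_probability(intercept, weights, present)
    print(sorted(present), round(log_odds, 2), round(p, 4))
```

Each mutation that is present shifts the log-odds by its β_i, so a single strongly negative weight can outweigh a positive one when both mutations occur in the same sample.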

For the selected LR model, VAF was dichotomized as mutation = 1 for VAF > 0.01 and 0 otherwise, with class_weight = {0:1, 1:4}, solver = ‘saga’, and default values for all other hyperparameters (S3 Table in S1 File). The accuracy of the LR predictions for each of the ten datasets in the Validation set ranged from 66.67% (95% CI: 13.32–100%) to 100% (Fig 3b). The overall accuracy for the combined Validation set was 94.57% (95% CI: 91.58–97.56%). For the Validation set, the receiver operating characteristic (ROC) curve had an area under the curve (AUC) of 0.9472, suggesting strong diagnostic ability (Fig 3c). With a cut-point of 0.5, sensitivity and specificity were 97.50% (95% CI: 94.71–100.00%) and 89.62% (95% CI: 83.82–95.43%), respectively. More importantly, 91.66% of the cancer cases were unambiguously predicted with probability of cancer > 0.90 and 83.20% of the non-cancer cases were predicted with probability of cancer < 0.10 (Fig 3d).
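The cut-point metrics above can be computed from predicted probabilities as in the following sketch. The labels and probabilities are toy values and the function names are hypothetical; only the 0.5 cut-point and the 0.90 and 0.10 thresholds come from the text.

```python
def sensitivity_specificity(y_true, probs, cut_point=0.5):
    """Sensitivity and specificity when probability > cut_point predicts cancer."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p > cut_point)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= cut_point)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p <= cut_point)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p > cut_point)
    return tp / (tp + fn), tn / (tn + fp)

def unambiguous_fractions(y_true, probs, high=0.90, low=0.10):
    """Fraction of cancer cases above `high` and of non-cancer cases below `low`."""
    cancer = [p for y, p in zip(y_true, probs) if y == 1]
    non_cancer = [p for y, p in zip(y_true, probs) if y == 0]
    return (sum(p > high for p in cancer) / len(cancer),
            sum(p < low for p in non_cancer) / len(non_cancer))

# Toy data: 1 = lung cancer, 0 = non-cancer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.97, 0.95, 0.60, 0.30, 0.02, 0.05, 0.40, 0.80]

print(sensitivity_specificity(y_true, probs))  # (0.75, 0.75)
print(unambiguous_fractions(y_true, probs))    # (0.5, 0.5)
```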

The LR model was refit to the combined Training and Validation set, to incorporate all available non-test data. The resulting model was then used for testing. For 87 of the 98 mutations the LOR β_i ≠ 0 (Fig 4b), with the other 11 mutations (β_i = 0) not contributing to the LR model prediction. Of these 87 mutations, 47 had β_i > 0, suggesting that these mutations may increase the risk of cancer, while 40 had β_i < 0, suggesting a reduced risk, controlling for all other mutations. Nine of the 40 mutations with β_i < 0 occurred more frequently in non-cancer samples than in lung cancer samples, and 23 of the 47 mutations with β_i > 0 occurred more frequently in lung cancer samples (Fig 4a, S4 Table in S1 File). The LOR for 26 of the 98 mutations had p-value < 0.05 (Fig 4b, S4 Table in S1 File). Although the other 72 mutations individually have poor predictive power (p-value > 0.05), their combined effect can be highly predictive.

Fig 4. Mutations with p-value < 0.05 for log odds ratio β_i.

(a) Fraction of lung cancer and non-cancer samples with variant allele fraction (VAF) > 0.01. (b) Log odds ratio and 95% confidence interval.

https://doi.org/10.1371/journal.pone.0307232.g004

Test results

The Test set consisted of 25% of the data set aside from each of the ten studies used for Training and Validation (127 lung cancer and 166 non-cancer blood samples), and all the data from eight additional studies (451 lung cancer and 379 non-cancer blood samples) (Fig 2, Table 1). The average batch accuracy across all 18 datasets for the LR model developed above was 85.07% (27.27–100.00%) (Fig 5a). One of the 18 datasets, prjeb47088, had a very low accuracy, 27.27% (95% CI: 8.66–45.88%), due to a low read depth of 36 reads at the exome loci containing the 98 mutations, compared to the average read depth of 197 across all cohorts (Table 1). With such a low read depth, mutations with low VAF (<5%) cannot be reliably detected [71], which can affect predictive accuracy. This finding highlights the potential confounding (batch) effect associated with sequencing depth. However, because this dataset contained relatively few samples (22), accounting for only 1.96% of the 1,123 test samples (Table 1), it did not significantly affect the overall accuracy of the model. The overall accuracy for the combined Test set was 90.25% (95% CI: 88.50–92.01%) (Fig 5a), comparable to the 94.57% accuracy for the combined Validation set (Fig 3b). In addition, the overall accuracy for the eight independent test cohorts alone, excluding the ten datasets where 25% of the data was set aside for testing, was 89.88% (95% CI: 87.83–91.93%). This was comparable to the 91.42% accuracy (95% CI: 88.06–94.77%) for the test data from the other ten cohorts, suggesting that there was little if any confounding (batch) effect. The ROC curve had an AUC of 0.9036, suggesting strong diagnostic ability (Fig 5b). With a cut-point of 0.5, the sensitivity and specificity were 94.12% (95% CI: 92.20–96.04%) and 85.96% (95% CI: 82.98–88.95%), respectively. More importantly, 90.48% of the cancer cases were unambiguously predicted with probability of cancer > 0.90 and 74.86% of the non-cancer cases with probability of cancer < 0.10 (Fig 5c).
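For readers unfamiliar with the AUC reported above, it can be read as the probability that a randomly chosen cancer sample receives a higher predicted probability than a randomly chosen non-cancer sample. The sketch below computes it directly from that definition on toy data; the function name and values are hypothetical, and this pairwise form is a simple equivalent of integrating the ROC curve, not the study's implementation.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (cancer, non-cancer) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data: 1 = lung cancer, 0 = non-cancer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.97, 0.95, 0.60, 0.30, 0.02, 0.05, 0.40, 0.80]

print(roc_auc(y_true, scores))  # 13 of 16 pairs ranked correctly: 0.8125
```

An AUC of 0.5 corresponds to the random-prediction line in the ROC plots, and 1.0 to perfect separation of the two classes.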

Fig 5. Predictive accuracy for test datasets.

(a) Batch and Combined accuracy for the LR model. Error bars show 95% CI. (b) ROC curve for the LR model showing 0.9036 AUC. The red line represents a random prediction. (c) 90.48% of lung cancer and 74.86% of non-cancer samples had predicted probabilities of >0.9 and <0.1 for cancer, respectively. Predicted probability of >0.5 (red line) predicts cancer and <0.5 no cancer.

https://doi.org/10.1371/journal.pone.0307232.g005

The contribution of each of the 98 mutations to the LOR of predicted probability of cancer for each sample is either 0 or βi (Eq 1). This contribution varies considerably from sample to sample (S1 Fig in S1 File). For some mutations the contribution is clear. The S177A mutation in SEC61A2 with the most negative βi = -10.45 occurred in 55.96% of non-cancer samples but in only 5.61% of lung cancer samples (S1 Fig in S1 File). On the other hand, the T165P mutation in PRAMEF20 with the largest βi = +5.95 is predictive of cancer, controlling for all other mutations. However, counter-intuitively, the mutation occurred in 57.43% of non-cancer samples and in only 28.05% of lung cancer samples. These results highlight the complex combination of weights and mutations in the LR model contributing to the predicted probability of cancer.

Discussion

Differences in cohort selection and sequencing protocols between lung cancer and non-cancer samples can result in different variant profiles between the two sample sets. As a result, the Training and Validation stages may show artificially high accuracy but fail to accurately predict the probability on other independent datasets where the protocols may differ. To mitigate this confounding or batch effect we used datasets from ten different studies for training and validation, and eighteen different studies for testing. In addition, we used the average batch accuracy as the training and validation metric to mitigate the bias toward protocols for batches with a larger number of samples. The comparable overall accuracy for the eight independent test cohorts alone (89.88%) and for the test data from the other ten cohorts, where 25% of the data was set aside for testing (91.42%), suggests that there was little if any confounding (batch) effect. Despite high test accuracy (Fig 5), these mitigation steps may not fully eliminate the batch-effect and the LR model developed here may not represent an optimal blood-based lung cancer screening panel. However, the LR model can serve as an effective starting point for developing an accurate lung cancer screening panel. To do so, a case-control study using a standardized protocol could be used to validate and optimize the LR model. The case-control study should also be more racially balanced than the studies used here, so that the screening panel is not racially biased.

The 98 potentially pathogenic mutations were selected because the associated gene sequences were highly conserved, the mutations were predicted to be damaging to gene function, and they could affect immune cell activity. Therefore, we expected most of the mutations to inhibit an effective anti-tumor immune response, thus promoting tumor growth. However, many (40) of the 98 mutations had LOR < 0, suggesting that these mutations may be predictive of non-cancer cases. Chronic inflammation, triggered by tobacco smoke and other carcinogens, has been shown to promote tumorigenesis [72, 73]. We speculate that these mutations may inhibit such chronic inflammatory responses, possibly inhibiting tumorigenesis.

It is important to note that the presence of some CH mutations in peripheral blood may be due to the presence of cancer. Cancer is known to alter the peripheral blood immune cell composition, by triggering the expansion of specific immune cell subtypes [74]. These immune cell subtypes may harbor different mutations, which would then be more abundant in the altered peripheral blood due to clonal hematopoiesis (CH). The CH mutations identified by our algorithm may indeed represent the result of CH expansion of specific mutations in specific immune cell subtypes, rather than the general presence of these mutations in hematopoietic stem and progenitor cells. Further investigation will be required to clarify the relationship.

In addition to providing a starting point for developing a cancer screening panel, this study identified a set of potential treatment targets. The mutations selected as predictors for the LR model were based on their potentially pathogenic role in TII cells. The selection criteria are detailed in the Results section under “Potentially pathogenic CH mutations in TII cells”. The values for the selection criteria for each mutation are listed in S2 Table in S1 File. Although the selection criteria suggest that these mutations may be pathogenic, we do not claim that they necessarily are. While these mutations may not directly cause tumor growth, they may modulate it. In particular, the set of mutations with a p-value < 0.05 for the LOR (Fig 4) should be considered for further investigation. For example, the P453Q variant in MRC1 had a LOR of -7.47 (95% CI: -10.44 to -2.52), suggesting an inhibitory effect against lung cancer. MRC1 codes for a membrane receptor protein that mediates macrophage endocytosis of glycoproteins [75]. MRC1 expression has been associated with inflammatory macrophage phenotype [76]. We speculate that the mutation inhibits the expression of MRC1, possibly reducing chronic inflammation and inhibiting tumorigenesis. In contrast, the T170P variant in RXRA had a LOR of +4.60 (95% CI: 2.69–6.52), suggesting a facilitative effect. RXRA encodes a nuclear receptor that acts as a transcription factor promoting target gene expression. RXRA is known to inhibit non-small cell lung cancer cell growth [77] and is a therapeutic target for lung cancer [78]. The 26 mutations with significant LOR (p-value < 0.05, Fig 4, S4 Table in S1 File) may represent potential immunotherapy targets for lung cancer.

This study included only lung cancer and non-cancer samples; samples from other cancer types were not included. The ability of the LR model to distinguish lung cancer from other cancers was therefore not tested, and a lung cancer screening panel based on the LR model may detect other cancer types as well. In fact, 36 of the 98 potentially pathogenic CH mutations in TII cells for lung cancer overlap with those for breast cancer identified in a previous study [40]. These mutations could result in a positive prediction in the presence of breast cancer. The approach presented here should be extended to develop a pan-cancer screening panel that differentiates between cancer types.

Conclusions

Early detection can significantly reduce the mortality rate due to lung cancer, yet the uptake of the currently approved low-dose computed tomography scan for lung cancer is limited. Here we present an approach for developing a blood-based screening panel that utilizes a set of potentially pathogenic clonal hematopoietic (CH) mutations detected in tumor infiltrating immune (TII) cells. While the effect of CH mutations in TII cells on solid tumor progression remains relatively unexplored, recent animal model studies suggest a potential role. We developed a logistic regression (LR) machine learning model for predicting the probability of lung cancer based on a set of 98 CH mutations in blood samples. The LR model achieved an accuracy of 90.25% (95% CI: 88.50–92.01%) on a separate Test set. To mitigate batch effects arising from differences in cohort selection and sequencing protocols, we used sequencing data from 18 different studies, though this may not fully eliminate batch effects. A case-control study with standardized protocols could be used to validate and refine the LR model and develop an accurate blood-based lung cancer screening panel.
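As an illustration of how a confidence interval such as the one reported for accuracy can be obtained, the following Python sketch computes a normal-approximation (Wald) interval for a proportion. The interval method is an assumption made here, since the text above does not state which method was used, and the counts in the example are hypothetical and not taken from the study. The function name `wald_ci` is also hypothetical.

```python
import math

def wald_ci(successes, n, z_crit=1.959964):
    """Return (proportion, lower, upper) for a Wald 95% interval."""
    p = successes / n
    half_width = z_crit * math.sqrt(p * (1.0 - p) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical counts for illustration only
acc, lower, upper = wald_ci(successes=900, n=1000)
print(f"accuracy = {acc:.2%} (95% CI: {lower:.2%} to {upper:.2%})")
```

The same function applies to sensitivity and specificity when `successes` and `n` are restricted to the cancer or non-cancer samples, respectively.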

Supporting information

S1 File. The following tables and figures are included in the supporting information file.

S1 Table. Cohort characteristics.
S2 Table. Potentially pathogenic CH mutations in TII.
S3 Table. Best ML model hyperparameters and batch accuracy.
S4 Table. LR model coefficients.
S1 Fig. Average contribution of individual mutations to the prediction of cancer probability for cancer and control samples.

https://doi.org/10.1371/journal.pone.0307232.s001

(ZIP)

Acknowledgments

We thank Drs. Robin T. Varghese and Harold Garner, VCOM, for their comments and suggestions.

References

1. Siegel RL, Miller KD, Wagle NS, Jemal A (2023) Cancer statistics, 2023. CA Cancer J Clin 73:17–48 pmid:36633525
2. de Koning HJ, van der Aalst CM, de Jong PA, et al (2020) Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med 382:503–513 pmid:31995683
3. Mazzone PJ, Silvestri GA, Patel S, Kanne JP, Kinsinger LS, Wiener RS, et al (2018) Screening for Lung Cancer: CHEST Guideline and Expert Panel Report. Chest 153:954–985 pmid:29374513
4. Zgodic A, Zahnd WE, Advani S, Eberth JM (2022) Low-dose CT lung cancer screening uptake: A rural–urban comparison. J Rural Health 38:40–53 pmid:33734492
5. Seijo LM, Peled N, Ajona D, et al (2019) Biomarkers in Lung Cancer Screening: Achievements, Promises, and Challenges. Journal of Thoracic Oncology 14:343–357 pmid:30529598
6. Lin AA, Nimgaonkar V, Issadore D, Carpenter EL (2022) Extracellular Vesicle–Based Multianalyte Liquid Biopsy as a Diagnostic for Cancer. Annu Rev Biomed Data Sci 5:269–292 pmid:35562850
7. Boyle P, Chapman CJ, Holdenrieder S, et al (2011) Clinical validation of an autoantibody test for lung cancer. Ann Oncol 22:383–389 pmid:20675559
8. Montani F, Marzi MJ, Dezi F, et al (2015) miR-Test: A Blood Test for Lung Cancer Early Detection. J Natl Cancer I 107:63 pmid:25794889
9. Mathios D, Johansen JS, Cristiano S, et al (2021) Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat Commun 12:1–14
10. Klein EA, Richards D, Cohn A, et al (2021) Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Annals of Oncology 32:1167–1177 pmid:34176681
11. Finn OJ (2012) Immuno-oncology: understanding the function and dysfunction of the immune system in cancer. Annals of Oncology 23:viii6–viii9 pmid:22918931
12. Allen BM, Hiam KJ, Burnett CE, Venida A, DeBarge R, Tenvooren I, et al (2020) Systemic dysfunction and plasticity of the immune macroenvironment in cancer models. Nat Med 26:1125–1134 pmid:32451499
13. Asada S, Kitamura T (2021) Clonal hematopoiesis and associated diseases: A review of recent findings. Cancer Sci 112:3962–3971 pmid:34328684
14. Nguyen YTM, Fujisawa M, Nguyen TB, et al (2021) Tet2 deficiency in immune cells exacerbates tumor progression by increasing angiogenesis in a lung cancer model. Cancer Sci 112:4931–4943 pmid:34657351
15. Liu X, Sato N, Shimosato Y, et al (2022) CHIP‐associated mutant ASXL1 in blood cells promotes solid tumor progression. Cancer Sci 113:1182 pmid:35133065
16. Lee M, Li J, Li J, et al (2021) Tet2 inactivation enhances the antitumor activity of tumor-infiltrating lymphocytes. Cancer Res 81:1965–1976 pmid:33589517
17. Kleppe M, Comen E, Wen HY, et al (2015) Somatic mutations in leukocytes infiltrating primary breast cancers. NPJ Breast Cancer 1:1–6 pmid:28721364
18. Jiang A, Qin Y, Springer TA (2022) Loss of LRRC33-dependent TGFβ1 activation enhances anti-tumor immunity and checkpoint blockade therapy. Cancer Immunol Res 10:453–467
19. Han S, Liu ZQ, Chung DC, St Paul M, Garcia-Batres CR, Sayad A, et al (2022) Overproduction of IFNγ by Cbl-b-Deficient CD8+ T Cells Provides Resistance against Regulatory T Cells and Induces Potent Antitumor Immunity. Cancer Immunol Res 10:437–452
20. Zink F, Stacey SN, Norddahl GL, et al (2017) Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130:742–752 pmid:28483762
21. Fidler TP, Xue C, Yalcinkaya M, et al (2021) The AIM2 inflammasome exacerbates atherosclerosis in clonal haematopoiesis. Nature 592:296–301 pmid:33731931
22. Bowman RL, Busque L, Levine RL (2018) Clonal Hematopoiesis and Evolution to Hematopoietic Malignancies. Cell Stem Cell 22:157–170 pmid:29395053
23. Lin AE, Rauch PJ, Jaiswal S, Ebert BL (2022) Clonal Hematopoiesis: Confluence of Malignant and Nonmalignant Diseases. Annual Reviews of Cancer Biology 6:187–200
24. Jan M, Ebert BL, Jaiswal S (2017) Clonal hematopoiesis. Semin Hematol 54:43–50 pmid:28088988
25. Nam AS, Dusaj N, Izzo F, et al (2022) Single-cell multi-omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations perturb early progenitor states through selective hypomethylation. Nat Genet 54:1514–1526 pmid:36138229
26. Challen GA, Goodell MA (2020) Clonal hematopoiesis: mechanisms driving dominance of stem cell clones. Blood 136:1590–1598 pmid:32746453
27. Sanmiguel JM, Eudy E, Loberg MA, Young KA, Mistry JJ, Mujica KD, et al (2022) Distinct Tumor Necrosis Factor Alpha Receptors Dictate Stem Cell Fitness versus Lineage Output in Dnmt3a-Mutant Clonal Hematopoiesis. Cancer Discov 12:2763–2773 pmid:36169447
28. Steensma DP, Bejar R, Jaiswal S, Lindsley RC, Sekeres MA, Hasserjian RP, et al (2015) Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 126:9–16 pmid:25931582
29. Steensma DP (2018) Clinical Implications of Clonal Hematopoiesis. Mayo Clin Proc 93:1122–1130 pmid:30078412
30. Leinonen R, Sugawara H, Shumway M, on behalf of the International Nucleotide Sequence Database Collaboration (2011) The Sequence Read Archive. Nucleic Acids Res 39:D19–D21
31. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754–1760 pmid:19451168
32. Van der Auwera GA, Carneiro MO, Hartl C, et al (2013) From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinformatics 43:11.10.1–11.10.33 pmid:25431634
33. Franke KR, Crowgey EL (2020) Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms. Genomics Inform 18:e10 pmid:32224843
34. Carpi G, Gorenstein L, Harkins TT, Samadi M, Vats P (2022) A GPU-accelerated compute framework for pathogen genomic variant identification to aid genomic epidemiology of infectious disease: a malaria case study. Brief Bioinform 23:1–11 pmid:35945154
35. 10x Chromium (2019) Chromium Single Cell V(D)J Reagent Kits with Feature Barcoding technology for Cell Surface Protein. https://www.10xgenomics.com/support/single-cell-immune-profiling. Accessed 17 Mar 2023
36. Cao Y, Wang X, Peng G (2020) SCSA: A cell type annotation tool for single-cell RNA-seq data. Front Genet 11:490 pmid:32477414
37. Pedregosa F, Michel V, Grisel O, et al (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825–2830
38. Seabold S, Perktold J (2010) Statsmodels: Econometric and Statistical Modeling with Python. In: Proceedings of the 9th Python in Science Conference. pp 92–96
39. One-Off Coder (2023) Data Science Topics. https://datascience.oneoffcoder.com. Accessed 17 Mar 2023
40. Anandakrishnan R, Zyvoloski IJ, Zyvoloski LR, Opoku NK, Dai A, Antony V (2023) Potential immunosuppressive clonal hematopoietic mutations in tumor infiltrating immune cells in breast invasive carcinoma. Sci Rep 13:13131 pmid:37573441
41. Weinstein JN, Collisson EA, Mills GB, et al (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45:1113–1120 pmid:24071849
42. Elazezy M, Joosse SA (2018) Techniques of using circulating tumor DNA as a liquid biopsy component in cancer management. Comput Struct Biotechnol J 16:370–378 pmid:30364656
43. Petrackova A, Vasinek M, Sedlarikova L, Dyskova T, Schneiderova P, Novosad T, et al (2019) Standardization of Sequencing Coverage Depth in NGS: Recommendation for Detection of Clonal and Subclonal Mutations in Cancer Diagnostics. Front Oncol 9:851 pmid:31552176
44. Lambrechts D, Wauters E, Boeckx B, et al (2018) Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med 24:1277–1289 pmid:29988129
45. Laughney AM, Hu J, Campbell NR, et al (2020) Regenerative lineages and immune-mediated pruning in lung cancer metastasis. Nat Med 26:259–269 pmid:32042191
46. Sinjab A, Han G, Treekitkarnmongkol W, et al (2021) Resolving the spatial and cellular architecture of lung adenocarcinoma by multiregion single-cell sequencing. Cancer Discov 11:2506–2523 pmid:33972311
47. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452–W457 pmid:22689647
48. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249 pmid:20354512
49. Altshuler DM, Durbin RM, Abecasis GR, et al (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65 pmid:23128226
50. Whalen S, Schreiber J, Noble WS, Pollard KS (2021) Navigating the pitfalls of applying machine learning in genomics. Nat Rev Genet 23:169–181 pmid:34837041
51. Joshi A, Butle A, Hait S, Mishra R, Trivedi V, Thorat R, et al (2022) Osimertinib for lung cancer cells harboring low-frequency EGFR T790M mutation. Transl Oncol 22:101461 pmid:35653897
52. Formenti SC, Rudqvist NP, Golden E, et al (2018) Radiotherapy induces responses of lung cancer to CTLA-4 blockade. Nat Med 24:1845–1851 pmid:30397353
53. Jia Q, Chu Q, Zhang A, et al (2021) Mutational burden and chromosomal aneuploidy synergistically predict survival from radiotherapy in non-small cell lung cancer. Commun Biol 4:1–7
54. Mao W, Chen R, Lu R, Wang S, Song H, You D, et al (2021) Germline mutation analyses of malignant ground glass opacity nodules in non-smoking lung adenocarcinoma patients. PeerJ 9:e12048 pmid:34540367
55. Collisson EA, Campbell JD, Brooks AN, et al (2014) Comprehensive molecular profiling of lung adenocarcinoma. Nature 511:543–550 pmid:25079552
56. Gillette MA, Satpathy S, Cao S, et al (2020) Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma. Cell 182:200–225.e35 pmid:32649874
57. Satpathy S, Krug K, Jean Beltran PM, et al (2021) A proteogenomic portrait of lung squamous cell carcinoma. Cell 184:4348–4371.e40 pmid:34358469
58. Schenk MW, Humphrey S, Hossain ASMM, et al (2021) Soluble guanylate cyclase signalling mediates etoposide resistance in progressing small cell lung cancer. Nat Commun 12:1–15
59. Hammerman PS, Voet D, Lawrence MS, et al (2012) Comprehensive genomic characterization of squamous cell lung cancers. Nature 489:519–525 pmid:22960745
60. Ribeiro-dos-Santos AM, Vidal AF, Vinasco-Sandoval T, Guerreiro J, Santos S, Ribeiro-dos-Santos Â, et al (2020) Exome Sequencing of Native Populations From the Amazon Reveals Patterns on the Peopling of South America. Front Genet 11:1359 pmid:33193622
61. Fischbach GD, Lord C (2010) The simons simplex collection: A resource for identification of autism genetic risk factors. Neuron 68:192–195 pmid:20955926
62. Auton A, Abecasis GR, Altshuler DM, et al (2015) A global reference for human genetic variation. Nature 526:68–74 pmid:26432245
63. Lyons JJ, Yu X, Hughes JD, et al (2016) Elevated basal serum tryptase identifies a multisystem disorder associated with increased TPSAB1 copy number. Nat Genet 48:1564–1569 pmid:27749843
64. Xu Q, Wu C, Zhu Q, et al (2022) Metagenomic and metabolomic remodeling in nonagenarians and centenarians and its association with genetic and socioeconomic factors. Nat Aging 2:438–452 pmid:37118062
65. Chambers JC, Abbott J, Zhang W, et al (2014) The South Asian Genome. PLoS One 9:e102645 pmid:25115870
66. Rodriguez-Flores J, O’Beirne S, Salit J, Kaner R, Downey R, Mezey J, et al (2018) Identification of Large Clones of Potentially Deleterious Somatic Mutations in the Small Airway Epithelium of Smokers Without Cancer. Am J Respir Crit Care Med 197:A1952
67. Park JS, Lee J, Jung ES, et al (2019) Brain somatic mutations observed in Alzheimer’s disease associated with aging and dysregulation of tau phosphorylation. Nat Commun 10:1–12
68. Kim SY (2009) Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics 10:1–10
69. Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DSA, Nobel AB, et al (2006) Concordance among Gene-Expression–Based Predictors for Breast Cancer. New England Journal of Medicine 355:560–569 pmid:16899776
70. Dobbin KK, Zhao Y, Simon RM (2008) How Large a Training Set is Needed to Develop a Classifier for Microarray Data? Clinical Cancer Research 14:108–114 pmid:18172259
71. Kim J, Kim D, Lim JS, Maeng JH, Son H, Kang HC, et al (2019) The use of technical replication for detection of low-level somatic mutations in next-generation sequencing. Nat Commun 10:1–11
72. Takahashi H, Ogata H, Nishigaki R, Broide DH, Karin M (2010) Tobacco Smoke Promotes Lung Tumorigenesis by Triggering IKKβ- and JNK1-Dependent Inflammation. Cancer Cell 17:89–97
73. Swann JB, Vesely MD, Silva A, Sharkey J, Akira S, Schreiber RD, et al (2008) Demonstration of inflammation-induced cancer and cancer immunoediting during primary tumorigenesis. Proc Natl Acad Sci U S A 105:652–656 pmid:18178624
74. Wu Y, Ye S, Goswami S, Pei X, Xiang L, Zhang X, et al (2020) Clinical significance of peripheral blood and tumor tissue lymphocyte subsets in cervical cancer patients. BMC Cancer 20:1–12 pmid:32131750
75. Butler M, Morel AS, Jordan WJ, Eren E, Hue S, Shrimpton RE, et al (2007) Altered expression and endocytic function of CD205 in human dendritic cells, and detection of a CD205–DCL-1 fusion protein upon dendritic cell maturation. Immunology 120:362–371 pmid:17163964
76. Stengel S, Quickert S, Lutz P, et al (2020) Peritoneal Level of CD206 Associates With Mortality and an Inflammatory Macrophage Phenotype in Patients With Decompensated Cirrhosis and Spontaneous Bacterial Peritonitis. Gastroenterology 158:1745–1761 pmid:31982413
77. Brabender J, Danenberg K, Metzger R, et al (2002) The role of retinoid X receptor messenger RNA expression in curatively resected non-small cell lung cancer. Clin Cancer Res 8:438–443 pmid:11839661
78. Rigas JR, Dragnev KH (2005) Emerging Role of Rexinoids in Non-Small Cell Lung Cancer: Focus on Bexarotene. Oncologist 10:22–33 pmid:15632250

[ref1] 1. Siegel RL, Miller KD, Wagle NS, Jemal A (2023) Cancer statistics, 2023. CA Cancer J Clin 73:17–48 pmid:36633525
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. de Koning HJ, van der Aalst CM, de Jong PA, et al (2020) Reduced lung-cancer mortality with volume CT screening in a randomized trial. N Engl J Med 382:503–513 pmid:31995683
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. Mazzone PJ, Silvestri GA, Patel S, Kanne JP, Kinsinger LS, Wiener RS, et al (2018) Screening for Lung Cancer: CHEST Guideline and Expert Panel Report. Chest 153:954–985 pmid:29374513
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Zgodic A, Zahnd WE, Advani S, Eberth JM (2022) Low-dose CT lung cancer screening uptake: A rural–urban comparison. J Rural Health 38:40–53 pmid:33734492
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Seijo LM, Peled N, Ajona D, et al (2019) Biomarkers in Lung Cancer Screening: Achievements, Promises, and Challenges. Journal of Thoracic Oncology 14:343–357 pmid:30529598
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Lin AA, Nimgaonkar V, Issadore D, Carpenter EL (2022) Extracellular Vesicle–Based Multianalyte Liquid Biopsy as a Diagnostic for Cancer. Annu Rev Biomed Data Sci 5:269–292 pmid:35562850
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Boyle P, Chapman CJ, Holdenrieder S, et al (2011) Clinical validation of an autoantibody test for lung cancer. Ann Oncol 22:383–389 pmid:20675559
View Article
PubMed/NCBI
Google Scholar

[26] View Article

[27] PubMed/NCBI

[28] Google Scholar

[ref8] 8. Montani F, Marzi MJ, Dezi F, et al (2015) miR-Test: A Blood Test for Lung Cancer Early Detection. J Natl Cancer I 107:63 pmid:25794889
View Article
PubMed/NCBI
Google Scholar

[30] View Article

[31] PubMed/NCBI

[32] Google Scholar

[ref9] 9. Mathios D, Johansen JS, Cristiano S, et al (2021) Detection and characterization of lung cancer using cell-free DNA fragmentomes. Nat Commun 12:1–14
View Article
Google Scholar

[34] View Article

[35] Google Scholar

[ref10] 10. Klein EA, Richards D, Cohn A, et al (2021) Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Annals of Oncology 32:1167–1177 pmid:34176681
View Article
PubMed/NCBI
Google Scholar

[37] View Article

[38] PubMed/NCBI

[39] Google Scholar

[ref11] 11. Finn OJ (2012) Immuno-oncology: understanding the function and dysfunction of the immune system in cancer. Annals of Oncology 23:viii6–viii9 pmid:22918931
View Article
PubMed/NCBI
Google Scholar

[41] View Article

[42] PubMed/NCBI

[43] Google Scholar

[ref12] 12. Allen BM, Hiam KJ, Burnett CE, Venida A, DeBarge R, Tenvooren I, et al (2020) Systemic dysfunction and plasticity of the immune macroenvironment in cancer models. Nat Med 26:1125–1134 pmid:32451499
View Article
PubMed/NCBI
Google Scholar

[45] View Article

[46] PubMed/NCBI

[47] Google Scholar

[ref13] 13. Asada S, Kitamura T (2021) Clonal hematopoiesis and associated diseases: A review of recent findings. Cancer Sci 112:3962–3971 pmid:34328684
View Article
PubMed/NCBI
Google Scholar

[49] View Article

[50] PubMed/NCBI

[51] Google Scholar

[ref14] 14. Nguyen YTM, Fujisawa M, Nguyen TB, et al (2021) Tet2 deficiency in immune cells exacerbates tumor progression by increasing angiogenesis in a lung cancer model. Cancer Sci 112:4931–4943 pmid:34657351
View Article
PubMed/NCBI
Google Scholar

[53] View Article

[54] PubMed/NCBI

[55] Google Scholar

[ref15] 15. Liu X, Sato N, Shimosato Y, et al (2022) CHIP‐associated mutant ASXL1 in blood cells promotes solid tumor progression. Cancer Sci 113:1182 pmid:35133065
View Article
PubMed/NCBI
Google Scholar

[57] View Article

[58] PubMed/NCBI

[59] Google Scholar

[ref16] 16. Lee M, Li J, Li J, et al (2021) Tet2 inactivation enhances the antitumor activity of tumor-infiltrating lymphocytes. Cancer Res 81:1965–1976 pmid:33589517
View Article
PubMed/NCBI
Google Scholar

[61] View Article

[62] PubMed/NCBI

[63] Google Scholar

[ref17] 17. Kleppe M, Comen E, Wen HY, et al (2015) Somatic mutations in leukocytes infiltrating primary breast cancers. NPJ Breast Cancer 1:1–6 pmid:28721364
View Article
PubMed/NCBI
Google Scholar

[65] View Article

[66] PubMed/NCBI

[67] Google Scholar

[ref18] 18. Jiang A, Qin Y, Springer TA (2022) Loss of LRRC33-dependent TGFβ1 activation enhances anti-tumor immunity and checkpoint blockade therapy. Cancer Immunol Res 10:453–467
View Article
Google Scholar

[69] View Article

[70] Google Scholar

[ref19] 19. Han S, Liu ZQ, Chung DC, St Paul M, Garcia-Batres CR, Sayad A, et al (2022) Overproduction of IFNγ by Cbl-b-Deficient CD8+ T Cells Provides Resistance against Regulatory T Cells and Induces Potent Antitumor Immunity. Cancer Immunol Res 10:437–452
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref20] 20. Zink F, Stacey SN, Norddahl GL, et al (2017) Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130:742–752 pmid:28483762
View Article
PubMed/NCBI
Google Scholar

[75] View Article

[76] PubMed/NCBI

[77] Google Scholar

[ref21] 21. Fidler TP, Xue C, Yalcinkaya M, et al (2021) The AIM2 inflammasome exacerbates atherosclerosis in clonal haematopoiesis. Nature 592:296–301 pmid:33731931
View Article
PubMed/NCBI
Google Scholar

[79] View Article

[80] PubMed/NCBI

[81] Google Scholar

[ref22] 22. Bowman RL, Busque L, Levine RL (2018) Clonal Hematopoiesis and Evolution to Hematopoietic Malignancies. Cell Stem Cell 22:157–170 pmid:29395053
View Article
PubMed/NCBI
Google Scholar

[83] View Article

[84] PubMed/NCBI

[85] Google Scholar

[ref23] 23. Lin AE, Rauch PJ, Jaiswal S, Ebert BL (2022) Clonal Hematopoiesis: Confluence of Malignant and Nonmalignant Diseases. Annual Reviews of Cancer Biology 6:187–200
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref24] 24. Jan M, Ebert BL, Jaiswal S (2017) Clonal hematopoiesis. Semin Hematol 54:43–50 pmid:28088988
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref25] 25. Nam AS, Dusaj N, Izzo F, et al (2022) Single-cell multi-omics of human clonal hematopoiesis reveals that DNMT3A R882 mutations perturb early progenitor states through selective hypomethylation. Nat Genet 54:1514–1526 pmid:36138229
View Article
PubMed/NCBI
Google Scholar

[94] View Article

[95] PubMed/NCBI

[96] Google Scholar

[ref26] 26. Challen GA, Goodell MA (2020) Clonal hematopoiesis: mechanisms driving dominance of stem cell clones. Blood 136:1590–1598 pmid:32746453
View Article
PubMed/NCBI
Google Scholar

[98] View Article

[99] PubMed/NCBI

[100] Google Scholar

[ref27] 27. Sanmiguel JM, Eudy E, Loberg MA, Young KA, Mistry JJ, Mujica KD, et al (2022) Distinct Tumor Necrosis Factor Alpha Receptors Dictate Stem Cell Fitness versus Lineage Output in Dnmt3a-Mutant Clonal Hematopoiesis. Cancer Discov 12:2763–2773 pmid:36169447
View Article
PubMed/NCBI
Google Scholar

[102] View Article

[103] PubMed/NCBI

[104] Google Scholar

[ref28] 28. Steensma DP, Bejar R, Jaiswal S, Lindsley RC, Sekeres MA, Hasserjian RP, et al (2015) Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 126:9–16 pmid:25931582
View Article
PubMed/NCBI
Google Scholar

[106] View Article

[107] PubMed/NCBI

[108] Google Scholar

[ref29] 29. Steensma DP (2018) Clinical Implications of Clonal Hematopoiesis. Mayo Clin Proc 93:1122–1130 pmid:30078412
View Article
PubMed/NCBI
Google Scholar

[110] View Article

[111] PubMed/NCBI

[112] Google Scholar

[ref30] 30. Leinonen R, Sugawara H, Shumway M, Collaboration on behalf of the INSD (2011) The Sequence Read Archive. Nucleic Acids Res 39:D19–D21
View Article
Google Scholar

[114] View Article

[115] Google Scholar

[ref31] 31. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25:1754–1760 pmid:19451168
View Article
PubMed/NCBI
Google Scholar

[117] View Article

[118] PubMed/NCBI

[119] Google Scholar

[ref32] 32. Van der Auwera GA, Carneiro MO, Hartl C, et al (2013) From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinformatics 43:11.10.1–11.10.33 pmid:25431634
View Article
PubMed/NCBI
Google Scholar

[121] View Article

[122] PubMed/NCBI

[123] Google Scholar

[ref33] 33. Franke KR, Crowgey EL (2020) Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms. Genomics Inform 18:e10 pmid:32224843
View Article
PubMed/NCBI
Google Scholar

[125] View Article

[126] PubMed/NCBI

[127] Google Scholar

[ref34] 34. Carpi G, Gorenstein L, Harkins TT, Samadi M, Vats P (2022) A GPU-accelerated compute framework for pathogen genomic variant identification to aid genomic epidemiology of infectious disease: a malaria case study. Brief Bioinform 23:1–11 pmid:35945154
View Article
PubMed/NCBI
Google Scholar

[129] View Article

[130] PubMed/NCBI

[131] Google Scholar

[ref35] 35. 10x Chromium (2019) Chromium Single Cell V(D)J Reagent Kits with Feature Barcoding technology for Cell Surface Protein, https://www.10xgenomics.com/support/single-cell-immune-profiling. https://www.10xgenomics.com/support/single-cell-immune-profiling. Accessed 17 Mar 2023

[ref36] 36. Cao Y, Wang X, Peng G (2020) SCSA: A cell type annotation tool for single-cell RNA-seq data. Front Genet 11:490 pmid:32477414
View Article
PubMed/NCBI
Google Scholar

[134] View Article

[135] PubMed/NCBI

[136] Google Scholar

[ref37] 37. Pedregosa FABIANPEDREGOSA F, Michel V, Grisel OLIVIERGRISEL O, et al (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825–2830
View Article
Google Scholar

[138] View Article

[139] Google Scholar

[ref38] 38. Seabold S, Perktold J (2010) Statsmodels: Econometric and Statistical Modeling with Python. In: PROC. OF THE 9th PYTHON IN SCIENCE CONF. pp 92–96

[ref39] 39. One-Off Coder (2023) Data Science Topics. In: https://datascience.oneoffcoder.com. https://datascience.oneoffcoder.com. Accessed 17 Mar 2023

[ref40] 40. Anandakrishnan R, Zyvoloski IJ, Zyvoloski LR, Opoku NK, Dai A, Antony V (2023) Potential immunosuppressive clonal hematopoietic mutations in tumor infiltrating immune cells in breast invasive carcinoma. Sci Rep 13:13131 pmid:37573441
View Article
PubMed/NCBI
Google Scholar

[143] View Article

[144] PubMed/NCBI

[145] Google Scholar

[ref41] 41. Weinstein JN, Collisson EA, Mills GB, et al (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45:1113–1120 pmid:24071849
View Article
PubMed/NCBI
Google Scholar

[147] View Article

[148] PubMed/NCBI

[149] Google Scholar

[ref42] 42. Elazezy M, Joosse SA (2018) Techniques of using circulating tumor DNA as a liquid biopsy component in cancer management. Comput Struct Biotechnol J 16:370–378 pmid:30364656
View Article
PubMed/NCBI
Google Scholar

[151] View Article

[152] PubMed/NCBI

[153] Google Scholar

[ref43] 43. Petrackova A, Vasinek M, Sedlarikova L, Dyskova T, Schneiderova P, Novosad T, et al (2019) Standardization of Sequencing Coverage Depth in NGS: Recommendation for Detection of Clonal and Subclonal Mutations in Cancer Diagnostics. Front Oncol 9:851 pmid:31552176
View Article
PubMed/NCBI
Google Scholar

[155] View Article

[156] PubMed/NCBI

[157] Google Scholar

[ref44] 44. Lambrechts D, Wauters E, Boeckx B, et al (2018) Phenotype molding of stromal cells in the lung tumor microenvironment. Nat Med 24:1277–1289 pmid:29988129
View Article
PubMed/NCBI
Google Scholar

[159] View Article

[160] PubMed/NCBI

[161] Google Scholar

[ref45] 45. Laughney AM, Hu J, Campbell NR, et al (2020) Regenerative lineages and immune-mediated pruning in lung cancer metastasis. Nat Med 26:259–269 pmid:32042191

[ref46] 46. Sinjab A, Han G, Treekitkarnmongkol W, et al (2021) Resolving the spatial and cellular architecture of lung adenocarcinoma by multiregion single-cell sequencing. Cancer Discov 11:2506–2523 pmid:33972311

[ref47] 47. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40:W452–W457 pmid:22689647

[ref48] 48. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249 pmid:20354512

[ref49] 49. Altshuler DM, Durbin RM, Abecasis GR, et al (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65 pmid:23128226

[ref50] 50. Whalen S, Schreiber J, Noble WS, Pollard KS (2021) Navigating the pitfalls of applying machine learning in genomics. Nat Rev Genet 23:169–181 pmid:34837041

[ref51] 51. Joshi A, Butle A, Hait S, Mishra R, Trivedi V, Thorat R, et al (2022) Osimertinib for lung cancer cells harboring low-frequency EGFR T790M mutation. Transl Oncol 22:101461 pmid:35653897

[ref52] 52. Formenti SC, Rudqvist NP, Golden E, et al (2018) Radiotherapy induces responses of lung cancer to CTLA-4 blockade. Nat Med 24:1845–1851 pmid:30397353

[ref53] 53. Jia Q, Chu Q, Zhang A, et al (2021) Mutational burden and chromosomal aneuploidy synergistically predict survival from radiotherapy in non-small cell lung cancer. Commun Biol 4:1–7

[ref54] 54. Mao W, Chen R, Lu R, Wang S, Song H, You D, et al (2021) Germline mutation analyses of malignant ground glass opacity nodules in non-smoking lung adenocarcinoma patients. PeerJ 9:e12048 pmid:34540367

[ref55] 55. Collisson EA, Campbell JD, Brooks AN, et al (2014) Comprehensive molecular profiling of lung adenocarcinoma. Nature 511:543–550 pmid:25079552

[ref56] 56. Gillette MA, Satpathy S, Cao S, et al (2020) Proteogenomic Characterization Reveals Therapeutic Vulnerabilities in Lung Adenocarcinoma. Cell 182:200–225.e35 pmid:32649874

[ref57] 57. Satpathy S, Krug K, Jean Beltran PM, et al (2021) A proteogenomic portrait of lung squamous cell carcinoma. Cell 184:4348–4371.e40 pmid:34358469

[ref58] 58. Schenk MW, Humphrey S, Hossain ASMM, et al (2021) Soluble guanylate cyclase signalling mediates etoposide resistance in progressing small cell lung cancer. Nat Commun 12:1–15

[ref59] 59. Hammerman PS, Voet D, Lawrence MS, et al (2012) Comprehensive genomic characterization of squamous cell lung cancers. Nature 489:519–525 pmid:22960745

[ref60] 60. Ribeiro-dos-Santos AM, Vidal AF, Vinasco-Sandoval T, Guerreiro J, Santos S, Ribeiro-dos-Santos Â, et al (2020) Exome Sequencing of Native Populations From the Amazon Reveals Patterns on the Peopling of South America. Front Genet 11:1359 pmid:33193622

[ref61] 61. Fischbach GD, Lord C (2010) The Simons Simplex Collection: A resource for identification of autism genetic risk factors. Neuron 68:192–195 pmid:20955926

[ref62] 62. Auton A, Abecasis GR, Altshuler DM, et al (2015) A global reference for human genetic variation. Nature 526:68–74 pmid:26432245

[ref63] 63. Lyons JJ, Yu X, Hughes JD, et al (2016) Elevated basal serum tryptase identifies a multisystem disorder associated with increased TPSAB1 copy number. Nat Genet 48:1564–1569 pmid:27749843

[ref64] 64. Xu Q, Wu C, Zhu Q, et al (2022) Metagenomic and metabolomic remodeling in nonagenarians and centenarians and its association with genetic and socioeconomic factors. Nat Aging 2:438–452 pmid:37118062

[ref65] 65. Chambers JC, Abbott J, Zhang W, et al (2014) The South Asian Genome. PLoS One 9:e102645 pmid:25115870

[ref66] 66. Rodriguez-Flores J, O’Beirne S, Salit J, Kaner R, Downey R, Mezey J, et al (2018) Identification of Large Clones of Potentially Deleterious Somatic Mutations in the Small Airway Epithelium of Smokers Without Cancer. Am J Respir Crit Care Med 197:A1952

[ref67] 67. Park JS, Lee J, Jung ES, et al (2019) Brain somatic mutations observed in Alzheimer’s disease associated with aging and dysregulation of tau phosphorylation. Nat Commun 10:1–12

[ref68] 68. Kim SY (2009) Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics 10:1–10

[ref69] 69. Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DSA, Nobel AB, et al (2006) Concordance among Gene-Expression–Based Predictors for Breast Cancer. N Engl J Med 355:560–569 pmid:16899776

[ref70] 70. Dobbin KK, Zhao Y, Simon RM (2008) How Large a Training Set is Needed to Develop a Classifier for Microarray Data? Clin Cancer Res 14:108–114 pmid:18172259

[ref71] 71. Kim J, Kim D, Lim JS, Maeng JH, Son H, Kang HC, et al (2019) The use of technical replication for detection of low-level somatic mutations in next-generation sequencing. Nat Commun 10:1–11

[ref72] 72. Takahashi H, Ogata H, Nishigaki R, Broide DH, Karin M (2010) Tobacco Smoke Promotes Lung Tumorigenesis by Triggering IKKβ- and JNK1-Dependent Inflammation. Cancer Cell 17:89–97

[ref73] 73. Swann JB, Vesely MD, Silva A, Sharkey J, Akira S, Schreiber RD, et al (2008) Demonstration of inflammation-induced cancer and cancer immunoediting during primary tumorigenesis. Proc Natl Acad Sci U S A 105:652–656 pmid:18178624

[ref74] 74. Wu Y, Ye S, Goswami S, Pei X, Xiang L, Zhang X, et al (2020) Clinical significance of peripheral blood and tumor tissue lymphocyte subsets in cervical cancer patients. BMC Cancer 20:1–12 pmid:32131750

[ref75] 75. Butler M, Morel AS, Jordan WJ, Eren E, Hue S, Shrimpton RE, et al (2007) Altered expression and endocytic function of CD205 in human dendritic cells, and detection of a CD205–DCL-1 fusion protein upon dendritic cell maturation. Immunology 120:362–371 pmid:17163964

[ref76] 76. Stengel S, Quickert S, Lutz P, et al (2020) Peritoneal Level of CD206 Associates With Mortality and an Inflammatory Macrophage Phenotype in Patients With Decompensated Cirrhosis and Spontaneous Bacterial Peritonitis. Gastroenterology 158:1745–1761 pmid:31982413

[ref77] 77. Brabender J, Danenberg K, Metzger R, et al (2002) The role of retinoid X receptor messenger RNA expression in curatively resected non-small cell lung cancer. Clin Cancer Res 8:438–443 pmid:11839661

[ref78] 78. Rigas JR, Dragnev KH (2005) Emerging Role of Rexinoids in Non-Small Cell Lung Cancer: Focus on Bexarotene. Oncologist 10:22–33 pmid:15632250
