Abstract
Blood-based testing is a valuable tool for detecting and monitoring patient conditions in both human and veterinary medicine. When conventional tissue-based diagnosis is challenging, blood-derived measurements allow for minimally invasive testing. Recent studies across mammalian species, particularly in humans, have explored the use of DNA methylation from whole blood, revealing its potential to predict individual mortality and responses to environmental stresses. While it is well recognized that tumor lesions display altered epigenetic modifications across some mammalian species, little is known about how DNA methylation in blood, as an indirect tissue sample, reflects the health status of individual dogs. In this study, we conducted whole genome bisulfite sequencing using whole blood samples from twenty dogs diagnosed with canine gastrointestinal lymphoma, a prevalent disease in dogs. Comparative analysis with non-lymphoma controls identified over one thousand differentially methylated regions (DMRs). To develop practical predictive models, we used machine learning to narrow the identified DMRs down to a feasible set of probes, achieving high accuracy (0.8–0.9) in predicting lymphoma cases. Our research underscores the potential of whole-blood DNA methylation as a predictor and establishes a foundational data infrastructure of genome-wide DNA methylation for future studies of canine health monitoring.
Citation: Nakamura M, Matsumoto Y, Nagata M, Yasuda K, Yonekawa K, Muramatsu S, et al. (2025) Exploring DNA methylation profiles in blood samples of canine gastrointestinal lymphoma. PLoS One 20(12): e0339388. https://doi.org/10.1371/journal.pone.0339388
Editor: Amjad Ali, Hazara University, PAKISTAN
Received: June 18, 2025; Accepted: December 6, 2025; Published: December 30, 2025
Copyright: © 2025 Nakamura et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The whole genome bisulfite sequencing (WGBS) data from this study are available at the NCBI Gene Expression Omnibus (GEO) under accession number GSE289850 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE289850). The previously published WGBS control data are available under accession number GSE252908 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE252908). Both datasets have been integrated into a GEO SuperSeries under accession number GSE309781 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE309781). The computational pipeline used in this study is available at https://github.com/MMnakam/dog_lymphoma_DNAme.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
DNA methylation is a dynamic epigenetic mark that can reflect various biological processes driven by extrinsic and intrinsic factors. It is frequently observed on the cytosine of CpG dinucleotides. This modification varies with tissue and cell type, aging, environmental exposures, and other factors [1]. Recently, several attempts have been made to predict individual health status from DNA methylation patterns, using them as biomarkers to support disease diagnosis or to monitor prognosis in humans [2–4]. DNA methylation is a covalent modification that is relatively stable and resistant to degradation. Moreover, DNA methylation patterns are detectable from a small volume of blood [5]. Therefore, DNA methylation biomarkers derived from blood cells are expected to serve as non-invasive or minimally invasive markers. For example, in human studies, it has been reported that DNA methylation patterns of blood are associated with cancer incidence and mortality [6,7], and with post-COVID-19 conditions [8]. In other cases, DNA methylation is also used as a more direct approach to distinguish cancer from normal tissues or to identify the localization of the lesion using cell-free DNA (cfDNA), which includes circulating tumor DNA that is released from affected tissue into the blood plasma of those patients [9–13].
In this study, we attempted to predict canine gastrointestinal lymphoma incidence using DNA methylation patterns of whole blood cells rather than addressing cfDNA or tumor lesions. In cases of lymphoma, it is sometimes difficult to detect lesions even when lymphoma is suspected. This complicates biopsy procedures, underscoring the need to develop blood markers for diagnostic support. To discover genomic regions that show differences in DNA methylation between lymphoma cases and controls, we performed whole genome bisulfite sequencing (WGBS) and identified nearly 1,500 differentially methylated regions (DMRs) between blood samples from lymphoma and non-lymphoma subjects. To design a cost-effective molecular assay, we defined these DMRs as candidate predictors and narrowed them down to a practical set size using machine learning approaches. Our study revealed that the models using a combination of multiple DMRs were as effective as those using a single DMR. This study not only highlights the potential of DNA methylation signatures for canine gastrointestinal lymphoma diagnosis but also emphasizes the potential of machine learning in refining predictive models.
Materials and methods
Animal and sample preparation
A total of 20 dogs (Canis lupus familiaris) with a confirmed diagnosis of gastrointestinal lymphoma were included in this study. Materials were obtained from two research institutes, Hokkaido University Veterinary Teaching Hospital and Anicom Specialty Medical Institute, Inc., with written consent from the owners for their animals’ participation in this study. This study was approved by the Laboratory Animal Experimentation Committee of the Graduate School of Veterinary Medicine, Hokkaido University (Approval number: 2022−012) and by the Ethics Screening Committees of Anicom Specialty Medical Institute, Inc. (Approval number: 2022−02), with written consent obtained in accordance with the guidelines. A volume of 0.1–0.4 ml of whole blood from each individual was used for DNA isolation. Genomic DNA was isolated from whole blood using the DNeasy Blood and Tissue Kit (QIAGEN, Germantown, MD, USA) following the manufacturer’s protocol. All efforts were made to minimize animal suffering during sample collection.
Study design and characteristics of subject population
To explore differentially methylated regions, we performed WGBS on blood samples from 20 individual dogs diagnosed with canine gastrointestinal lymphoma. As control samples, we retrieved WGBS data from the previous study (GSE252908) [14] for 19 individuals that were nominally healthy, except for injuries in some cases. All samples (both cases and controls) were collected prospectively as part of a single case-control study. Although all measurements were conducted simultaneously, samples originated from multiple facilities, herein referred to as the collection setting. A potential limitation of our study is this institutional heterogeneity in sample collection. Nonetheless, the majority of samples (>65%) from both case and control groups were collected at a single facility, which reduces the likelihood of substantial inter-site technical variation. The demographics of the subjects in this study are summarized in Table 1.
Sequencing and mapping
Genomic DNA extracted from whole blood was sent to Novogene for bisulfite conversion, library preparation, and sequencing. Bisulfite conversion was conducted using the EZ DNA Methylation Gold Kit (ZYMO Research, USA). For library preparation, the Scale Methyl-DNA Lib Prep Kit (ABclonal, China) was used. The prepared WGBS libraries were sequenced in 150-cycle paired-end mode on the NovaSeq. The number of reads was targeted to provide 70× depth of coverage for each subject. Read information is summarized in Tables 2 and 3.
Basic data analysis
Sequence quality control and mapping followed the procedures described in the previous study [14]. In brief, quality-filtered reads were mapped to the dog reference genome CanFam3.1, and methylation rates were called using methylpy (v1.2.9) [15]. CpGs with low depth of coverage (fewer than 10 reads per CpG across all samples) or with extremely high coverage (more than 500 reads per CpG in at least one sample) were removed. Approximately 14.2 million CpGs (14,242,200), more than half of all CpGs in the dog reference genome, were retained for subsequent analyses. Principal component analysis (PCA) was conducted using the prcomp function in R with the scaling option.
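The coverage filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `filter_cpgs` and the toy input are hypothetical, and the lower bound is read here as "every sample must have at least 10 reads", which is one plausible interpretation of the wording.

```python
from typing import Dict, List

MIN_DEPTH = 10   # lower depth bound stated in the text
MAX_DEPTH = 500  # upper depth bound stated in the text


def filter_cpgs(depth: Dict[str, List[int]]) -> List[str]:
    """Return the CpG ids that pass both depth filters.

    `depth` maps a CpG id to its read depth in each sample.
    Assumed reading: every sample must reach MIN_DEPTH, and
    no sample may exceed MAX_DEPTH.
    """
    kept = []
    for cpg, per_sample in depth.items():
        if min(per_sample) < MIN_DEPTH:
            continue  # under-covered in at least one sample
        if max(per_sample) > MAX_DEPTH:
            continue  # suspiciously deep in at least one sample
        kept.append(cpg)
    return kept


# Toy depths for three CpGs measured in three samples (made-up numbers).
example = {
    "chr1:100": [12, 30, 25],
    "chr1:250": [9, 40, 22],
    "chr2:75": [15, 620, 18],
}
print(filter_cpgs(example))  # ['chr1:100']
```

Only the first CpG survives: the second falls below the lower bound in one sample and the third exceeds the upper bound in one sample.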
Estimating the variance of the covariates
To estimate the variance attributable to potential covariates in this experiment, principal variance component analysis (PVCA) was performed using the pvca package (v1.50.0) in R with a threshold of 0.6 [16]. The following potential confounding factors were considered in all samples (n = 39): “breeds”, “age”, “sex”, and “collection setting (sample collection)”.
Differentially methylated cytosine (DMC) selection
The dataset was divided into a training set of 28 individuals and a test set of the remaining 11 individuals. Lymphoma-associated DMCs were called using methylKit (v1.34.0) [17]. Dog breed and age were included as covariates in the model, and CpGs with a difference in beta values greater than 10% and a q-value (adjusted p-value) < 0.05 were called as DMC candidates. Furthermore, to reduce false positives and identify more reliable DMCs, the non-lymphoma and lymphoma labels were randomly shuffled, and the statistics for each CpG were recomputed. After one hundred iterations, CpG sites detected more than once under the permuted labels were removed from the DMC candidates.
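The label-permutation filter can be sketched as follows. This is an illustrative outline only: `toy_caller` is a hypothetical stand-in that flags a CpG when the group means differ by more than 0.10, whereas the study used methylKit with covariates and a q-value cutoff. The function names, the seed, and the data layout are assumptions.

```python
import random
from collections import Counter


def toy_caller(meth, labels, min_diff=0.10):
    """Hypothetical stand-in for a DMC caller (not methylKit):
    flags a CpG when case and control means differ by > min_diff."""
    hits = set()
    for cpg, values in meth.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        if not case or not ctrl:
            continue
        diff = sum(case) / len(case) - sum(ctrl) / len(ctrl)
        if abs(diff) > min_diff:
            hits.add(cpg)
    return hits


def permutation_filter(meth, labels, caller=toy_caller,
                       n_perm=100, max_hits=1, seed=0):
    """Keep candidates called at most `max_hits` times under shuffled labels."""
    rng = random.Random(seed)
    candidates = caller(meth, labels)   # calls with the true labels
    null_hits = Counter()
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)           # break the label/sample link
        null_hits.update(caller(meth, shuffled) & candidates)
    return {cpg for cpg in candidates if null_hits[cpg] <= max_hits}
```

With `max_hits=1`, a candidate is dropped once it is called more than once under permuted labels, matching the rule stated in the text.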
DMR identification and clustering
DMCs located within 100 bp of each other were combined, and combined genomic regions containing more than 5 DMCs were considered DMRs. DMRs were visualized using heatmap.2 [18], and annotations were visualized using pheatmap [19]. The 1,532 DMRs were clustered using k-means clustering. Prior to the k-means analysis, the optimal number of clusters was determined using the elbow method, which calculates the within-cluster sum of squares (WCSS) over a range of candidate cluster numbers (S2 Fig in S1 File).
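The merging rule for DMCs can be written as a short Python sketch. It assumes that "within 100 bp" refers to the gap between neighbouring DMCs on the same chromosome (so regions grow by chaining) and that "more than 5" means at least 6 DMCs; the function name and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int, int]  # (chromosome, start, end, number of DMCs)


def merge_dmcs(positions: Dict[str, List[int]],
               max_gap: int = 100, min_dmcs: int = 6) -> List[Region]:
    """Chain neighbouring DMCs whose gap is <= max_gap and keep
    chains containing at least min_dmcs sites."""
    regions: List[Region] = []
    for chrom in sorted(positions):
        run: List[int] = []
        for pos in sorted(positions[chrom]):
            if run and pos - run[-1] > max_gap:
                # Gap too large: close the current run and start a new one.
                if len(run) >= min_dmcs:
                    regions.append((chrom, run[0], run[-1], len(run)))
                run = []
            run.append(pos)
        if len(run) >= min_dmcs:
            regions.append((chrom, run[0], run[-1], len(run)))
    return regions


toy = {"chr1": [100, 150, 220, 300, 390, 480, 900, 950]}
print(merge_dmcs(toy))  # [('chr1', 100, 480, 6)]
```

In the toy input the first six sites chain into one region, while the last two sites form a run that is too short and is discarded.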
Gene ontology enrichment analysis
Gene annotations associated with gene ontology and KEGG pathways were acquired through the R package “org.Cf.eg.db” by Carlson M (2022). For KEGG pathway enrichment analysis, all annotated genes (n ≈ 17,000) were used as the background gene set, as implemented in the R package “org.Cf.eg.db” (version 3.14.0). WGBS coverage was highly uniform across the genome, with 99.5% of genes containing at least one CpG site with sufficient coverage either within the gene body or within 10 kb of the gene boundaries. Only a small subset of genes (<1%) lacked CpG coverage in these regions, indicating minimal potential bias from uneven sequencing coverage. For Gene Ontology enrichment analysis, we used annotations from “goa_dog_isoform.gaf” (version 108), which contained approximately 8,650 protein annotations [20,21]. Of these, 8,102 proteins were successfully mapped to the reference genome, and 8,080 genes (93.4%) had at least one CpG site with available coverage within 10 kb. These genes were used as the background gene set for Gene Ontology analyses. To identify overrepresented GO terms, p-values were computed with the hypergeometric test and subsequently adjusted by controlling the false discovery rate. DMR-gene associations were determined using bedtools (v2.30.0) to identify DMRs overlapping gene bodies and to map intergenic DMRs to their closest genes.
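As an illustration of the enrichment statistics, the sketch below computes an upper-tail hypergeometric p-value and a false discovery rate adjustment in plain Python. The text does not name the adjustment procedure, so the Benjamini-Hochberg method used here is an assumption, and the function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn from a background of M genes,
    n of which are annotated with the GO term."""
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i)
                     for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, `hypergeom_sf(k, 8080, n, N)` would use the 8,080-gene GO background described above, with `N` the number of DMR-associated genes in the tested set, `n` the number of background genes carrying the term, and `k` the observed overlap.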
Simple logistic regression model
The training dataset was used to develop prediction models, while the remaining data were reserved for evaluating the constructed models. For constructing logistic regression models, we assessed the relationship between the average DNA methylation across each DMR and the lymphoma/non-lymphoma status. The predicted status was modeled based on the logit transformation of the probability of having gastrointestinal lymphoma. This process was conducted for each DMR using the `lm` function in R. As a preliminary model selection step, 1,000 bootstrap samples were drawn and the average area under the receiver operating characteristic curve (AU-ROC) was calculated on the training dataset. To assess model performance, the AU-ROC and the area under the precision-recall curve (AU-PRC) were used for evaluation on the test dataset.
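A bootstrap estimate of the AU-ROC can be sketched in Python as below. This is a simplified illustration, not the authors' R code: the scores are taken as given rather than refit in each resample (for a single predictor the fitted probability is monotone in the predictor, so the AU-ROC depends only on the ranking and its direction), resamples containing one class only are redrawn, and the percentile interval is an assumed choice.

```python
import random


def au_roc(scores, labels):
    """Rank-based AU-ROC: share of (case, control) pairs in which the
    case scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_au_roc(scores, labels, n_boot=1000, seed=0):
    """Return (mean, lower, upper) of the AU-ROC over bootstrap resamples."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    idx = range(len(scores))
    values = []
    while len(values) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):                # skip single-class resamples
            values.append(au_roc([scores[i] for i in sample], ys))
    values.sort()
    mean = sum(values) / n_boot
    lower = values[int(0.025 * n_boot)]
    upper = values[int(0.975 * n_boot) - 1]
    return mean, lower, upper
```

With only 28 training samples, the bootstrap interval is typically wide, which is one reason to treat single-DMR rankings as preliminary.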
Prediction model construction by machine learning
For multiple logistic regression with regularization, a regularized multiple regression model was fitted so that informative DMRs were selected in a single step. To handle multicollinearity and the relatively large number of features with feasible computing resources, the FastSparse (v0.1.0) package was used [22]. The parameter gamma was fixed at 0.001. The best lambda was determined by 4-fold cross-validation on the training dataset. For the decision tree and random forest approach, decision trees were constructed for each cluster to select representative DMRs for distinguishing between lymphoma and non-lymphoma individuals, and five DMRs in total were selected. Using these five selected DMRs, lymphoma status was predicted with the random forest method.
Results
Entire pattern of DNA methylation between subjects with and without a diagnosis of canine gastrointestinal lymphoma
A total of 39 dogs (control: 19, case: 20) were enrolled from the participating veterinary hospital and related facilities. Cases were dogs with a veterinarian-confirmed diagnosis of canine gastrointestinal lymphoma, while controls were dogs that showed no observed internal medical diseases at the time of sample collection. The mean age in years was 7.79 (standard deviation (SD): 3.66) in controls and 10.2 (SD: 2.02) in cases. The ratio of females (spayed) to males (castrated) was 13 (5): 6 (5) in controls and 10 (7): 10 (7) in cases. In terms of dog breeds, the control and case populations consisted of three and fifteen breeds, respectively (Table 1).
Quality filtering yielded approximately 14.2 million CpGs for analysis, representing more than half of all CpGs in the dog reference genome. The distribution of the beta values was similar between lymphoma and non-lymphoma individuals (Fig 1A). Next, to investigate the variance of DNA methylation associated with known demographic factors, such as age, sex, breed, and the diagnosis of lymphoma, we conducted principal component analysis (PCA) using all available CpGs except those with the same methylation level across all samples. After scaling, lymphoma and non-lymphoma individuals were relatively separated along the first principal component (PC1) (Fig 1B). PC1 and the second principal component (PC2) explained only 5.93% and 4.55% of the variance, respectively.
Fig 1. (A) Density plot of DNA methylation level of measured CpGs in each individual. (B) Plot of principal component analysis using all CpGs passing filtering criteria. (C-E) Projections of demographic characteristics; age (C), sex (D), breeds (E) on the PCA plot.
Next, we considered that some subjects had undergone treatment with corticosteroids, a common treatment for inflammation. Steroid treatment can affect blood cell composition [23]. To assess the effect of steroid treatment, we chose individuals with traceable clinical histories and projected the steroid treatment condition onto the PCA plot (Fig 1B). Although individuals with steroid treatment were located relatively on the right side of the PCA plot, the separation between individuals with and without steroid treatment was not clear. Thus, steroid treatment had no obvious correlation with the high-variance components of DNA methylation.
It is known that DNA methylation is affected by age and sex [24]. To investigate whether age, sex, and dog breed explained PC1 and PC2, we projected these three attributes onto the PCA plots (Fig 1C-E). Age and sex had no obvious association with either PC1 or PC2. Poodles appear to form a cluster, but this is because the non-lymphoma group contains a relatively large number of poodles. Moreover, individuals of the same breed, such as poodle or shiba, did not cluster together across cases and controls. That is, breed also did not show a strong association with either PC1 or PC2. Thus, several attributes, such as steroid treatment, sex, age, and breed, were not strongly manifested, at least along PC1 and PC2. On the other hand, cases and controls were relatively separated, despite the fact that PC1 and PC2 together explained only approximately 10–11% of the total variance. Additionally, to estimate the contribution of each covariate, we performed principal variance component analysis (PVCA). PVCA revealed that sample collection setting explained only 6.4% of the variance, while biological covariates made limited contributions (age: 1.5%, sex: 3.4%, breed: 1.1%) (S1 Fig in S1 File). The substantial residual variance (62.1%) likely encompasses individual variation and disease-associated signals. These findings indicate that technical factors have minimal impact on the observed case-control separation, supporting the validity of our methylation profiling results.
Identification of differentially methylated regions
To detect differentially methylated cytosines (DMCs) that compose differentially methylated regions (DMRs), we conducted regression analysis of the DNA methylation level at each qualified CpG site against the diagnosis of canine gastrointestinal lymphoma using the R package methylKit. For predictive modeling, the 39 individuals were divided into a training dataset (28 individuals: 12 controls, 16 cases) and a test dataset (11 individuals: 7 controls, 4 cases) using a hold-out method (S1 Table in S1 File, Fig 2). This split allows the constructed predictive models to be evaluated fairly by reducing the risk of data leakage. The training dataset was then used to identify differentially methylated regions. Although we did not see any obvious association with either the first or second principal component, age in years and breed were included as covariates to adjust for potential confounding factors and to ensure the robustness of the subsequent analysis. Moreover, to control for false positive signals, the case/control labels were permuted 100 times, and DMC candidates that were called more than once under the permuted labels were removed. This process is expected to reduce false positives even with a small sample size. Hypomethylated DMC candidates (156,699) were reduced to 63,455 DMCs and hypermethylated DMC candidates (79,197) were reduced to 9,175, resulting in 72,630 DMCs in total. The remaining DMCs were merged if they were located within a distance of 100 bp. This process identified approximately 1,800 (1,755) regions as lymphoma-DMR candidate regions.
The dataset was randomly divided into a training set (approximately 80%) for feature selection and model development, and an independent test set (approximately 20%) for performance validation to ensure unbiased model evaluation. This approach prevents data leakage and ensures an unbiased assessment of model generalizability.
Furthermore, to exclude CpGs that could respond to steroid treatment, steroid-DMRs were detected using methylKit without any covariate adjustment. This detection used loose criteria, requiring more than three steroid-DMCs per region, to comprehensively identify steroid-affected regions. In total, 6,594 hypermethylated and 7,947 hypomethylated steroid-DMRs were detected in the group with steroid treatment. These genomic regions were subtracted from the lymphoma-DMR candidates. Moreover, to eliminate the potential impact of sex differences on the reliability of the lymphoma-DMRs, lymphoma-DMRs located on the X chromosome were omitted from subsequent analysis. After these steps, 1,532 lymphoma-DMR candidates remained. The DMRs were distributed across the genome (Fig 3A). The difference in average DNA methylation level at each DMR between cases and controls was relatively small, mostly around 10%.
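The two exclusion steps can be sketched in Python as follows. The sketch assumes that "subtracted" means dropping any lymphoma-DMR candidate that overlaps a steroid-DMR by at least one base (closed intervals), and that the X chromosome is labelled `chrX`; both are assumptions, and the function names and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed interval


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_dmrs(candidates: List[Interval],
                steroid_regions: List[Interval],
                excluded_chroms: Sequence[str] = ("chrX",)) -> List[Interval]:
    """Drop candidates on excluded chromosomes or overlapping a steroid-DMR."""
    kept = []
    for chrom, start, end in candidates:
        if chrom in excluded_chroms:
            continue  # sex-chromosome regions are omitted
        if any(c == chrom and overlaps(start, end, s, e)
               for c, s, e in steroid_regions):
            continue  # potentially steroid-responsive region
        kept.append((chrom, start, end))
    return kept


# Made-up coordinates for illustration only.
candidates = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chrX", 50, 90)]
steroid = [("chr1", 250, 400), ("chr2", 1000, 1200)]
print(filter_dmrs(candidates, steroid))  # [('chr1', 1000, 1200)]
```

The linear scan is adequate for a few thousand regions; an interval tree or a tool such as bedtools would be the usual choice at larger scale.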
Fig 3. (A) Locations of lymphoma-DMRs across chromosomes. (B) Heatmap of average methylation level across identified DMRs by each cluster. Clustering and feature selection were performed on the training set only. The heatmap displays all samples to demonstrate cluster consistency; however, model evaluation was conducted exclusively on the independent test set.
To obtain an overview of the patterns of DNA methylation change in these DMRs, we performed k-means clustering using the DNA methylation level of the lymphoma-DMRs. The number of DMRs in each cluster was as follows: 917 DMRs for cluster_1, 180 DMRs for cluster_2, 326 DMRs for cluster_3, and 109 DMRs for cluster_4. Only cluster_2 showed a trend of hypermethylation in the lymphoma group, and the other three clusters showed hypomethylated patterns (Fig 3B). The methylation trends of cluster_1 and cluster_3 were similar, but some DMRs in cluster_3 had a larger difference in DNA methylation level than those in cluster_1. Thus, hypomethylated DMRs outnumbered hypermethylated DMRs. Taken together, these identified DMRs can be used to discriminate between individuals with lymphoma and those without lymphoma.
Profile of genes adjacent to DMRs
Among these 1,532 DMRs, 1,024 DMRs overlapped gene regions (802 genes), 63 DMRs were located in the promoter regions of 59 genes (within 1.5 kb upstream of genes), and 278 DMRs were located within 10 kb of 249 genes (Datasheet S2 File). These DMR-adjacent or DMR-overlapping genes included genes known in other mammals to be involved in blood cell development and differentiation, such as CCR7, CD27, PRDM1, GFI1 and FCMR [25–28]. The DNA methylation status at representative loci is visualized in Fig 4A-D. Enrichment analysis using gene ontology showed a relatively limited number of enriched GO terms. Among genes with DMRs in their promoter regions, only ‘extracellular space (GO:0005615)’ was enriched (S2 Table in S1 File). Among DMR-adjacent genes (located within 10 kb), ‘tumor necrosis factor receptor binding (GO:0005164)’ was enriched (S3 Table in S1 File). For genes overlapping DMRs, there was no strong enrichment of any GO term. Thus, DMR-adjacent genes may be related to immune responses, suggesting a systemic response to pathology.
Fig 4. (A-D) DMRs are shown as rectangles in colors corresponding to the cluster colors in Fig 3B; cluster_1 (A), cluster_2 (B), cluster_3 (C), and cluster_4 (D). The height of each bar shows the average DNA methylation level at each CpG in the control group or the case group.
Status prediction by single DMRs
To identify potential biomarker candidates, we first fitted a simple logistic regression for each DMR using the training dataset. Next, as a preliminary model selection step, we applied bootstrapping to the 1,532 models constructed on the training dataset and calculated the average area under the receiver operating characteristic curve (AU-ROC) and its confidence intervals. AU-ROC indicates the ability to discriminate between positive and negative cases.
The models were evaluated using two metrics: the AU-ROC and the area under the precision-recall curve (AU-PRC). The resultant models were evaluated on the test dataset consisting of 11 individuals, which included 7 subjects without lymphoma and 4 subjects with lymphoma. AU-PRC primarily reflects how well positive cases are identified, making it particularly useful for datasets with uneven case/control distributions. Here, we focused on the top twenty DMRs with the highest AU-ROC on the training dataset. These top twenty DMRs contained both hypermethylated and hypomethylated regions (S3 Fig in S1 File). Among them, 14 models based on a single DMR perfectly predicted the outcomes, with both AU-ROC and AU-PRC equal to 1 (Table 4). However, when these twenty models were tested on the test dataset, their values varied, ranging from 0.643 to 1.000 for AU-ROC and from 0.407 to 1.000 for AU-PRC. Taken together, some simple models consisting of single DMRs exhibited good predictive power. For example, the model using chr6:15948237–15948396 achieved an AU-ROC of 1.000 and an AU-PRC of 1.000 on the test dataset. This model has high potential to discriminate between cases and controls. On the other hand, because we assessed a large number of DMRs, we cannot exclude the possibility that the favorable results may be coincidental. While some DMRs showed promising predictive power, it is important to develop robust models to ensure consistent performance across different populations of subjects.
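To make the AU-PRC values easier to interpret, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. The text does not state which estimator was used, so this choice is an assumption, and the function name is hypothetical. For orientation, a random ranking has an expected precision close to the case prevalence, which is 4/11 (about 0.36) for the test dataset described above.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision: mean of the precision values obtained at the
    rank of each positive case, with samples ranked by decreasing score."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive case
    return total / n_pos


# Perfect ranking: every case scores above every control.
print(average_precision([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0
```

Tied scores are ordered arbitrarily in this sketch; with only 11 test samples, a single swap in the ranking changes the value noticeably.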
Prediction models comprising optimal multiple probes
While some of the simple logistic regression models provided accurate predictions with single predictors, DNA methylation levels can be influenced by various intrinsic and extrinsic factors, such as age and environmental stresses. These factors may compromise the reliability of the results. To reflect the multiple factors that influence DNA methylation, we constructed predictive models that consist of multiple probes (DMRs as predictors). However, to keep costs manageable, it is important to limit the number of probes used. Collecting data from several targeted loci using PCR or locus-specific sequencing is significantly cheaper than a genome-wide approach. Given these considerations, we aimed to narrow down the number of probes to a practical level. Although predictive models with more probes tend to provide more information, the cost of measuring them makes it advantageous to use smaller subsets of probes. Using fewer probes not only facilitates practical applications but also helps avoid overfitting. To select effective probes and construct prediction models, we used two different machine learning approaches and compared their performance: 1) multiple logistic regression with regularization and 2) heuristic feature selection using decision trees, which can reflect potential non-linear structure.
First, we adopted a model using multiple logistic regression with regularization. Typically, the multiple regression approach suffers from multicollinearity; in particular, DNA methylation levels were similar between different DMRs (Fig 3B). We adopted the FastSparse approach, which can overcome multicollinearity and achieve feature selection using L0 and L2 regularization. L0 and L2 regularization is advantageous for high-dimensional data, such as genome-wide data. Using 4-fold cross-validation, we determined the best value of the regularization hyperparameter lambda (model 1). This method selected 13 DMRs, and the evaluation of this model yielded an AU-ROC of 0.893 and an AU-PRC of 0.884 on the test dataset (Fig 5A, C).
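For readers unfamiliar with combined L0 and L2 regularization, a generic form of the penalized logistic objective is given below. This is a sketch rather than the exact objective solved in this study: it assumes labels coded as $y_i \in \{-1, +1\}$, and the assignment of lambda to the L0 term and gamma to the L2 term is inferred from the Methods wording, not confirmed by the source.

```latex
\min_{\beta_0,\,\beta}\;
\sum_{i=1}^{n} \log\!\Bigl(1 + \exp\bigl(-y_i\,(\beta_0 + x_i^{\top}\beta)\bigr)\Bigr)
\;+\; \lambda\,\lVert \beta \rVert_0
\;+\; \gamma\,\lVert \beta \rVert_2^{2}
```

Here $x_i$ is the vector of average methylation levels of the candidate DMRs in sample $i$, and $\lVert \beta \rVert_0$ counts the non-zero coefficients. The L0 term drives most coefficients to exactly zero, which performs the feature selection, while the L2 term stabilizes the estimates when DMRs are correlated.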
Fig 5. (A, B) Shaded line plot across DMR regions identified by FastSparse (A) (model 1) and by decision tree from each cluster (B) (model 2). Green lines indicate the means of the DNA methylation level in controls. Blue lines indicate the means in cases. Shaded regions around the lines indicate standard deviations. Means and standard deviations were calculated from the training and test datasets combined for the purpose of visualization. (C-D) ROC curve and precision-recall curve for model 1 (C) and model 2 (D).
As a different approach, we also tried feature selection based on the results of k-means clustering (Fig 3B). In this approach, we performed feature selection using decision tree methods within each cluster to avoid multicollinearity. We used the selected DMRs in an interpretable, non-linear modeling structure to account for potential hierarchical structure, as hierarchical structures such as genetic epistasis were assumed to exist across the genome. We performed decision tree analyses to choose representative predictors from each of the four clusters of DMRs, reflecting different aspects of the methylation status of the gastrointestinal lymphoma group. This yielded one or two DMRs from each cluster, resulting in a total of 5 selected DMRs. Altering the random seed did not affect the selected subset of DMRs. These DMRs contain both hypermethylated and hypomethylated regions. Furthermore, using these five DMRs, we performed random forest analysis to model the prediction of gastrointestinal lymphoma (model 2). The evaluation of this model yielded an AU-ROC of 0.929 and an AU-PRC of 0.909 on the test dataset (Fig 5B, D). These results suggest that models consisting of multiple probes also discriminate cases from controls relatively well, although the predictions were not perfect.
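The idea of picking a representative DMR per cluster with a decision tree can be illustrated with a much simplified sketch. It evaluates only the root split of a tree (a one-level "stump") using Gini impurity and keeps the single best DMR per cluster, whereas the study selected one or two DMRs per cluster with full decision trees. All names and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(values: List[float], labels: List[int]) -> Tuple[float, Optional[float]]:
    """Return (weighted Gini, threshold) of the best single split on one DMR."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[float, Optional[float]] = (gini(labels), None)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def pick_per_cluster(meth: Dict[str, List[float]],
                     clusters: Dict[str, int],
                     labels: List[int]) -> Dict[int, str]:
    """For each cluster, keep the DMR whose best split is the purest."""
    chosen: Dict[int, Tuple[str, float]] = {}
    for dmr, values in meth.items():
        score, _ = best_stump(values, labels)
        cluster = clusters[dmr]
        if cluster not in chosen or score < chosen[cluster][1]:
            chosen[cluster] = (dmr, score)
    return {cluster: dmr for cluster, (dmr, _) in chosen.items()}
```

Selecting within clusters means that the chosen DMRs come from groups with different methylation behaviour, which limits redundancy among the predictors passed to the downstream model.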
Discussions
In this study, we investigated whole genome DNA methylation using whole blood from dogs to distinguish gastrointestinal lymphoma cases from non-lymphoma controls. Unexpectedly, we observed differences in DNA methylation trends between cases and controls (Fig 1). Overall patterns showed a tendency for clustering of each group along PC1 and PC2. However, the boundaries between cases and controls were not clearly defined (Fig 1). These results suggest that DNA methylation from whole blood could be used to monitor the health status of dogs. We obtained 1,532 DMRs from the WGBS analysis (Datasheet S2 File), although the whole blood assessed was an indirect tissue that did not include lesions.
However, the causality of DNA methylation alteration in individuals with canine gastrointestinal lymphoma remains unclear. One possible explanation is that DNA methylation reflects the systemic response to lymphoma. Supporting this hypothesis, some calcium binding proteins tend to be dysregulated in human pan-cancer [29], and two of these proteins were adjacent to DMRs in our dataset. Similarly, several genes adjacent to DMRs in our dataset represent canine homologs of human cancer-associated genes: TNFSF13 is known to be related to tumor cell proliferation in human studies [30], while RUNXs and MPZL2 are associated with leukemia in human [31,32]. In line with this, some immune response-related genes were adjacent to DMRs (Datasheet S2 File). This observation is similar to findings of human studies, where blood DNA methylation signatures are mainly derived from epigenetic change in immune and inflammatory cells [8,33,34]. At least, here, we excluded potentially steroid-affected genomic regions to avoid influencing of steroid treatment on the prediction in this study. However, like many other blood markers, DNA methylation profile may not serve as direct indicators of the disease, but rather as indirect indicators that reflect a variety of factors and conditions, such as gastrointestinal inflammations or other types of tumors. Therefore, it is important to evaluate them in conjunction with other clinical information for an applied assessment.
So far, several studies have reported the DNA methylation status of DMRs between affected tissues with pathological features and normal or surrounding tissues in several different types of canine lymphoma [35–39]. This study contrasts with these conventional approaches by aiming non/low-invasive assay using whole blood samples rather using biopsy tissue samples from lesions. DNA methylation pattern reflects cell composition, which could depend on subject’s health status. Indeed, an increase in leukocyte composition is sometimes observed in lymphoma cases [40]. Due to the lack of reference data on DNA methylation patterns of each cell type in dogs, we could not assess cell composition. While we do not exclude the possibility that the DNA methylation difference derived from alterations in cell composition, it is noteworthy that DNA methylation in whole blood has a signature that can discriminate lymphoma cases from controls, including a potential of cell composition inference. Future research could benefit from considering cell compositions to better understanding of the mechanism of DNA methylation changes.
We also faced the challenge of selecting good predictors from the detected DMRs using high-dimensional features with only a small number of samples. Simple logistic regression identified relatively good predictors, although some of the predictors showed poor accuracy on the test dataset. A concern with simple logistic regression models is random variation during evaluation, which may lead some models to yield favorable results on the test dataset purely by chance. This may stem from the small sample size of the dataset, which can also lead to overfitting. An independent validation study is therefore essential for future analyses. To mitigate the risk that DNA methylation at specific predictors changes due to unintended factors, we also designed models composed of multiple predictors (Fig 5). In the smallest case, five predictors showed relatively good scores for gastrointestinal lymphoma prediction. A larger sample size might improve these models in future validation studies and model construction. Thus, we selected predictive genomic regions for affected individuals using machine learning approaches and identified promising regions. These data will be leveraged for future panel design and will enable broader applications, such as assessment of disease subtype, evaluation of tumor stage, or monitoring of general health status.
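The concern about favorable test results arising purely by chance can be illustrated with a small simulation. The sketch below is not the analysis performed in this study: the sample counts (30 training and 10 test samples), the number of candidate features (1000), and the function name chance_accuracies are all hypothetical, and a one-feature midpoint threshold classifier stands in for simple logistic regression. Every feature is pure noise, so any high test accuracy is due to chance alone.

```python
import numpy as np

def chance_accuracies(n_train=30, n_test=10, n_features=1000, seed=0):
    """Test accuracy of one-feature threshold classifiers fitted to pure noise.

    All sizes are hypothetical and are not taken from the study.
    """
    rng = np.random.default_rng(seed)
    # Balanced labels: first half controls (0), second half cases (1).
    y_train = np.repeat([0, 1], n_train // 2)
    y_test = np.repeat([0, 1], n_test // 2)
    # By construction, the features carry no information about the labels.
    x_train = rng.normal(size=(n_train, n_features))
    x_test = rng.normal(size=(n_test, n_features))

    # Fit one classifier per feature: threshold at the midpoint of class means.
    mean0 = x_train[y_train == 0].mean(axis=0)
    mean1 = x_train[y_train == 1].mean(axis=0)
    threshold = (mean0 + mean1) / 2.0
    # Predict "case" on the side of the threshold where training cases lie.
    case_is_high = mean1 > mean0
    pred = np.where(case_is_high, x_test > threshold, x_test < threshold)
    # Per-feature accuracy on the held-out test samples.
    return (pred == y_test[:, None]).mean(axis=0)

acc = chance_accuracies()
mean_acc = float(acc.mean())
best_acc = float(acc.max())
print(f"mean test accuracy over features: {mean_acc:.2f}")
print(f"best single-feature test accuracy: {best_acc:.2f}")
```

Under these assumptions, the average accuracy stays near 0.5 while the best of the many candidate features scores far higher on the small test set, which is why top-ranked single predictors need confirmation in an independent cohort.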
Supporting information
S1 File. Supplementary Figures and Tables.
Fig S1. Principal Variance Component Analysis (PVCA) showing the variance components explained by each potential covariate. The analysis was performed to evaluate possible confounding effects of unmodeled variables.
Fig S2. Total within-cluster sum of squared errors for different numbers of clusters (from 2 to 10 clusters).
Fig S3. Jitter plots of the average methylation level in DMRs identified by decision tree from each cluster.
Table S1. Demographic statistics of the individual dogs in the training and test datasets.
Table S2. Gene ontology (GO) enrichment analysis of the gene set having DMRs in their promoters.
Table S3. Gene ontology (GO) enrichment analysis of the genes adjacent to DMRs within 10 kbp.
Data sheet 1. Identified differentially methylated regions and adjacent genes.
https://doi.org/10.1371/journal.pone.0339388.s001
(PDF)
References
- 1. Christensen BC, Houseman EA, Marsit CJ, Zheng S, Wrensch MR, Wiemels JL, et al. Aging and environmental exposures alter tissue-specific DNA methylation dependent upon CpG island context. PLoS Genet. 2009;5(8):e1000602. pmid:19680444
- 2. Jin Z, Liu Y. DNA methylation in human diseases. Genes Dis. 2018;5(1):1–8. pmid:30258928
- 3. Salameh Y, Bejaoui Y, El Hajj N. DNA Methylation Biomarkers in Aging and Age-Related Diseases. Front Genet. 2020;11:171. pmid:32211026
- 4. Younesian S, Mohammadi MH, Younesian O, Momeny M, Ghaffari SH, Bashash D. DNA methylation in human diseases. Heliyon. 2024;10(11):e32366. pmid:38933971
- 5. Wagner W. How to Translate DNA Methylation Biomarkers Into Clinical Practice. Front Cell Dev Biol. 2022;10:854797. pmid:35281115
- 6. Zheng Y, Joyce BT, Colicino E, Liu L, Zhang W, Dai Q, et al. Blood Epigenetic Age may Predict Cancer Incidence and Mortality. EBioMedicine. 2016;5:68–73. pmid:27077113
- 7. Kresovich JK, Xu Z, O’Brien KM, Weinberg CR, Sandler DP, Taylor JA. Methylation-Based Biological Age and Breast Cancer Risk. J Natl Cancer Inst. 2019;111(10):1051–8. pmid:30794318
- 8. Balnis J, Madrid A, Drake LA, Vancavage R, Tiwari A, Patel VJ, et al. Blood DNA methylation in post-acute sequelae of COVID-19 (PASC): a prospective cohort study. EBioMedicine. 2024;106:105251. pmid:39024897
- 9. Baylin SB, Jones PA. Epigenetic Determinants of Cancer. Cold Spring Harb Perspect Biol. 2016;8(9):a019505. pmid:27194046
- 10. Ibrahim J, Op de Beeck K, Fransen E, Peeters M, Van Camp G. Genome-wide DNA methylation profiling and identification of potential pan-cancer and tumor-specific biomarkers. Mol Oncol. 2022;16(12):2432–47. pmid:34978357
- 11. Vidal E, Sayols S, Moran S, Guillaumet-Adkins A, Schroeder MP, Royo R, et al. A DNA methylation map of human cancer at single base-pair resolution. Oncogene. 2017;36(40):5648–57. pmid:28581523
- 12. Nishiyama A, Nakanishi M. Navigating the DNA methylation landscape of cancer. Trends Genet. 2021;37(11):1012–27. pmid:34120771
- 13. Wan JCM, Massie C, Garcia-Corbacho J, Mouliere F, Brenton JD, Caldas C, et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat Rev Cancer. 2017;17(4):223–38. pmid:28233803
- 14. Nakamura M, Matsumoto Y, Yasuda K, Nagata M, Nakaki R, Okumura M, et al. Unraveling the DNA methylation landscape in dog blood across breeds. BMC Genomics. 2024;25(1):1089. pmid:39548380
- 15. Schultz MD, He Y, Whitaker JW, Hariharan M, Mukamel EA, Leung D, et al. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523(7559):212–6. pmid:26030523
- 16. Bushel P. Pvca. Bioconductor. 2017.
- 17. Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A, et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 2012;13(10):R87. pmid:23034086
- 18. Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, et al. gplots: Various R Programming Tools for Plotting Data. CRAN: Contributed Packages. The R Foundation. 2005.
- 19. Kolde R. pheatmap: Pretty Heatmaps. CRAN: Contributed Packages. The R Foundation. 2010.
- 20. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9. pmid:10802651
- 21. Gene Ontology Consortium, Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, et al. The Gene Ontology knowledgebase in 2023. Genetics. 2023;224(1):iyad031. pmid:36866529
- 22. Liu J, Zhong C, Seltzer M, Rudin C. Fast Sparse Classification for Generalized Linear and Additive Models. Proc Mach Learn Res. 2022;151:9304–33. pmid:35601052
- 23. Ekstrand C, Pettersson H, Gehring R, Hedeland M, Adolfsson S, Lilliehöök I. Prednisolone in Dogs-Plasma Exposure and White Blood Cell Response. Front Vet Sci. 2021;8:666219. pmid:34179161
- 24. Rubbi L, Zhang H, Feng J, He C, Kurnia P, Ratan P, et al. The effects of age, sex, weight, and breed on canid methylomes. Epigenetics. 2022;17(11):1497–512. pmid:35502722
- 25. Fritsch RD, Shen X, Sims GP, Hathcock KS, Hodes RJ, Lipsky PE. Stepwise differentiation of CD4 memory T cells defined by expression of CCR7 and CD27. J Immunol. 2005;175(10):6489–97. pmid:16272303
- 26. Boi M, Zucca E, Inghirami G, Bertoni F. PRDM1/BLIMP1: a tumor suppressor gene in B and T cell lymphomas. Leuk Lymphoma. 2015;56(5):1223–8. pmid:25115512
- 27. van der Meer LT, Jansen JH, van der Reijden BA. Gfi1 and Gfi1b: key regulators of hematopoiesis. Leukemia. 2010;24(11):1834–43. pmid:20861919
- 28. Kubagawa H, Honjo K, Ohkura N, Sakaguchi S, Radbruch A, Melchers F, et al. Functional Roles of the IgM Fc Receptor in the Immune System. Front Immunol. 2019;10:945. pmid:31130948
- 29. Liang X, Huang X, Cai Z, Deng Y, Liu D, Hu J, et al. The S100 family is a prognostic biomarker and correlated with immune cell infiltration in pan-cancer. Discov Oncol. 2024;15(1):137. pmid:38684596
- 30. Nowacka KH, Jabłońska E. Role of the APRIL molecule in solid tumors. Cytokine Growth Factor Rev. 2021;61:38–44. pmid:34446365
- 31. Otálora-Otálora BA, Henríquez B, López-Kleine L, Rojas A. RUNX family: Oncogenes or tumor suppressors (Review). Oncol Rep. 2019;42(1):3–19. pmid:31059069
- 32. Eshibona N, Giwa A, Rossouw SC, Gamieldien J, Christoffels A, Bendou H. Upregulation of FHL1, SPNS3, and MPZL2 predicts poor prognosis in pediatric acute myeloid leukemia patients with FLT3-ITD mutation. Leuk Lymphoma. 2022;63(8):1897–906. pmid:35249471
- 33. Bergstedt J, Azzou SAK, Tsuo K, Jaquaniello A, Urrutia A, Rotival M, et al. The immune factors driving DNA methylation variation in human blood. Nat Commun. 2022;13(1):5895. pmid:36202838
- 34. Li KY, Tam CHT, Liu H, Day S, Lim CKP, So WY, et al. DNA methylation markers for kidney function and progression of diabetic kidney disease. Nat Commun. 2023;14(1):2543. pmid:37188670
- 35. Ohta H, Yamazaki J, Jelinek J, Ishizaki T, Kagawa Y, Yokoyama N, et al. Genome-wide DNA methylation analysis in canine gastrointestinal lymphoma. J Vet Med Sci. 2020;82(5):632–8. pmid:32213750
- 36. Hsu C-H, Tomiyasu H, Lee J-J, Tung C-W, Liao C-H, Chuang C-H, et al. Genome-wide DNA methylation analysis using MethylCap-seq in canine high-grade B-cell lymphoma. J Leukoc Biol. 2021;109(6):1089–103. pmid:33031589
- 37. Ferraresso S, Aricò A, Sanavia T, Da Ros S, Milan M, Cascione L, et al. DNA methylation profiling reveals common signatures of tumorigenesis and defines epigenetic prognostic subtypes of canine Diffuse Large B-cell Lymphoma. Sci Rep. 2017;7(1):11591. pmid:28912427
- 38. Chu S, Avery A, Yoshimoto J, Bryan JN. Genome wide exploration of the methylome in aggressive B-cell lymphoma in Golden Retrievers reveals a conserved hypermethylome. Epigenetics. 2022;17(13):2022–38. pmid:35912844
- 39. Teoh YB, Ishizaki T, Kagawa Y, Yokoyama S, Jelinek J, Matsumoto Y, et al. Use of genome-wide DNA methylation analysis to identify prognostic CpG site markers associated with longer survival time in dogs with multicentric high-grade B-cell lymphoma. J Vet Intern Med. 2024;38(1):316–25. pmid:38115210
- 40. Stefaniuk P, Szymczyk A, Podhorecka M. The Neutrophil to Lymphocyte and Lymphocyte to Monocyte Ratios as New Prognostic Factors in Hematological Malignancies – A Narrative Review. Cancer Manag Res. 2020;12:2961–77.