Abstract
Tabular data is the predominant format for statistical analysis and machine learning across domains such as finance, biomedicine, and environmental sciences. However, conventional methods often face challenges when dealing with high dimensionality and complex nonlinear relationships. In contrast, deep learning models, particularly Convolutional Neural Networks (CNNs), are well-suited for automatic feature extraction and achieve high predictive accuracy, but are primarily designed for image-based inputs. This study presents a comparative evaluation of non-Euclidean distance metrics within the Image Generator for Tabular Data (IGTD) framework, which transforms tabular data into image representations for CNN-based classification. While the original IGTD relies on Euclidean distance, we extend the framework to adopt alternative metrics, including one minus correlation, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance. These metrics are designed to better capture complex, nonlinear relationships among features. Through systematic experiments on both simulated and real-world genomics datasets, we compare the performance of each distance metric in terms of classification accuracy and structural fidelity of the generated images. The results demonstrate that non-Euclidean metrics can significantly improve the effectiveness of CNN-based classification on tabular data. By enabling a more accurate encoding of feature relationships, this approach broadens the applicability of CNNs and offers a flexible, interpretable solution for high-dimensional, structured data across disciplines.
Citation: Lin Y-R, Wu H-M (2026) Image generator for tabular data based on non-Euclidean metrics for CNN-based classification. PLoS One 21(1): e0340005. https://doi.org/10.1371/journal.pone.0340005
Editor: Ruriko Yoshida, Naval Postgraduate School, UNITED STATES OF AMERICA
Received: August 3, 2025; Accepted: December 15, 2025; Published: January 9, 2026
Copyright: © 2026 Lin, Wu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from CRAN and UCI Machine Learning Repository.
Funding: This work was supported by the National Science and Technology Council, Taiwan (NSTC 113-2813-C-004-005-M). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare no competing interests.
Introduction
In recent years, deep learning has achieved remarkable advancements, with Convolutional Neural Networks (CNNs) [1,2] demonstrating exceptional predictive performance across a wide range of tasks, including object detection, image classification, and natural language processing. CNNs are specifically designed to process grid-like data structures, such as images, leveraging efficient feature extraction, the identification of nonlinear correlations and higher-order patterns, and a compact architecture enabled by weight sharing. Despite their success in image-based tasks, CNNs face significant challenges when applied to tabular data, which consists of rows representing observations and columns corresponding to variables (features). Unlike images, tabular data lacks an inherent spatial structure, making it difficult for CNNs to leverage their spatial feature extraction capabilities effectively. Moreover, CNNs often suffer from interpretability issues, limiting their application in fields where explainability is crucial, such as healthcare, finance, and scientific research.
Traditional statistical methods and machine learning techniques, such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Support Vector Machines (SVM), have long served as foundational approaches for analyzing and predicting tabular data. These methods typically involve fitting statistical models based on assumptions about the underlying structure and relationships between variables. While effective in certain scenarios, they often struggle to capture complex nonlinear interactions and high-dimensional feature dependencies, leading to suboptimal feature extraction and reduced predictive accuracy. In contrast, deep learning models, particularly CNNs, have demonstrated superior performance in learning intricate feature representations but are inherently designed for spatial data rather than structured tabular datasets.
To bridge this gap and leverage the power of CNNs for tabular data analysis, recent research has explored tabular-to-image conversion techniques, which transform high-dimensional tabular data into image-like representations (referred to as pseudo-images in this study). This transformation enables CNNs to process tabular datasets more effectively by encoding meaningful relationships between features into spatial representations. As a result, these techniques have demonstrated improved predictive accuracy and enhanced analytical capabilities (e.g., [3]). A brief review of these methods is provided in the next section.
The growing significance of tabular-to-image conversion techniques highlights their potential to integrate deep learning with structured data across various domains. If effectively adapted, CNNs could offer substantial benefits for tabular data classification tasks. Building upon this foundation, this study seeks to enhance the Image Generator for Tabular Data (IGTD) [4] by addressing its key limitations and extending its applicability to CNN-based classification. IGTD relies on Euclidean distance to organize tabular features within a 2D grid. However, this approach does not adequately consider the high-dimensional characteristics, nonlinear structures, and complex feature distributions present in tabular data. To address these limitations, we introduce five alternative non-Euclidean distance metrics (one minus correlation, Geodesic, Jensen-Shannon, Wasserstein, and Tropical distances) to more effectively capture the underlying structure of high-dimensional data. By adopting these metrics, our study aims to enhance both the interpretability and predictive performance of CNN-based classification models for tabular data.
This article proceeds as follows: A review of related work on tabular-to-image conversion methods is first presented. We then describe the IGTD algorithm, followed by our proposed improvements using non-Euclidean distance metrics. The empirical evaluation, covering both simulation studies and real-world datasets, is presented next. Finally, we conclude by summarizing the key findings and discussing their broader implications.
Advancements in tabular-to-image conversion for CNN-based predictions
The transformation of tabular data into image representations suitable for deep learning has gained increasing attention in recent years. This approach enables Convolutional Neural Networks (CNNs) and other image-based deep learning architectures to be directly applied to structured tabular datasets, eliminating the need for modifications to CNN architectures. One of the key advantages of tabular-to-image conversion is its ability to preserve spatial relationships between variables, allowing CNNs to leverage their hierarchical feature extraction capabilities. When tabular data is encoded as pseudo-images, CNNs can effectively extract complex patterns and inter-variable relationships, potentially improving classification accuracy. Furthermore, this transformation facilitates the application of additional deep learning algorithms designed for image data, such as deep neural networks (DNNs), transfer learning models, and autoencoders. However, redesigning CNN architectures specifically for tabular data, as explored by [5–8], is beyond the scope of this study. Instead, this work focuses on leveraging tabular-to-image conversion techniques to enhance CNN-based classification while preserving the structural integrity of the original data.
The early exploration of tabular-to-image conversion for CNNs was pioneered by Lyu and Haque (2018) [9], who mapped high-dimensional RNA-Seq data into two-dimensional images by aligning genes according to their chromosomal positions. Their approach demonstrated high classification accuracy when processed by CNNs. Similarly, Ma and Zhang (2018) [10] introduced OmicsMapNet, which employed the Treemap algorithm [11] to hierarchically structure genomic data into 2D images. However, this method required prior biological knowledge to spatially organize molecular features into meaningful patterns. A major milestone in this domain was the introduction of DeepInsight [12], a generalizable framework for converting non-image data, such as RNA-Seq, text, and artificial datasets, into images. Unlike its predecessors, DeepInsight leveraged CNNs without requiring domain-specific knowledge for feature arrangement. Building on this progress, Bazgir et al. (2020) [13] proposed REFINED-CNN (Representation of Features as Images with Neighborhood Dependencies), which employed Bayesian Metric Multidimensional Scaling (BMDS) to transform tabular features into compact image representations while preserving their spatial relationships.
Subsequent advancements focused on enhancing interpretability. PathCNN [14] integrated multi-omics data and pathway information using PCA, while SurvCNN [15] applied nonlinear dimensionality reduction techniques such as t-SNE and UMAP to reduce image dimensions. Zhu et al. (2021) [4] introduced IGTD (Image Generator for Tabular Data), a technique designed to preserve the neighborhood structure of data features by arranging them using Euclidean distances. IGTD optimizes variable placement by minimizing Euclidean distances before projecting them onto a two-dimensional grid, improving classification accuracy when applied to CNNs. This study highlighted the significant impact of distance metrics on feature arrangement and classification outcomes, motivating further exploration of alternative distance metrics to enhance predictive performance within the IGTD framework. Further refinements in tabular-to-image conversion include Vec2Image [16], which transforms high-dimensional biological omics data into 2D images while incorporating feature abundance and correlation. Wang et al. (2022) [17] proposed a multi-index sorting method for gene expression data, enabling image-based survival prediction in lung cancer patients and extending the applicability of these techniques beyond classification tasks. Tab2Vox [18] introduced a novel approach that converts tabular data into 3D voxel images, allowing 3D CNNs to capture hierarchical feature representations.
Recent contributions have continued to refine these methodologies. TINTO [19] transforms structured tabular data into images using PCA and t-SNE while incorporating fuzzification techniques to enhance feature extraction and generalization. TablEye [20] addresses the challenge of few-shot learning in tabular data by integrating tested few-shot learning algorithms with embedding functions, enabling effective learning with limited labeled data. MWCapsNet [21] employs multi-level wavelet decomposition and capsule networks to extract multi-scale spatial and frequency features, improving classification performance in imbalanced datasets. Binary Image Encoding (BIE) [22] transforms binary network traffic data into images using one-hot encoding, mitigating the challenges posed by the lack of inherent order in categorical features. Matsuda et al. (2024) [23] classified tabular-to-image conversion methods into three categories: prior knowledge-based, feature permutation-based, and dimensionality reduction-based approaches. They also identified key limitations, including a lack of consideration for prediction loss during image generation, potential non-interpretability, and the inclusion of irrelevant features. To address these challenges, they proposed HACNet, a hard attention-based tabular-to-image converter that selectively filters important variables before transformation, significantly improving interpretability and predictive performance compared with existing methods such as DeepInsight, REFINED-CNN, and IGTD.
Other studies have extended tabular-to-image conversion techniques to time series and survival analysis applications (e.g., [15,24–29]). These techniques typically convert time series data into recurrence plots, Gramian angular fields, or Markov transition fields, effectively encoding both local and global temporal relationships in a visual format to leverage CNNs for pattern recognition and forecasting.
Recent efforts have also focused on optimizing spatial representations and feature encoding to further enhance CNN-based classification. Notable examples include MRep-DeepInsight [30], which dynamically maps feature vectors using multiple manifold techniques to improve model robustness; TabMap [31], which encodes feature values as pixel intensities to create spatially semantic topographic maps; and Tensorized Image Generator (TIG) [32], which emphasizes feature relationships using intersecting diagonal lines on a grid. OmicsFootPrint [33] presents an innovative approach to transforming omics data into circular images organized by genomic locations, offering customizable and visually interpretable representations.
Additionally, several methods have focused on feature encoding strategies to enhance classification outcomes. Tab2Visual [34] represents features using vertical bars with proportional widths and colors, incorporating novel augmentation techniques such as elastic distortion and morphological operations. Table2Image [35] employs an autoencoder architecture to generate realistic and interpretable images while preserving the original characteristics of tabular data. Similarly, AutoIRAD [36] integrates dimensionality reduction and dataset-specific image representation to facilitate robust classification across datasets with varying dimensions. Other studies have explored advanced preprocessing techniques and distance metrics to further optimize classification performance. For instance, LM-IGTD [37] enhances low-dimensional tabular data representation by introducing stochastic noise and dynamically adjusting feature dimensions. Fuzzy Convolutional Neural Network (FCNN) [38] applies fuzzification techniques to map feature values into fuzzy memberships on an image canvas, improving CNN-based classification in uncertain or imprecise data environments. Finally, we note that Jiang et al. (2025) [39] have recently conducted a comprehensive survey and empirical benchmarking of representation learning for tabular data, offering valuable insights into the relative strengths of existing tabular-to-image conversion strategies.
Despite the significant advancements in tabular-to-image conversion techniques for CNN-based classification, several challenges remain. Most existing methods rely on Euclidean distance-based feature arrangements, which may not effectively capture the complex, nonlinear structures present in high-dimensional tabular datasets. Exploring alternative non-Euclidean distance metrics, such as one minus correlation, Geodesic, Jensen-Shannon, Wasserstein distances, or Tropical distance, may provide more accurate and meaningful spatial representations, ultimately improving both model interpretability and classification performance.
The image generator for tabular data (IGTD)
Zhu et al. (2021) [4] introduced IGTD (Image Generator for Tabular Data), a novel approach for converting tabular data into image representations, enabling deep learning with Convolutional Neural Networks (CNNs) without requiring domain-specific knowledge. IGTD constructs a two-dimensional grid, where each cell corresponds to a specific data point in the table. The grid is then populated with color intensities proportional to the values of the corresponding data points. To optimize the grid layout, IGTD assigns features to pixels in a manner that minimizes the discrepancy between the ranking of distances among features in the original tabular space and the ranking of distances among their assigned pixel positions in the generated image. By ensuring that similar features are positioned close to each other, IGTD enhances the ability of CNNs to identify patterns and correlations within the data.
Given a tabular dataset $X \in \mathbb{R}^{n \times p}$, each row $\mathbf{x}_i$ represents a sample, and each column $X_j$ corresponds to a feature variable. The objective of tabular-to-image conversion is to transform each row $\mathbf{x}_i$ into an image of size $N_r \times N_c$, where $N_r$ and $N_c$ denote the number of rows and columns in the image, respectively, satisfying the constraint $N_r \times N_c = p$. We outline the step-by-step procedure of IGTD in the following; a minimal computational sketch is provided after the list.
- Computing the Rank Matrix of Feature Distances, R:
To construct the rank matrix, pairwise distances between features are computed using a selected distance measure, such as the Euclidean distance. These pairwise distances are then ranked in ascending order, ensuring that smaller distances are assigned smaller ranks while larger distances receive larger ranks. The resulting $p \times p$ rank matrix, denoted as $R$, is formed, where each element $r_{ij}$ at the $i$th row and $j$th column represents the rank of the distance between the $i$th and $j$th features. To maintain consistency, the diagonal elements of $R$ are set to zero.
- Computing the Rank Matrix of Pixel Distances, Q:
For an image of size $N_r \times N_c$, the pairwise distances between pixels are computed based on their pixel coordinates, using a chosen distance measure such as the Euclidean distance. These pairwise pixel distances are then ranked in ascending order, ensuring that smaller distances receive smaller ranks while larger distances are assigned larger ranks. The resulting $p \times p$ rank matrix of pixel distances, denoted as $Q$, is constructed, where each element $q_{ij}$ represents the rank of the distance between pixel $i$ and pixel $j$. To maintain consistency, the main diagonal of $Q$ is set to zero, and $Q$ is ensured to be symmetric. The ordering of pixels in $Q$ follows a row-wise concatenation, where pixels in the image are sequentially arranged row by row.
- Optimizing Feature Arrangement by Minimizing the Difference Between R and Q:
To achieve an optimal feature-to-pixel mapping, the rows and columns of $R$ are permuted synchronously to minimize the squared difference between $R$ and $Q$. The optimization process seeks to minimize the following error function:
$$\mathrm{err}(R, Q) = \sum_{i=1}^{p} \sum_{j=1}^{p} \left( r_{ij} - q_{ij} \right)^2,$$
which represents the total squared error between the two rank matrices. By iteratively searching for suitable feature swaps and reordering the rows and columns of $R$ accordingly, the difference between the two matrices is minimized. The optimized feature distance rank matrix after this process is denoted as $R^{*}$.
- Generating Pseudo-images for Each Sample:
After obtaining the optimized feature distance rank matrix, each $i$th feature (corresponding to the $i$th row and column in $R^{*}$) is assigned to the $i$th pixel (the $i$th row and column in $Q$). Using these assigned feature locations within the $N_r \times N_c$ pixel matrix, $n$ pseudo-images are generated, where each image corresponds to a sample in the dataset. This process ensures that the spatial arrangement of features in the pseudo-images preserves meaningful relationships, allowing CNNs to effectively learn patterns from the transformed data.
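For concreteness, the following minimal Python sketch implements these steps with a plain greedy swap search. The published IGTD implementation employs a more elaborate iterative scheme, so the function names and iteration budget here are illustrative rather than a reference implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def rank_matrix(condensed_dists, p):
    """Symmetric p x p matrix of ascending distance ranks, zero diagonal."""
    R = np.zeros((p, p))
    R[np.triu_indices(p, k=1)] = rankdata(condensed_dists)
    return R + R.T

def igtd_error(R, Q):
    """Total squared error between the two rank matrices."""
    return np.sum((R - Q) ** 2)

def igtd_optimize(R, Q, n_iter=20000, seed=0):
    """Greedy random pairwise-swap search: permute rows/columns of R
    synchronously to bring it closer to Q."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    order = np.arange(p)                    # feature-to-pixel assignment
    err = igtd_error(R, Q)
    for _ in range(n_iter):
        i, j = rng.choice(p, size=2, replace=False)
        trial = order.copy()
        trial[[i, j]] = trial[[j, i]]
        trial_err = igtd_error(R[np.ix_(trial, trial)], Q)
        if trial_err < err:                 # keep swaps that reduce the error
            order, err = trial, trial_err
    return order

# Example: 200 samples, 100 features mapped onto a 10 x 10 pixel grid
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 100))
Nr, Nc = 10, 10
R = rank_matrix(pdist(X.T, metric="euclidean"), X.shape[1])  # feature distance ranks
coords = np.array([(r, c) for r in range(Nr) for c in range(Nc)])  # row-wise pixel order
Q = rank_matrix(pdist(coords, metric="euclidean"), X.shape[1])     # pixel distance ranks
order = igtd_optimize(R, Q)
images = X[:, order].reshape(-1, Nr, Nc)    # one pseudo-image per sample
```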
IGTD has demonstrated substantial versatility across diverse application domains. Nazarri, Yusof, and Almohammedi (2023) [40] applied IGTD to cybersecurity by analyzing real-world network traffic datasets. By transforming these datasets into images and utilizing CNN models for classification, they achieved approximately 80% accuracy in intrusion detection, highlighting IGTD’s potential in identifying cyber threats. Similarly, Hosseini and Chitsaz (2023) [41] leveraged IGTD in the automotive industry to assess the knock probability of turbocharged gasoline engines under real driving conditions. Converting the collected datasets into images and processing them with CNNs resulted in a knock detection accuracy of 89.3%, demonstrating IGTD’s effectiveness in engine diagnostics and performance monitoring.
IGTD based on non-Euclidean metric
IGTD employs the Euclidean metric to measure dissimilarities between features. Euclidean distance is sensitive to absolute differences and assumes an isotropic space. It is most appropriate when features are independent and have equal variance, and it is widely used in clustering tasks and in image retrieval systems where features are raw pixel intensities. Another potential limitation of Euclidean distance is its inherent assumption that features are continuous and linearly related, which may not always hold, particularly in cases involving nonlinear, high-dimensional, or unevenly distributed features. To address these limitations, this study explores five alternative distance metrics: one minus correlation (RD), Geodesic distance (GD), Jensen-Shannon distance (JD), Wasserstein distance (WD), and Tropical distance (TD), which may offer more suitable representations under specific conditions. In the following, we provide a detailed discussion of these five distance metrics and their applicability to tabular-to-image conversion in IGTD.
It is important to note that the proposed non-Euclidean metrics are exclusively applied to the computation of the feature distance rank matrix $R$, which captures the dissimilarity between feature vectors in the original tabular space. The pixel distance rank matrix $Q$ continues to be calculated using the Euclidean distance, as the target representation is a fixed two-dimensional image grid where spatial proximity is inherently Euclidean. Consequently, the optimization process seeks to map the intrinsic non-Euclidean relationships captured in $R$ onto the Euclidean layout of $Q$.
One minus correlation coefficient as a distance (RD)
The one minus correlation coefficient is a widely used distance metric in bioinformatics, particularly for measuring gene expression similarity [42]. Derived from the Pearson correlation coefficient, which quantifies the linear relationship between two variables, say $X$ and $Y$, it transforms correlation into a dissimilarity measure as $d_{RD}(X, Y) = 1 - \rho_{XY}$, where $\rho_{XY}$ is the Pearson correlation coefficient, ensuring larger values indicate greater feature dissimilarity.
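As a minimal sketch (assuming features are stored as the columns of a NumPy array), the corresponding dissimilarity matrix can be computed as follows; the function name is ours.

```python
import numpy as np

def correlation_distance(X):
    """One minus Pearson correlation between the columns (features) of X."""
    D = 1.0 - np.corrcoef(X.T)   # np.corrcoef treats rows as variables
    np.fill_diagonal(D, 0.0)     # zero self-distance, matching the rank-matrix convention
    return D
```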
A key advantage of one minus correlation coefficient is its scale invariance, making it ideal for biological datasets where gene expression levels vary. It remains robust to differences in magnitude, making it useful for clustering and classification by effectively capturing linear dependencies [43]. Additionally, it is computationally efficient, allowing fast calculations in large datasets. Beyond bioinformatics, one minus correlation coefficient is applied in time-series analysis, machine learning, and clustering, playing a critical role in pattern recognition and feature analysis.
Despite its benefits, one minus correlation coefficient has limitations. It only captures linear relationships, making it ineffective for nonlinear dependencies. It is sensitive to outliers, which can distort similarity measures and introduce bias. Moreover, it ignores absolute differences in magnitude, which is problematic in fields like medical diagnostics and financial risk analysis, where similar trends but vastly different values must be distinguished. Additionally, one minus correlation coefficient is not a true distance metric, as it fails the triangle inequality, limiting its use in certain distance-based algorithms and clustering methods.
Geodesic distance (GD)
Euclidean distance often misrepresents true proximity when data lies on a curved or nonlinear manifold, as it assumes a flat, linear space. In contrast, Geodesic distance accounts for the manifold's intrinsic geometry by measuring the shortest path along its surface between two points. This makes it especially effective in capturing meaningful relationships within data embedded in complex, curved spaces. By respecting the underlying structure of the space, Geodesic distance offers a more accurate and informative measure of similarity than traditional linear metrics. However, a key limitation of Geodesic distance is that the underlying manifold is generally unknown, making exact computation infeasible. To overcome this issue, approximation techniques such as the ISOMAP algorithm [44] are employed.
The ISOMAP algorithm combines the computational efficiency of Multidimensional Scaling (MDS) and Principal Component Analysis (PCA) with global optimization and asymptotic convergence to estimate Geodesic distances. It constructs a neighborhood graph by connecting nearby points in high-dimensional space, effectively capturing the local structure of the nonlinear space. The estimated Geodesic distances between feature variables in a tabular dataset X are computed as follows:
- Compute the Euclidean distance $d_X(i, j)$ between every pair of feature variables $X_i$ and $X_j$, where $1 \le i, j \le p$.
- Construct a graph $G$ using the feature variables as nodes. Connect neighboring features based on their distances: if $X_i$ and $X_j$ are neighbors, set $d_G(i, j) = d_X(i, j)$; otherwise, assign $d_G(i, j) = \infty$ to indicate no direct connection between the two features.
- Update $d_G(i, j)$ iteratively using the equation $d_G(i, j) = \min\{\, d_G(i, j),\; d_G(i, k) + d_G(k, j) \,\}$ for each intermediate node $k = 1, \dots, p$. The resulting matrix $D_G = [d_G(i, j)]$ represents the minimum path distances and provides an estimate of the Geodesic distance among feature variables.
In this study, the neighborhood graph was constructed using a k-nearest neighbor method. To mitigate topological instability, often referred to as the “pinch problem” where the graph becomes disconnected, we adopted a single-linkage strategy where the distance between disconnected components was approximated by the minimum Euclidean distance between their closest points. By leveraging the ISOMAP algorithm, Geodesic distances can be effectively estimated, allowing for a more accurate representation of relationships within nonlinear feature spaces. This method enhances the ability of CNNs to capture complex feature structures when transforming tabular data into image representations.
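A sketch of this estimation procedure, using scikit-learn's k-nearest-neighbor graph and SciPy's shortest-path routine, is shown below. For simplicity, pairs left disconnected fall back to the direct Euclidean distance, a cruder stand-in for the single-linkage strategy described above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def geodesic_distance(X, k=10):
    """ISOMAP-style geodesic estimate between features (columns of X):
    build a k-NN graph on Euclidean distances, then run all-pairs
    shortest paths (Dijkstra)."""
    F = X.T                                        # features as points
    G = kneighbors_graph(F, n_neighbors=k, mode="distance")
    D = shortest_path(G, method="D", directed=False)
    # Simple fallback for disconnected components (the "pinch problem"):
    # unreachable pairs revert to the direct Euclidean distance
    E = squareform(pdist(F))
    mask = np.isinf(D)
    D[mask] = E[mask]
    return D
```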
Jensen-Shannon distance (JD)
The Jensen-Shannon (JS) distance [45] is a statistical measure used to quantify the similarity between two probability distributions. It is the square root of the Jensen-Shannon divergence, which, unlike the Kullback-Leibler (KL) divergence, is symmetric; taking the square root additionally yields a measure that satisfies the triangle inequality, qualifying the JS distance as a true metric. The JS distance between two distributions $X$ and $Y$ is defined as:
$$JSD(X, Y) = \sqrt{\tfrac{1}{2} KL(X \,\|\, M) + \tfrac{1}{2} KL(Y \,\|\, M)},$$
where $M = \tfrac{1}{2}(X + Y)$ and $KL$ denotes the Kullback-Leibler divergence.
This distance ranges from 0 to 1, where 0 indicates identical distributions and 1 indicates completely different distributions. JS distance is sensitive to both the shape and magnitude of the input distributions. Unlike Euclidean distance, which assumes feature homogeneity and fails to account for probabilistic structure, JS distance excels at comparing normalized distributions, such as frequency profiles or categorical probabilities. For high-dimensional feature spaces where variables may not follow a common distribution, JS distance offers a robust alternative to traditional metrics. It effectively captures the local structure and relationships among features by comparing the underlying probability distributions, making it especially useful when the data exhibits non-uniform or categorical characteristics.
In practice, JS distance has been widely used in fields like natural language processing, particularly for topic modeling and document classification [46], and in single-cell RNA sequencing, where it aids in comparing cell-type-specific gene expression distributions [47]. Its robustness, boundedness, and interpretability make it a valuable tool for analyzing structured, probability-based data.
To apply the Jensen-Shannon distance to continuous tabular data, such as gene expression profiles, we treat each feature vector as a discrete Probability Mass Function (PMF) defined over the sample space. Specifically, let $X_j = (x_{1j}, \dots, x_{nj})$ represent the vector of values for the $j$-th feature across $n$ samples. We normalize this vector to sum to unity:
$$P_{ij} = \frac{x_{ij}}{\sum_{i'=1}^{n} x_{i'j}},$$
where $P_{ij}$ represents the probability mass associated with sample $i$ for feature $j$. If the data had contained negative values, such as z-scores, a Softmax normalization would be applied instead. Unlike Kernel Density Estimation (KDE), which estimates the distribution of values, this PMF approach preserves the sample-wise structure, allowing the JS distance to quantify the dissimilarity between feature profiles.
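The following sketch illustrates this normalization and the resulting pairwise JS distance matrix using SciPy; it assumes a nonnegative data matrix, as in the expression profiles considered here.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance_matrix(X):
    """Pairwise JS distances between feature columns of a nonnegative
    data matrix X, after normalizing each column to a PMF over samples."""
    P = X / X.sum(axis=0, keepdims=True)   # column-wise normalization to unit mass
    p = P.shape[1]
    D = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            # base=2 bounds the distance in [0, 1]
            D[i, j] = D[j, i] = jensenshannon(P[:, i], P[:, j], base=2)
    return D
```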
Wasserstein distance (WD)
The Wasserstein distance [48], also referred to as the Earth Mover's Distance (EMD) or Mallows distance, quantifies the minimal cost required to transform one probability distribution into another by optimally transporting probability mass. Unlike traditional pointwise metrics, it considers both the amount of mass moved and the distance it is moved, making it sensitive to the geometry of distributions. To compute the Wasserstein distance between two continuous distributions $F_X$ and $F_Y$, the following properties are used:
- For quantile functions $F_X^{-1}$ and $F_Y^{-1}$, the Wasserstein distance is computed as:
$$W_2(F_X, F_Y) = \left( \int_0^1 \left| F_X^{-1}(t) - F_Y^{-1}(t) \right|^2 dt \right)^{1/2}.$$
- Closed-form formulae are only available when $X$ and $Y$ follow a Gaussian distribution. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, then the squared Wasserstein distance is represented as:
$$W_2^2(X, Y) = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y,$$
where $\rho$ is the Pearson correlation coefficient.
This distance is particularly advantageous for comparing complex, non-uniform distributions in high-dimensional spaces. It captures both shape and location differences between distributions, offering a more nuanced measure than divergence-based metrics like KL or JS divergence. Furthermore, Wasserstein distance is inherently robust to sample variability and outliers, making it suitable for analyzing empirical distributions such as histograms or quantiles.
Due to these properties, Wasserstein distance has become a foundational tool in various machine learning applications. It is central to generative modeling, most notably in Wasserstein GANs [49], where it addresses instability issues in adversarial training. In spatial transcriptomics, it facilitates the comparison of gene expression distributions across spatial regions [50]. Additionally, it has proven effective in tasks like histogram-based image classification, where it accurately reflects the underlying structure of the data.
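For empirical feature columns, pairwise distances can be computed with SciPy as sketched below. Note that scipy.stats.wasserstein_distance returns the first-order (W1) distance between empirical distributions, whereas the Gaussian closed form above concerns the second-order distance.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_matrix(X):
    """Pairwise Earth Mover's distances between the empirical
    distributions of the feature columns of X."""
    p = X.shape[1]
    D = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            D[i, j] = D[j, i] = wasserstein_distance(X[:, i], X[:, j])
    return D
```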
Tropical distance (TD)
Recent research has highlighted the growing importance of non-Euclidean distance measures in machine learning and data analysis. Among these, the tropical distance, also known as the generalized Hilbert projective metric, is capable of capturing more complex data geometries and hierarchical relationships. The tropical distance originates from the field of tropical geometry [51], in which standard arithmetic operations are replaced by the “min-plus” or “max-plus” algebra. This framework provides a piecewise-linear and combinatorial alternative to classical Euclidean geometry, enabling the modeling of high-dimensional and structured data.
In this study, we define the tropical distance between two feature variables, $X_i$ and $X_j$, as
$$d_T(X_i, X_j) = \max_{1 \le k \le n} (x_{ki} - x_{kj}) - \min_{1 \le k \le n} (x_{ki} - x_{kj}).$$
The resulting matrix $D_T = [d_T(X_i, X_j)]$ represents the tropical distance among feature variables. The tropical distance measures the range of coordinate-wise differences and reflects the relative structural variations between vectors rather than their absolute magnitudes. Owing to its scale-invariant and polyhedral nature, tropical geometry provides a robust foundation for representing complex relationships among multivariate data.
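This definition translates directly into code; the following sketch (with illustrative helper names) computes the full tropical distance matrix between feature columns.

```python
import numpy as np

def tropical_distance(u, v):
    """Tropical (generalized Hilbert projective) distance: the range of the
    coordinate-wise differences, invariant to additive shifts."""
    diff = np.asarray(u) - np.asarray(v)
    return diff.max() - diff.min()

def tropical_matrix(X):
    """Pairwise tropical distances between the feature columns of X."""
    p = X.shape[1]
    D = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            D[i, j] = D[j, i] = tropical_distance(X[:, i], X[:, j])
    return D
```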
This approach has demonstrated significant potential in advancing neural network capabilities. For instance, Yoshida, Aliatimis, and Miura (2024) [52] proposed tropical neural networks, which have been successfully applied to the classification of complex and structured data such as phylogenetic trees, illustrating their effectiveness in capturing hierarchical patterns. Furthermore, Pasque et al. (2024) [53] introduced tropical decision boundaries that have been shown to provide notable robustness against adversarial attacks, revealing an inherent stability often lacking in Euclidean-based models. These advances collectively highlight the promise of tropical and other non-Euclidean metrics in enhancing data representation, model robustness, and classification performance within modern computational frameworks. Motivated by these findings, we integrate the tropical distance into our comparative analysis of non-Euclidean metrics within the IGTD framework for converting tabular data into images, aiming to evaluate its potential for improving both the performance and resilience of subsequent CNN-based classification models.
Simulation studies
To evaluate the ability of various distance metrics to capture complex relationships among features in high-dimensional settings, we conducted a series of simulation studies. Specifically, we generated six independent datasets, each intentionally constructed to align with one of the following distance metrics: Euclidean distance, one minus correlation, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance. Each dataset comprises $n = 200$ observations and $p = 2500$ predictors. Let the data be denoted as $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n}$, where $y_i \in \{0, 1\}$ is a binary response variable for the $i$-th observation, and $x_{ij}$ represents the value of the $j$-th predictor for the $i$-th observation.
In every dataset, the first 100 predictors are structured according to the geometric or distributional properties best captured by the corresponding distance metric. In all scenarios, the remaining 2400 predictors $x_{i,101}, \dots, x_{i,2500}$ are added as standard normal noise to replicate the challenges of high-dimensional settings commonly found in real-world applications. The simulation design ensures that the primary signal is confined to the first 100 features and is distinctly structured to favor a particular distance metric. This simulation framework allows for a systematic evaluation of the ability of each distance metric to capture meaningful relationships in data with diverse geometrical and distributional characteristics. The results offer insights into the applicability and limitations of distance-based modeling approaches in high-dimensional structured data environments. The specific construction of each dataset is described in detail as follows.
Euclidean distance-based model
In this setting, the first 100 predictors are simulated from a multivariate normal distribution with independent and identically distributed components, capturing a linear structure well-suited to Euclidean geometry. The binary response variable is generated using a logistic regression model, where the linear predictor is formed from these signal features. Euclidean distance is particularly effective when data resides in a flat, linear space with uncorrelated and equally scaled features. Under this model, the predictors follow a multivariate normal distribution with identity covariance, naturally inducing a Euclidean geometry.
Let the signal predictors be denoted by $\mathbf{x}_i^{(s)} = (x_{i1}, \dots, x_{i,100})$, where each component is independently drawn from a standard normal distribution. The model is specified as follows:
$$P(Y_i = 1) = \frac{1}{1 + \exp\left( -\boldsymbol{\beta}^\top \mathbf{x}_i^{(s)} \right)}.$$
The binary response $Y_i$ is generated from a Bernoulli distribution with probability $P(Y_i = 1)$. This formulation assumes that the similarity between observations is best captured by their Euclidean proximity in the high-dimensional predictor space. We purposely maintained a high noise-to-signal ratio (2400 noise features versus 100 signal features) in this setting. This parameter choice was intended to serve as a stress test for the curse of dimensionality, evaluating whether the distance metrics could recover a linear signal buried in substantial noise without prior feature selection.
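A sketch of this generative process is given below; the seed and coefficient values are illustrative choices, as the text does not fix them.

```python
import numpy as np

rng = np.random.default_rng(42)            # seed assumed; not specified in the paper
n, p, k = 200, 2500, 100                   # samples, total features, signal features

X_signal = rng.standard_normal((n, k))     # i.i.d. N(0, 1) signal block
beta = rng.standard_normal(k)              # illustrative coefficients (unspecified above)
eta = X_signal @ beta                      # linear predictor
prob = 1.0 / (1.0 + np.exp(-eta))          # logistic link
y = rng.binomial(1, prob)                  # Bernoulli response

X_noise = rng.standard_normal((n, p - k))  # 2400 pure-noise features
X = np.hstack([X_signal, X_noise])
```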
One minus correlation-based model
To construct a dataset where the similarity among observations is best captured by one minus correlation distance, the first 100 predictors are generated from a multivariate normal distribution with a strong Toeplitz correlation structure: $\Sigma_{jk} = \rho^{|j-k|}$, where $0 < \rho < 1$ is set close to one. This structure ensures high linear dependencies among features, which enhances the sensitivity of correlation-based metrics. Unlike random coefficients, we design the coefficient vector $\boldsymbol{\beta} = (\beta_1, \dots, \beta_{100})$ to follow a smooth monotonic trend, increasing gradually across dimensions, which helps align the discriminative signal with the correlation structure.
Let $\mathbf{x}_i^{(s)}$ represent the signal features drawn from:
$$\mathbf{x}_i^{(s)} \sim N(\mathbf{0}, \Sigma), \quad \Sigma_{jk} = \rho^{|j-k|}.$$
We define a structured coefficient vector $\boldsymbol{\beta}$ whose entries increase smoothly with the feature index, which distributes signal smoothly across features in a correlated fashion. The linear predictor is scaled to amplify the signal:
$$\eta_i = c \cdot \boldsymbol{\beta}^\top \mathbf{x}_i^{(s)}, \quad c > 1.$$
Finally, the binary response is generated from a Bernoulli distribution using the logistic probabilities. The remaining 2400 predictors are added as standard Gaussian noise, resulting in a high-dimensional dataset with 2500 total features. This formulation ensures that class-discriminative information is embedded in the linear correlation patterns, favoring the one minus correlation distance (IGTD(RD)) as the most appropriate similarity measure.
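The following sketch reproduces this construction; the correlation parameter, coefficient vector, and scaling constant are illustrative stand-ins for the unreported values.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n, p, k = 200, 2500, 100
rho = 0.9                                   # illustrative; the paper's value is not shown

Sigma = toeplitz(rho ** np.arange(k))       # Toeplitz correlation, Sigma_jk = rho^|j-k|
X_signal = rng.multivariate_normal(np.zeros(k), Sigma, size=n)

beta = np.linspace(0.1, 1.0, k)             # smooth, monotonically increasing coefficients
eta = 3.0 * (X_signal @ beta)               # scaled to amplify the signal (factor assumed)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

X = np.hstack([X_signal, rng.standard_normal((n, p - k))])
```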
Geodesic distance-based model
The signal features are sampled from a nonlinear manifold, such as a Swiss roll, embedded in high-dimensional space. The binary response $Y_i$ is determined based on the Geodesic distance along the manifold, reflecting intrinsic, non-Euclidean feature relationships. Let $\mathbf{x}_i^{(s)}$ lie on a low-dimensional nonlinear manifold $\mathcal{M}$, e.g., a Swiss roll parameterized as
$$\left( t_i \cos t_i,\; h_i,\; t_i \sin t_i \right),$$
where $t_i$ and $h_i$ are sampled uniformly, embedded in the 100-dimensional signal space. Let $f$ be a nonlinear function defined over the Geodesic distance from a reference point $\mathbf{x}_0$; the binary response $Y_i$ is obtained by
$$Y_i = \mathbb{1}\left\{ f\left( d_{\mathcal{M}}(\mathbf{x}_i^{(s)}, \mathbf{x}_0) \right) > m \right\},$$
where $m$ is the median of $f\left( d_{\mathcal{M}}(\mathbf{x}_i^{(s)}, \mathbf{x}_0) \right)$ across observations.
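A sketch of this construction, using scikit-learn's Swiss roll generator and taking $f$ as the identity for simplicity, is shown below; the random linear embedding into the 100-dimensional signal block is one simple choice (the paper's embedding is not shown), and the code assumes the k-NN graph is connected.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(1)
n, p, k = 200, 2500, 100

# Swiss roll coordinates; t parameterizes the position along the roll
coords, t = make_swiss_roll(n_samples=n, random_state=1)

# Embed the 3-D manifold coordinates into the 100-dim signal block
A = rng.standard_normal((3, k))
X_signal = coords @ A

# Geodesic distance from a reference point, estimated along the sample manifold
G = kneighbors_graph(coords, n_neighbors=10, mode="distance")
D = shortest_path(G, directed=False)
g = D[0]                                   # distances from sample 0 as the reference
y = (g > np.median(g)).astype(int)         # threshold at the median

X = np.hstack([X_signal, rng.standard_normal((n, p - k))])
```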
Jensen-Shannon distance-based model
In this model, the first 100 predictors represent compositional or probability-valued features, drawn from class-specific Dirichlet distributions. This setup is designed to favor the Jensen-Shannon (JS) distance, which effectively captures differences between discrete probability distributions. For each observation $i$, the binary response variable determines the distribution from which the signal features are generated. Let $\mathbf{p}_i = (p_{i1}, \dots, p_{i,100})$ denote a probability vector lying on the 99-simplex. Then, the class-conditional generation is given by:
$$\mathbf{p}_i \sim \begin{cases} \text{Dirichlet}(\boldsymbol{\alpha}_0), & \text{if } y_i = 0, \\ \text{Dirichlet}(\boldsymbol{\alpha}_1), & \text{if } y_i = 1. \end{cases}$$
Here, $\boldsymbol{\alpha}_0$ and $\boldsymbol{\alpha}_1$ are chosen to generate distinct yet plausible class-specific distributions: $\boldsymbol{\alpha}_0$ produces a uniform expected distribution, while $\boldsymbol{\alpha}_1$ concentrates the probability mass more unevenly, resulting in lower entropy. This construction induces differences in the distributional structure of the features, such that the JS distance between observations from different classes is maximized in expectation. The remaining 2400 predictors are standard Gaussian noise, completing the high-dimensional structure. This simulation design ensures that class separation is primarily encoded in distributional divergence, favoring the Jensen-Shannon distance as an appropriate similarity measure.
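A sketch of this generative scheme follows; the specific Dirichlet parameter values are illustrative (a vector of ones for class 0, and values below one for class 1 to lower the entropy), as the paper's exact choices are not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 200, 2500, 100

y = rng.binomial(1, 0.5, size=n)
alpha0 = np.ones(k)              # expected distribution is uniform over the 100 parts
alpha1 = np.full(k, 0.5)         # alpha < 1 concentrates mass on few parts (lower entropy)

X_signal = np.empty((n, k))
for i in range(n):
    X_signal[i] = rng.dirichlet(alpha1 if y[i] == 1 else alpha0)

X = np.hstack([X_signal, rng.standard_normal((n, p - k))])
```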
Wasserstein distance-based model
To construct a setting where the Wasserstein distance effectively captures class differences, the signal features are simulated as empirical quantile vectors derived from class-specific distributions. Specifically, for each observation $i$, the signal vector is generated by sorting a sample of 100 values drawn from an underlying distribution depending on the class label $y_i$: a baseline distribution for class 0 and a location-shifted version of the same distribution for class 1. Let $\mathbf{q}_0$ be a fixed template distribution, defined as the average quantile vector computed from all class 0 observations. The Wasserstein distance is computed as:
$$W_i = \frac{1}{100} \sum_{k=1}^{100} \left| x_{i(k)} - q_{0(k)} \right|,$$
where $x_{i(k)}$ denotes the $k$-th sorted value of observation $i$. The binary class label is then determined by:
$$Y_i = \mathbb{1}\{ W_i > m \},$$
where $m$ is the empirical median of the Wasserstein distances across all observations. This construction ensures that class separation is encoded via both location and shape differences in the empirical distributions, making Wasserstein distance the most suitable similarity measure for classification in this setting.
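The construction can be sketched as follows; the shift magnitude is an illustrative value, and the final labels are obtained by the median threshold described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 200, 2500, 100
delta = 1.0                                    # class-1 location shift (illustrative)

y = rng.binomial(1, 0.5, size=n)
# Each signal vector is the sorted (empirical quantile) version of k draws
X_signal = np.sort(rng.normal(loc=delta * y[:, None], scale=1.0,
                              size=(n, k)), axis=1)

template = X_signal[y == 0].mean(axis=0)       # average class-0 quantile vector
# For equal-length quantile vectors, the 1-D Wasserstein distance reduces to
# the mean absolute difference between order statistics
W = np.abs(X_signal - template).mean(axis=1)
y_final = (W > np.median(W)).astype(int)       # relabel by thresholding at the median

X = np.hstack([X_signal, rng.standard_normal((n, p - k))])
```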
Although the Wasserstein distance is theoretically capable of capturing differences in both distribution shape (e.g., variance, skewness) and location, this simulation focuses exclusively on a location shift between class-conditional distributions. This design choice was made to isolate the location effect and provide a controlled evaluation of the metric’s sensitivity to mean differences, ensuring that the source of the discriminative signal is unambiguous.
Tropical distance-based model
To construct a simulation setting in which the tropical distance is uniquely aligned with the intrinsic discriminative structure, we design a feature space whose geometry is governed by coordinate-wise dominance patterns rather than Euclidean magnitude, linear correlation, or smooth manifold structure. The tropical distance between two vectors $u, v \in \mathbb{R}^n$ is defined as
$$d_T(u, v) = \max_{k} (u_k - v_k) - \min_{k} (u_k - v_k),$$
which measures the range of coordinate-wise differences and is invariant to additive shifts. This metric is therefore particularly sensitive to max-min dominance relations.
In this model, the first 100 predictors encode the tropical signal, while the remaining $p - 100$ predictors act as independent noise. Let $\mathbf{x}_i^{(s)} = (x_{i1}, \dots, x_{i,100})$ denote the signal features for the $i$th observation and $y_i$ the corresponding binary class label, with $y_i \sim \text{Bernoulli}(0.5)$ independently for $i = 1, \dots, n$. The index set $\{1, \dots, 100\}$ is randomly partitioned into $S$ disjoint segments $I_1, \dots, I_S$ of approximately equal size, each representing a plateau of nearly constant feature values. For each observation $i$ and segment $I_s$, we first draw an independent baseline level
$$b_{is} \sim N(0, \sigma_b^2).$$
To introduce global class-dependent structure, two distinct segments $s^+$ and $s^-$ are selected once per dataset, uniformly from $\{1, \dots, S\}$ with $s^+ \neq s^-$. These segments are shifted upward and downward, respectively, for all observations in class 1, while class 0 observations retain only the baseline levels:
$$\mu_{is} = b_{is} + \delta\, y_i \left( \mathbb{1}\{s = s^+\} - \mathbb{1}\{s = s^-\} \right),$$
where $\delta > 0$ controls the strength of the tropical signal. To avoid degeneracy and introduce mild within-segment variability, we add a small Gaussian perturbation and set
$$x_{ij} = \mu_{is} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \quad j \in I_s.$$
Unless otherwise stated, we set $S = 5$, choose the shift magnitude $\delta$ to be large relative to the baseline scale, and use a small perturbation variance $\sigma_\varepsilon^2$.
After simulating the 100 signal features, we optionally apply column-wise centering across samples,
$$\tilde{x}_{ij} = x_{ij} - \frac{1}{n} \sum_{i'=1}^{n} x_{i'j},$$
which removes mean differences between feature vectors. This operation alters Euclidean and correlation-based distances between features but leaves the tropical distance $d_T$ unchanged, because subtracting a constant from both vectors does not affect the range of their coordinate-wise differences. Thus, column centering suppresses some of the geometric cues preferred by Euclidean and correlation metrics while preserving the dominance structure emphasized by the tropical distance. The remaining $p - 100$ predictors are generated as independent Gaussian noise:
$$x_{ij} \sim N(0, 1), \quad j = 101, \dots, p,$$
and the full predictor vector is defined as $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})$.
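A sketch of the full generative process is given below; the shift and noise scales are illustrative, as the exact values are not reported here.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k, S = 200, 2500, 100, 5
delta, sigma_eps = 3.0, 0.1                  # illustrative signal strength and noise scale

y = rng.binomial(1, 0.5, size=n)
segments = np.array_split(rng.permutation(k), S)     # random partition into S plateaus
s_plus, s_minus = rng.choice(S, size=2, replace=False)

X_signal = np.empty((n, k))
for s, idx in enumerate(segments):
    base = rng.standard_normal((n, 1))               # baseline level b_is per observation
    sign = int(s == s_plus) - int(s == s_minus)      # +1, -1, or 0 for this segment
    shift = delta * y[:, None] * sign                # applied to class-1 observations only
    X_signal[:, idx] = base + shift + sigma_eps * rng.standard_normal((n, len(idx)))

X_signal -= X_signal.mean(axis=0)                    # optional column-wise centering
X = np.hstack([X_signal, rng.standard_normal((n, p - k))])
```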
From a geometric perspective, this construction induces a clear block structure in the tropical distance matrix between features. For any two feature vectors belonging to the same segment $I_s$, the coordinate-wise differences across samples remain small and nearly constant, yielding a small tropical distance $d_T$. In contrast, pairs of features from different segments, and especially those between the globally shifted segments $I_{s^+}$ and $I_{s^-}$, exhibit large and systematic coordinate-wise dominance gaps, resulting in substantially larger tropical distances. Consequently, when IGTD is built using the tropical distance, the resulting feature distance rank matrix closely approximates a block-structured ideal, and the optimized pixel arrangement places segment features into coherent spatial clusters on the image grid.
By comparison, Euclidean and one minus correlation distances primarily depend on differences in global means, variances, or linear dependencies, which are moderated by the symmetric baseline and further reduced by column-wise centering. Geodesic, Jensen-Shannon, and Wasserstein distances rely on smooth manifold geometry or fine-grained distributional form, both of which are weakened by the piecewise-constant plateau construction and the addition of isotropic Gaussian noise. As a result, these non-tropical metrics fail to reveal the strong segment-wise geometry that determines the class boundary, whereas the tropical distance, being explicitly sensitive to coordinate-wise dominance and invariant to translation, remains uniquely aligned with the true discriminative structure. Within the IGTD framework, this alignment enables IGTD(TD) to generate pseudo-images with distinct, localized intensity patterns that are strongly associated with the class label, allowing CNN-based classifiers to achieve substantially higher accuracy than when IGTD is built on Euclidean, correlation-based, Geodesic, Jensen-Shannon, or Wasserstein distances.
Results
The generated datasets were processed using the IGTD algorithm to produce image representations based on six different distance metrics. Note that the computation of pixel distances in the IGTD algorithm is always based on Euclidean distances. These images were then input into a standardized CNN with a fixed, predefined architecture. The CNN architecture comprised two convolutional layers with $3 \times 3$ kernels, followed by max-pooling layers and batch normalization. We used a stride of 1 and same padding (zero padding to preserve spatial dimensions) for the convolutional layers. Binary cross-entropy was used as the loss function. Model training and evaluation were performed using ten-fold cross-validation repeated across ten iterations. A learning rate reduction strategy and early stopping mechanism were employed to optimize training efficiency and prevent overfitting. The full architecture of the CNN is illustrated in Fig 1.
The network comprises two sequential blocks, each containing a convolutional layer with $3 \times 3$ kernels (stride 1, same padding), batch normalization, and a max-pooling layer. The final layers are tailored for binary classification, utilizing binary cross-entropy loss. Training was optimized with a learning rate reduction strategy and early stopping.
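A Keras sketch consistent with this description is shown below. The filter counts, dense-layer width, and callback patience values are assumptions on our part; the text specifies the block structure, kernel size, stride, padding, loss, and training strategy, but not these hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape):
    """Two conv blocks (3x3 kernels, stride 1, same padding) with batch
    normalization and max pooling, followed by a sigmoid output for
    binary classification."""
    model = models.Sequential([
        layers.Input(shape=input_shape),          # e.g. (50, 50, 1) for p = 2500
        layers.Conv2D(32, 3, strides=1, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Callbacks matching the training strategy described in the text
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(patience=5),
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
]
```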
To assess classification performance, we used standard validation indices (VIs), including accuracy, precision, recall, and F1-score. The performance of the IGTD algorithm using non-Euclidean distance metrics was compared against a baseline model that employed the Euclidean distance. Specifically, we denote IGTD(ED) for the Euclidean distance, IGTD(RD) for the one minus correlation distance, IGTD(GD) for the Geodesic distance, IGTD(JD) for the Jensen-Shannon distance, IGTD(WD) for the Wasserstein distance, and IGTD(TD) for the Tropical distance. Ten-fold cross-validation classification results (mean and standard deviation) across the six simulated datasets, using the six distance metrics within the IGTD framework, are shown in Table 1. The classification results across the six simulated models demonstrate that each distance metric within the IGTD framework performs best under specific data structures that align with its mathematical properties.
In the Euclidean distance-based model, IGTD(ED) yielded the highest mean accuracy (0.52) relative to the non-Euclidean alternatives, although the absolute performance was low across all methods. This is attributed to two factors: the high-dimensional noise (2400 noise features vs. 100 signal features) and the use of a standardized CNN architecture. To ensure a fair comparison across all simulation scenarios, we did not fine-tune the CNN hyperparameters specifically for the Euclidean model. Consequently, these results serve as a baseline validation: they demonstrate that under strict “ceteris paribus” conditions, utilizing more complex non-Euclidean metrics offers no predictive benefit when the underlying data structure is strictly linear and i.i.d. Thus, while the challenging experimental conditions dampened the absolute classification power, the relative performance confirms that Euclidean distance remains the most appropriate and parsimonious metric when the data lacks intrinsic non-linear manifolds.
For the correlation-based model, IGTD(RD) emerges as the strongest performer with an accuracy of 0.78, precision of 0.79, and F1-score of 0.78. This reflects the model’s deliberate design: the use of Toeplitz covariance and a gradually increasing coefficient vector emphasizes linear dependency across features, which is effectively captured by the one minus correlation distance. Other metrics, particularly IGTD(JD) and IGTD(WD), show lower and more variable performance, indicating they are not well-suited for such correlated structures.
In the Geodesic distance-based model, IGTD(GD) significantly outperforms all other metrics, with an accuracy of 0.87 and F1-score of 0.88. This aligns well with the underlying data, which are sampled from a nonlinear manifold. The strength of IGTD(GD) lies in its ability to preserve local geometry, making it particularly effective for data residing on curved or non-Euclidean spaces. Metrics like IGTD(ED) and IGTD(RD) are less capable of capturing this structure and thus perform moderately.
Under the Jensen-Shannon model, IGTD(JD) shows excellent performance with an accuracy of 0.99 and F1-score of 0.99, though IGTD(ED) and IGTD(WD) still achieve perfect accuracy in this setting. While this suggests that several metrics can handle compositional features well, IGTD(JD) remains the most interpretable and theoretically appropriate choice for data represented by probability vectors. The fact that all metrics perform strongly here might be due to the pronounced class differences introduced via the Dirichlet parameters.
In the Wasserstein distance-based model, IGTD(WD) performs best overall, achieving the highest accuracy (0.99) and F1-score (0.99), with perfect recall. This is consistent with the model’s design, which encodes class differences through quantile-based distributional shifts. The Wasserstein distance’s sensitivity to both location and shape differences in distributions makes it particularly powerful in this context. Other metrics also perform well, but not as consistently or robustly.
Finally, in the Tropical distance-based model, IGTD(TD) demonstrated superior performance metrics, achieving the highest precision (0.9457) and F1-score (0.9041), alongside a high accuracy (0.9150) comparable to the Euclidean baseline. This suggests that while Euclidean distance can capture some signal in max-plus algebraic structures due to scaling effects, the Tropical distance provides a more precise characterization of the coordinate-wise dominance relationships inherent in the data generation process. The distinct advantage in precision highlights the specific alignment of IGTD(TD) with the tropical geometry of the features.
In summary, these results affirm that each non-Euclidean distance metric excels under the conditions it was designed to model. While the performance gaps between the proposed non-Euclidean metrics and the Euclidean baseline are not always large in these controlled simulations, the results validate a crucial principle of structural alignment. In every scenario, the distance metric that mathematically corresponds to the underlying generative process (e.g., Geodesic distance for manifold data, Wasserstein for distributional shifts, Tropical distance for max-plus structures) consistently achieved the top-ranking performance. This confirms that the IGTD framework functions as intended: it effectively translates specific geometric and distributional relationships into image representations. The simulation study thus serves as a proof of concept, demonstrating that choosing the correct metric provides a consistent, albeit sometimes marginal, advantage in controlled settings, a benefit that becomes more pronounced in complex, real-world data where multiple structural characteristics may coexist.
Real data examples
Datasets
Building on the successful validation through statistical simulations, our study progresses to applying the proposed method to real-world datasets. We utilize the same CNN models employed in the simulation study to maintain consistency in evaluation. Performance assessment is conducted using key evaluation metrics, including accuracy, precision, recall, and F1-score, ensuring a comprehensive comparison of model effectiveness. Below, we provide a summarized description of the genetic datasets used in this study (Table 2).
ARCENE dataset. [54] ARCENE was constructed by merging three mass spectrometry datasets: NCI ovarian cancer data, NCI prostate cancer data, and EVMS prostate cancer data. The dataset includes 100 training samples and 100 validation samples, each containing 10,000 features, of which 7000 are predictive and 3000 are spurious. Since the testing set does not provide binary labels, we combined the training and validation sets, resulting in 112 control samples labeled as class 0 and 88 cancer samples labeled as class 1.
Colon cancer dataset. [55] This dataset consists of the 2000 genes with the highest minimal intensity across samples and contains 62 samples, including 22 normal and 40 tumor tissues. The binary target variable y is set to class 0 for normal tissues and class 1 for tumor tissues.
Leukemia dataset. [56] This dataset differentiates between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). It comprises 72 samples, with 25 AML cases and 47 ALL cases, and includes 3571 gene expression features. The binary target variable y is defined as class 0 for AML and class 1 for ALL.
Ovarian cancer dataset. [57] This dataset consists of 253 samples, including 91 normal ovarian tissue samples and 162 ovarian cancer samples, with 15,154 molecular mass/charge (M/Z) features. The binary target variable y is labeled as class 0 for normal samples and class 1 for cancer samples.
Prostate cancer dataset. [58] This dataset contains prostate tissue samples from 102 patients, comprising 52 tumor samples and 50 non-tumor samples, with 6033 gene expression features. The binary target variable y is defined as class 0 for non-tumor samples and class 1 for tumor samples.
Results of the feature distance rank matrices
To evaluate the effectiveness of different distance metrics in capturing spatial and structural relationships among features, we compare feature distance rank matrices derived from five real-world tabular datasets against a theoretical "true model." This reference model is the $p \times p$ rank matrix of Euclidean distances between all pixel pairs in the image grid, arranged row-wise (i.e., the pixel distance rank matrix $Q$). Its rank values, visualized through grayscale intensities, exhibit a smooth, symmetric structure, with lighter tones near the diagonal and darker shades radiating outward, reflecting spatial continuity and locality in a well-ordered 2D layout. Fig 2 presents the rank matrices obtained using six distance metrics (Euclidean, one minus correlation, Geodesic, Jensen-Shannon, Wasserstein, and Tropical) after feature reordering and optimization to approximate this ideal structure.
Feature distance rank matrices based on various distance metrics (from left to right: Euclidean distance, one minus correlation, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance) computed between all pairs of variables across five datasets, following optimization and feature reordering. The grey level indicates the rank value.
Across datasets, Euclidean-based matrices show moderate spatial coherence. For instance, in Prostate, they retain some diagonal structure and smooth transitions. However, in smaller or noisier datasets like Colon and Leukemia, Euclidean matrices appear disordered, failing to reveal consistent feature relationships. One minus correlation distance yields mixed outcomes; while it reveals some block structures in datasets like Ovarian and Prostate, it generally captures linear dependencies rather than continuous spatial gradients, deviating from the true model. In contrast, Geodesic distance produces smoother transitions and clearer diagonal formations in Ovarian and Prostate, indicating its ability to uncover intrinsic geometric structures. Jensen-Shannon distance demonstrates variability: it shows some spatial coherence in Prostate but generates fragmented and noisy patterns in Colon, Leukemia, and Ovarian, likely due to unstable distribution estimates in sparse data. Wasserstein distance shows strong performance in datasets like Colon, Leukemia, and Prostate, maintaining smooth, block-like rank structures. However, in ARCENE, the matrices become less coherent, likely due to sample noise and the difficulty of estimating reliable empirical distributions. The Tropical distance generates rank matrices with distinct structural characteristics; for the Prostate dataset, it produces a highly coherent diagonal pattern similar to the Euclidean and Geodesic metrics, suggesting it successfully captures the hierarchical dominance relationships in this data. However, for the Colon dataset, the Tropical distance matrix appears more fragmented, indicating that the max-plus metric structure may be less suitable for this specific gene expression profile.
Overall, Wasserstein distance most closely approximates the spatial layout of the true model across well-behaved datasets, offering more faithful and interpretable representations of feature relationships, whereas Euclidean and correlation-based metrics are more limited in capturing nonlinear structure. Geodesic and Jensen-Shannon distances perform comparably to one another. Tropical distance shows promise in datasets with specific hierarchical structures but exhibits variability in others. These findings emphasize the value of integrating non-Euclidean distances, particularly Geodesic, Wasserstein, and Tropical, into the IGTD framework, enhancing its ability to transform tabular data into meaningful spatial representations for CNN-based learning, while also revealing its boundaries in challenging scenarios.
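For readers who wish to experiment with these metrics directly, the sketch below (our illustration; the function names and the histogram binning scheme are our own choices) computes a feature-by-feature distance matrix from an n × p data matrix under the one minus correlation, Jensen-Shannon, Wasserstein, and Tropical metrics, where the tropical distance follows the projective metric max_i(x_i − y_i) − min_i(x_i − y_i) used in the tropical geometry literature [51–53].

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

def one_minus_correlation(X):
    """1 - Pearson correlation between the columns (features) of X."""
    return 1.0 - np.corrcoef(X, rowvar=False)

def tropical_distance(x, y):
    """Tropical projective metric: max_i(x_i - y_i) - min_i(x_i - y_i)."""
    diff = x - y
    return float(diff.max() - diff.min())

def js_distance(x, y, bins=20):
    """Jensen-Shannon distance between two features, each summarized as a
    normalized histogram over a shared set of bin edges."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    hx, edges = np.histogram(x, bins=bins, range=(lo, hi))
    hy, _ = np.histogram(y, bins=edges)
    return jensenshannon(hx / hx.sum(), hy / hy.sum())

def pairwise_feature_distances(X, metric="wasserstein"):
    """Symmetric p x p distance matrix between the columns of an n x p
    data matrix X, under the requested metric."""
    p = X.shape[1]
    D = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            if metric == "wasserstein":
                d = wasserstein_distance(X[:, i], X[:, j])
            elif metric == "jensenshannon":
                d = js_distance(X[:, i], X[:, j])
            elif metric == "tropical":
                d = tropical_distance(X[:, i], X[:, j])
            else:
                raise ValueError(f"unknown metric: {metric}")
            D[i, j] = D[j, i] = d
    return D
```

A Geodesic variant would typically be built Isomap-style [44], as shortest-path distances over a k-nearest-neighbor graph derived from one of the base metrics; we omit it here to keep the sketch short.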
Pseudo-image visualization
Fig 3 examines the quality and structure of pseudo-images generated from tabular data for binary classification tasks using the IGTD framework. For each of the five datasets, representative samples from class 0 and class 1 are visualized as grayscale images. These images were generated using six different distance metrics, arranged from left to right as follows: Euclidean distance, one minus correlation distance, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance.
The first row displays pseudo-images from class 0, while the second row shows pseudo-images from class 1, for representative samples from each of the five datasets. Images are generated using six distance metrics (Euclidean distance, one minus correlation, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance), arranged from left to right.
In ARCENE, pseudo-images generated using the one minus correlation, Geodesic, Jensen-Shannon, Wasserstein, and Tropical distances revealed distinct patterns with clear class separability, in contrast to the diffuse, less structured patterns under the Euclidean distance. This highlights the superiority of non-Euclidean metrics in capturing underlying feature geometry. In the Colon dataset, where the sample size is small and the dimensionality high, all images appeared noisy, but Geodesic and Jensen-Shannon distances still produced marginal class separability, suggesting a slight advantage even under data constraints. The Leukemia dataset further emphasized the strength of the Geodesic, Wasserstein, and Tropical metrics, which yielded more centered and structured intensity distributions, revealing clearer distinctions between classes. Euclidean distance again failed to preserve meaningful structure. In the Ovarian dataset, which benefits from a larger sample size and number of variables, all distance metrics produced the most visually distinct class patterns. Finally, for Prostate, class differences were most distinct under the Euclidean, one minus correlation, Geodesic, and Jensen-Shannon distance metrics.
These visual comparisons demonstrate that non-Euclidean metrics such as Geodesic, Wasserstein, and Tropical distances consistently outperformed others in generating coherent and class-discriminative pseudo-images, regardless of sample size or feature complexity. These visual findings align with quantitative classification results, reinforcing the utility of non-Euclidean metrics in enhancing CNN-based learning from high-dimensional tabular data.
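To illustrate how such a pseudo-image is produced once a distance matrix is in hand, the following simplified sketch (ours; the greedy random-swap loop is a stand-in for the full iterative error-minimization procedure of IGTD [4]) reorders features so that their distance ranks better match the true-model pixel ranks, then reshapes one sample's values into a p × p grayscale image.

```python
import numpy as np
from scipy.stats import rankdata

def greedy_reorder(D_feat, R_true, n_iter=5000, seed=0):
    """Greedy random-swap search for a feature ordering whose distance
    rank matrix better matches R_true (the true-model rank matrix from
    the earlier sketch). A simplified stand-in for the IGTD optimizer."""
    rng = np.random.default_rng(seed)
    m = D_feat.shape[0]
    order = np.arange(m)
    R_feat = rankdata(D_feat.ravel()).reshape(D_feat.shape)
    err = np.abs(R_feat[np.ix_(order, order)] - R_true).sum()
    for _ in range(n_iter):
        i, j = rng.choice(m, size=2, replace=False)
        trial = order.copy()
        trial[i], trial[j] = trial[j], trial[i]
        trial_err = np.abs(R_feat[np.ix_(trial, trial)] - R_true).sum()
        if trial_err < err:  # keep swaps that reduce the rank discrepancy
            order, err = trial, trial_err
    return order

def to_pseudo_image(sample, order, p):
    """Place one sample's reordered feature values on a p x p grid and
    rescale to [0, 1] so the result can be rendered in grayscale."""
    img = np.asarray(sample)[order].reshape(p, p)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)
```

Each choice of distance metric changes only D_feat; the reordering and reshaping steps are identical across metrics, which is what makes the visual comparison in Fig 3 a controlled one.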
Classification performance evaluation
Ten-fold cross-validation classification results (mean validation indices with one standard deviation) across five real-world datasets using six different distance metrics are given in Table 3. Overall, the results demonstrate that non-Euclidean metrics often outperform the baseline Euclidean distance in accuracy and related predictive indices. In the ARCENE dataset, IGTD(GD) achieved the highest accuracy (0.8650), outperforming IGTD(ED) (0.8150), with IGTD(WD) also showing strong performance (0.8500). The Colon dataset exhibited a notable increase in accuracy with IGTD(GD) (0.8357), compared to only 0.7452 for IGTD(ED). For Leukemia, IGTD(RD) and IGTD(JD) yielded higher accuracy scores (0.9464 and 0.9429, respectively) than the Euclidean-based method. In the Ovarian dataset, all methods performed exceptionally well, but IGTD(GD) and IGTD(RD) slightly surpassed IGTD(ED), with accuracy values approaching 0.99. For the Prostate dataset, IGTD(TD) utilizing Tropical distance achieved the highest accuracy (0.8718) and precision (0.9133) among all metrics, surpassing IGTD(RD) (0.8609) and IGTD(ED) (0.8227). This result underscores the effectiveness of tropical geometry in capturing the specific feature relationships inherent in prostate cancer genomic data. Conversely, IGTD(TD) showed variable performance across other datasets, such as Colon (0.5833), suggesting its utility may be domain-specific.
In terms of stability, measured through standard deviations, IGTD(GD) consistently maintained a good balance between high accuracy and moderate variability, indicating robustness in modeling nonlinear structures. While IGTD(RD) and IGTD(JD) sometimes exhibited higher variance, particularly in smaller or more imbalanced datasets like Colon and Leukemia, IGTD(WD) generally combined strong performance with lower variability. These observations suggest that Wasserstein and Geodesic distances are both effective and reliable for capturing intrinsic relationships in high-dimensional feature spaces.
The influence of dataset characteristics, such as sample size and number of features, is also apparent in the results. Smaller datasets with high dimensionality, such as Colon (n = 62, p = 2000) and Leukemia (n = 72, p = 3571), showed the most pronounced performance gains when non-Euclidean metrics were used, particularly IGTD(GD). These improvements likely stem from the ability of non-Euclidean metrics to capture complex, nonlinear dependencies that traditional Euclidean distance may fail to detect.
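As a sketch of the evaluation protocol behind Table 3 (ours; it assumes pseudo-images stacked as an array of shape (n, p, p, 1), and the small Keras network below is illustrative rather than the fixed CNN architecture of our experiments), ten-fold stratified cross-validation yields the mean validation accuracy with one standard deviation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow import keras

def build_cnn(p):
    """A small CNN for p x p single-channel pseudo-images (illustrative)."""
    return keras.Sequential([
        keras.layers.Input(shape=(p, p, 1)),
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

def cross_validate(images, y, p, folds=10, seed=0):
    """Mean and standard deviation of validation accuracy over stratified
    k-fold splits, mirroring the reporting format of Table 3."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    accs = []
    for train_idx, val_idx in skf.split(images, y):
        model = build_cnn(p)  # re-initialize the network for every fold
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.fit(images[train_idx], y[train_idx],
                  epochs=30, batch_size=16, verbose=0)
        _, acc = model.evaluate(images[val_idx], y[val_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs)), float(np.std(accs))
```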
These empirical findings underscore the critical importance of spatial coherence in the transformed data. The relevance of aligning with the theoretical true model extends beyond visual interpretability; it is fundamental to the mechanics of CNN-based classification. CNNs rely on the principle of locality, utilizing convolutional kernels to extract features from spatially adjacent pixels within a receptive field. A feature distance rank matrix that approximates the smooth, continuous structure of the true model indicates that the chosen distance metric has successfully mapped intrinsically similar features to neighboring coordinates on the image grid. This spatial coherence ensures that the CNN’s filters can effectively capture local correlations and structural patterns. Conversely, a fragmented or disordered rank matrix implies that related features remain scattered across the grid, negating the advantages of the convolutional architecture and hindering the model’s ability to extract robust representations.
Conclusion and discussion
This study presents significant advancements to the Image Generator for Tabular Data (IGTD) framework by introducing non-Euclidean distance metrics, specifically one minus correlation distance, Geodesic distance, Jensen-Shannon distance, Wasserstein distance, and Tropical distance, to enhance the classification of high-dimensional tabular data using Convolutional Neural Networks (CNNs). Through simulation studies and an extensive evaluation across five real-world benchmark datasets, we demonstrate that replacing the conventional Euclidean distance with more expressive non-Euclidean alternatives substantially improves classification accuracy, image quality, and the ability to capture complex feature structures.
Our findings, supported by classification performance, distance rank matrix analysis, and pseudo-image visualization, show that Geodesic and Wasserstein distances consistently outperform other metrics. Furthermore, Tropical distance demonstrated superior performance on the Prostate dataset, offering a unique advantage for specific genomic data structures. These methods produce spatially coherent image representations that preserve intrinsic geometric or distributional relationships among features, making them particularly well-suited for CNN-based learning. The improved fidelity of these image representations allows CNNs to extract more meaningful patterns, thereby enhancing overall model performance.
Beyond predictive gains, the refined IGTD framework offers methodological contributions by serving as a bridge between traditional statistical modeling and deep learning. It enables high-dimensional tabular data to be encoded into structured visual formats, which not only benefit CNN training but may also act as a form of informed data augmentation, capturing nonlinearity and spatial dependencies often missed by Euclidean-based transformations.
However, this study also highlights important limitations. Notably, the computational cost of metrics like Jensen-Shannon and Wasserstein distances can be substantial, particularly for large-scale or real-time applications, due to the complexity of estimating probability distributions and solving optimal transport problems. Although these metrics are computationally more demanding than the Euclidean distance, their added cost constitutes only a small portion of the total processing time, which is largely dominated by the subsequent CNN training and classification stages. Additionally, in highly sparse or noisy datasets, none of the distance metrics, including the best-performing ones, produced fully coherent representations, suggesting potential limitations of the IGTD framework in extreme data settings. The challenge of processing noisy or sparse tabular data for image conversion remains a fundamental limitation in most existing tabular-to-image deep learning methods. Currently, little research has explored how to effectively transform such imperfect tabular data into meaningful image representations without compromising critical structural patterns. Addressing this challenge presents a key opportunity for future work, as robust conversion techniques for noisy and sparse datasets could greatly enhance the real-world applicability of image-based tabular data classification.
To contextualize our findings, we compared our results against available state-of-the-art (SOTA) benchmarks identified through a survey of recent literature. For the Ovarian cancer dataset, our best-performing model achieved an accuracy of 99.2%, which is comparable to the 98.6% accuracy reported by [59] using a hybrid ReliefF-CNN model. However, for other datasets, highly specialized methods employing advanced feature selection often achieve higher absolute performance. For instance, while our method achieved approximately 94.6% on the Leukemia dataset, recent work by [60] reported 100% classification accuracy using optimized Support Vector Machine (SVM) and Logistic Regression models. Similarly, Asad and Mollah [61] reported 100% accuracy on the Colon dataset using Symmetrical Uncertainty-based feature selection with Random Forest and Multilayer Perceptron (MLP) classifiers, surpassing the 83.6% achieved by our IGTD(GD) model without feature selection. Comparable 100% accuracy benchmarks have also been reported for the Prostate dataset by [62], and accuracies reaching 99% have been achieved for ARCENE using the Enhanced Incremental Deep Multi-Layer Perceptron (EIDMLP) classifier [63]. It should be noted that while multiple studies in the literature may report similar high-performance metrics (e.g., 100% accuracy), we have prioritized citing the most recent and representative examples.
It is crucial to note that these external results serve primarily as reference points rather than direct benchmarks. Variations in data preprocessing (e.g., normalization techniques, outlier removal), feature selection methods (which alter the input dimensionality), and validation protocols (e.g., different cross-validation folds or train/test splits) mean that the same dataset name often refers to slightly different experimental inputs in the literature. Furthermore, this performance disparity is a natural consequence of our experimental scope; our primary objective was to isolate and evaluate the specific impact of distance metrics on tabular-to-image conversion, rather than to engineer a maximally optimal classification pipeline. To ensure a rigorous internal comparison, we deliberately employed a standardized, fixed CNN architecture across all experiments without extensive hyperparameter tuning or advanced feature selection techniques, factors that are often critical drivers of SOTA performance. Consequently, while our absolute accuracies are lower than highly specialized models, the relative performance gains observed with non-Euclidean metrics confirm their utility.
Future research may focus on developing more scalable implementations of complex distance metrics, such as approximations of optimal transport or kernel-based geodesic computations, to reduce computational overhead. Additionally, exploring adaptive metric selection strategies, where the most suitable distance function is learned from the data rather than predetermined, could offer valuable improvements. Furthermore, integrating ED, RD, GD, JD, WD, and TD as stages of data augmentation within the CNN pipeline may lead to enhanced performance and greater interpretability.
In conclusion, the integration of non-Euclidean metrics within the IGTD framework significantly enhances its ability to transform tabular data into meaningful image representations for deep learning. These findings underscore the importance of selecting appropriate distance functions for tabular-to-image conversion and affirm the potential of non-Euclidean IGTD as a robust, interpretable, and versatile tool for CNN-based classification of complex tabular datasets.
Acknowledgments
The authors are grateful to the editor, associate editor, and referees for their insightful comments and suggestions.
References
- 1. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
- 2. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1):53. pmid:33816053
- 3. Somvanshi S, Das S, Javed SA, Antariksa G, Hossain A. A survey on deep tabular learning. arXiv preprint 2024.
- 4. Zhu Y, Brettin T, Xia F, Partin A, Shukla M, Yoo H, et al. Converting tabular data into images for deep learning with convolutional neural networks. Sci Rep. 2021;11(1):11325. pmid:34059739
- 5. Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G. Deep neural networks and tabular data: a survey. IEEE Trans Neural Netw Learn Syst. 2024;35(6):7499–519. pmid:37015381
- 6. Gorishniy Y, Rubachev I, Khrulkov V, Babenko A. Revisiting deep learning models for tabular data. arXiv preprint 2021.
- 7. Hwang Y, Song J. Recent deep learning methods for tabular data. Communications for Statistical Applications and Methods. 2023;30(2):215–26.
- 8. Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need. arXiv preprint 2021.
- 9. Lyu B, Haque A. Deep learning based tumor type classification using gene expression data. In: Proceedings of the ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. 2018. p. 89–96.
- 10. Ma S, Zhang Z. OmicsMapNet: transforming omics data to take advantage of deep convolutional neural network for discovery. arXiv preprint 2018.
- 11. Shneiderman B. Tree visualization with tree-maps. ACM Trans Graph. 1992;11(1):92–9.
- 12. Sharma A, Vans E, Shigemizu D, Boroevich KA, Tsunoda T. DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture. Sci Rep. 2019;9(1):11399. pmid:31388036
- 13. Bazgir O, Zhang R, Dhruba SR, Rahman R, Ghosh S, Pal R. Representation of features as images with neighborhood dependencies for compatibility with convolutional neural networks. Nat Commun. 2020;11(1):4391. pmid:32873806
- 14. Oh JH, Choi W, Ko E, Kang M, Tannenbaum A, Deasy JO. PathCNN: interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics. 2021;37(Suppl_1):i443–50. pmid:34252964
- 15. Kalakoti Y, Yadav S, Sundar D. SurvCNN: a discrete time-to-event cancer survival estimation framework using image representations of omics data. Cancers (Basel). 2021;13(13):3106. pmid:34206288
- 16. Tang H, Yu X, Liu R, Zeng T. Vec2image: an explainable artificial intelligence model for the feature representation and classification of high-dimensional biological data by vector-to-image conversion. Brief Bioinform. 2022;23(2):bbab584. pmid:35106553
- 17. Wang S, Zhang H, Liu Z, Liu Y. A novel deep learning method to predict lung cancer long-term survival with biological knowledge incorporated gene expression images and clinical data. Front Genet. 2022;13:800853. pmid:35368657
- 18. Lee E, Nam M, Lee H. Tab2vox: CNN-based multivariate multilevel demand forecasting framework by tabular-to-voxel image conversion. Sustainability. 2022;14(18):11745.
- 19. Castillo-Cara M, Talla-Chumpitaz R, García-Castro R, Orozco-Barbosa L. TINTO: converting tidy data into image for classification with 2-dimensional convolutional neural networks. SoftwareX. 2023;22:101391.
- 20. Lee SE, Lee SC. TablEye: seeing small tables through the lens of images. arXiv preprint 2023.
- 21. Randive KD, Ramasundaram M. MWCapsNet: a novel multi-level wavelet capsule network for insider threat detection using image representations. Neurocomputing. 2023;553:126588.
- 22. Briner N, Cullen D, Halladay J, Miller D, Primeau R, Avila A, et al. Tabular-to-image transformations for the classification of anonymous network traffic using deep residual networks. IEEE Access. 2023;11:113100–13.
- 23. Matsuda T, Uchida K, Saito S, Shirakawa S. HACNet: end-to-end learning of interpretable table-to-image converter and convolutional neural network. Knowledge-Based Systems. 2024;284:111293.
- 24. Wang Z, Oates T. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: Workshops at AAAI Conf Artif Intell; 2015.
- 25. Hatami N, Gavet Y, Debayle J. Classification of time-series images using deep convolutional neural networks. In: Proc. ICMV. vol. 10696; 2018. p. 242–9.
- 26. Yang CL, Yang CY, Chen ZX, Lo NW. Multivariate time series data transformation for convolutional neural network. In: Proc. SICE Int Symp Syst Integr. IEEE; 2019. p. 188–92.
- 27. Barra S, Carta SM, Corriga A, Podda AS, Recupero DR. Deep learning and time series-to-image encoding for financial forecasting. IEEE/CAA J Autom Sinica. 2020;7(3):683–92.
- 28. Bazgir O, Lu J. REFINED-CNN framework for survival prediction with high-dimensional features. iScience. 2023;26(9):107627. pmid:37664631
- 29. Kablaoui R, Ahmad I, Awad M. Network traffic prediction by learning time series as images. Eng Sci Technol Int J. 2024;55:101754.
- 30. Sharma A, López Y, Jia S, Lysenko A, Boroevich KA, Tsunoda T. Enhanced analysis of tabular data through multi-representation DeepInsight. Sci Rep. 2024;14(1):12851. pmid:38834670
- 31. Yan R, Islam MT, Xing L. Interpretable discovery of patterns in tabular data via spatially semantic topographic maps. Nat Biomed Eng. 2024:1–12.
- 32. Poonam K, Kotra VS, Guha R, Chakrabarti PP. Hierarchical classification of frontotemporal dementia subtypes utilizing tabular-to-image data conversion with deep learning methods. In: Proc. Int Conf Pattern Recognit; 2025. p. 386–401.
- 33. Tang X, Prodduturi N, Thompson KJ, Weinshilboum RM, O’Sullivan CC, Boughey JC. OmicsFootPrint: a framework to integrate and interpret multi-omics data using circular images and deep neural networks. bioRxiv. 2024.
- 34. El-Melegy M, Mamdouh A, Ali S, Badawy M, El-Ghar MA, Alghamdi NS, et al. Prostate cancer diagnosis via visual representation of tabular data and deep transfer learning. Bioengineering (Basel). 2024;11(7):635. pmid:39061717
- 35. Lee S, Oh S. Table2Image: interpretable tabular data classification with realistic image transformations. arXiv preprint 2024.
- 36. Dagan I, Vainshtein R, Katz G, Rokach L. Automated algorithm selection using meta-learning and pre-trained deep convolution neural networks. Information Fusion. 2024;105:102210.
- 37. Gómez-Martínez V, Lara-Abelenda FJ, Peiro-Corbacho P, Chushig-Muzo D, Granja C, Soguero-Ruiz C. LM-IGTD: a 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks. arXiv preprint 2024.
- 38. Kulkarni AD. Fuzzy convolution neural networks for tabular data classification. arXiv preprint 2024.
- 39. Jiang JP, Liu SY, Cai HR, Zhou Q, Ye HJ. Representation learning for tabular data: a comprehensive survey. arXiv preprint 2025.
- 40. Nazarri MNA, Yusof MHM, Almohammedi AA. Generating network intrusion image through IGTD algorithm for CNN classification. In: Proc. ICCIT. IEEE; 2023. p. 172–7.
- 41. Hosseini M, Chitsaz I. Knock probability determination employing convolutional neural network and IGTD algorithm. Energy. 2023;284:129282.
- 42. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95(25):14863–8. pmid:9843981
- 43. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005;4:Article17. pmid:16646834
- 44. Tenenbaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290(5500):2319–23. pmid:11125149
- 45. Österreicher F, Vajda I. A new class of metric divergences on probability spaces and its applicability in statistics. Ann Inst Stat Math. 2003;55(3):639–53.
- 46. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.
- 47. Hie B, Bryson B, Berger B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat Biotechnol. 2019;37(6):685–91. pmid:31061482
- 48. Panaretos VM, Zemel Y. Statistical aspects of Wasserstein distances. Annu Rev Stat Appl. 2019;6:405–31.
- 49. Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. In: Proceedings of the 34th International Conference on Machine Learning; 2017. p. 214–23.
- 50. Burkhardt DB, et al. SpatialDE2: fast and accurate identification of spatially variable genes. Nat Methods. 2023;20:103–11.
- 51. Maclagan D, Sturmfels B. Introduction to tropical geometry. Providence, RI: American Mathematical Society; 2015.
- 52. Yoshida R, Aliatimis G, Miura K. Tropical neural networks and its applications to classifying phylogenetic trees. In: 2024 International Joint Conference on Neural Networks (IJCNN). 2024. p. 1–9. https://doi.org/10.1109/ijcnn60899.2024.10650971
- 53. Pasque K, Teska C, Yoshida R, Miura K, Huang J. Tropical decision boundaries for neural networks are robust against adversarial attacks. arXiv preprint 2024.
- 54. Guyon I, Gunn S, Ben-Hur A, Dror G. Arcene [Dataset]. 2004.
- 55. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci U S A. 1999;96(12):6745–50. pmid:10359783
- 56. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531–7. pmid:10521349
- 57. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet. 2002;359(9306):572–7. pmid:11867112
- 58. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002;1(2):203–9. pmid:12086878
- 59. Kilicarslan S, Adem K, Celik M. Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network. Med Hypotheses. 2020;137:109577. pmid:31991364
- 60. Acharjee R, Sikder AS, Gupi HP, Hussain SS. Optimization of machine and deep learning algorithms in blood cancer classification. IJIST. 2025;3(3).
- 61. Asad E, Mollah AF. Biomarker identification from gene expression based on symmetrical uncertainty. Int J Intell Inf Technol. 2021;17(4):1–19.
- 62. Tirumala SS, Narayanan A. Attribute selection and classification of prostate cancer gene expression data using artificial neural networks. In: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2016. p. 26–34.
- 63. Renuka DD, Swetha MTA. Enhancing big data classification accuracy through deep learning techniques. Artif Intell Appl. 2025;3.