Abstract
Unlike in the field of visual scene recognition, where tremendous advances have taken place due to the availability of very large datasets to train deep neural networks, inference from medical images is often hampered by the fact that only small amounts of data may be available. When working with very small dataset problems, of the order of a few hundred items of data, the power of deep learning may still be exploited by using a pre-trained model as a feature extractor and carrying out classic pattern recognition techniques in this feature space, the so-called few-shot learning problem. However, medical images are highly complex and variable, making it difficult for few-shot learning to fully capture and model these features. To address these issues, we focus on the intrinsic characteristics of the data. We find that, in regimes where the dimension of the feature space is comparable to or even larger than the number of images in the data, dimensionality reduction is a necessity and is often achieved by principal component analysis or singular value decomposition (PCA/SVD). In this paper, noting the inappropriateness of using SVD for this setting we explore two alternatives based on discriminant analysis (DA) and non-negative matrix factorization (NMF). Using 14 different datasets spanning 11 distinct disease types we demonstrate that at low dimensions, discriminant subspaces achieve significant improvements over SVD-based subspaces and the original feature space. We also show that at modest dimensions, NMF is a competitive alternative to SVD in this setting. The implementation of the proposed method is accessible via the following link.
Citation: Liu J, Fan K, Cai X, Niranjan M (2024) Few-shot learning for inference in medical imaging with subspace feature representations. PLoS ONE 19(11): e0309368. https://doi.org/10.1371/journal.pone.0309368
Editor: Longxiu Huang, Michigan State University, UNITED STATES OF AMERICA
Received: January 15, 2024; Accepted: August 11, 2024; Published: November 6, 2024
Copyright: © 2024 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript. All relevant data are available from the following link: https://medmnist.com/.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Impressive empirical performances have been reported in the field of computer vision in recent years, starting from a step improvement reported in the ImageNet challenge [1]. This and subsequent work has used very large neural network architectures, notably their depth, with parameter estimation carried out using equally large datasets. It is common in current computer vision literature to train models with tens of millions of parameters and use datasets of similar sizes. Much algorithmic development to control the complexity of such massive models and to incorporate techniques to handle systematic variability has been developed. Our curiosity about mammalian vision [2, 3] and commercial applications such as self-driving cars and robot navigation [4, 5] has driven the computer vision field. The interest in automatic diagnosis has reached a level of comparing artificial intelligence-based methods against human clinicians [6, 7]. However, compared with natural images, the application of deep learning in the medical domain poses more challenges, such as causality [8], uncertainty [9], and the need to integrate clinical information along with features extracted from images [10]. A particular issue with image-based inference in the medical field is data availability [11]. Often, the number of images available in the medical domain is orders of magnitude smaller than what is state-of-the-art in computer vision. Compared with other domains, due to privacy concerns and the prevalence of adverse medical conditions, most of the medical datasets only contain thousands or even hundreds of images, such as brain imaging [12].
The focus of this paper is on data sparsity/scarcity. Naturally, if we had access to hundreds of thousands of labelled medical images, as might be the case with X-rays and optometry, training a deep neural network from scratch using all the recent methodological advances is the way forward. When the number of images is in the thousands, the strategy of transfer learning is suitable for the medical data by fine-tuning the weights generated from pre-trained natural images. While the scheme is appealing, available empirical evidence for transfer learning is contradictory in the medical field. For example, on a chest X-rays problem, Raghu et al. [13] found no significant improvement with the popular ResNet trained on ImageNet as source architecture; more positive results are reported for endoscopy image recognition [14]. Another example may be the weakly supervised learning methods [15], whose performance is yet to be seen in medical diagnosis.
Our interest is in a regime of even smaller amounts of data than is needed to fine-tune a pre-trained model with transfer learning. This regime is referred to as “few-shot learning” [16–18], and is appropriate for dataset sizes of the order of a few hundred or even down to a few tens [19, 20]. Few-shot learning works can be divided into three categories: data, model and algorithm [21]. Most contemporary few-shot learning techniques for natural images rely on methods and algorithms with fine-tuned parameters based on available data [22, 23], such as bidirectional pyramid architectures and multi-level attention pyramids to enhance feature representations and reduce background noise [24]. Advanced frameworks like M3Net utilize multi-view encoding, matching, and fusion to handle high intra-class variance and subtle differences in actions [25], while knowledge-guided networks like KSTNet leverage auxiliary prior knowledge for better semantic-visual mapping [26]. Additionally, methods integrating background suppression and foreground alignment improve robustness in few-shot learning scenarios by addressing misalignment and reducing background interference [27]. Data augmentation techniques and manifold-space methods have also drawn some attention [28, 29]. Unlike these methods, in this paper we explore few-shot problems from the traditional machine learning perspective by using a pre-trained deep neural network as a feature extractor. In detail, each image is mapped into a fixed-dimensional feature space, the dimension of which, say M, is defined by the number of neurons in the penultimate fully connected layer of the network, typically 512 or 1024 for the popular architectures. We are then in a regime where the number of items of data, say N, is comparable to or even smaller than the dimension of the feature space (i.e., the N < M problem in statistical inference language [30]), necessitating techniques for dimensionality reduction.
Subspace methods for reducing the dimensionality of data have a long and rich history. They fall under the group of methods known as structured low-rank approximation methods [31–34]. The basic intuition is that a data matrix, $Y \in \mathbb{R}^{N \times M}$, consisting of N items of data in M-dimensional features, is usually not full rank. This is due to correlations along either of the axes. In the medical context, profiles of patients (i.e. data) may show strong similarities. Along the features axis, some features that have been gathered may be derivable from others. In these situations, we can find low-rank approximations by factorising Y, and additionally impose structural constraints on the factors either from prior knowledge or for mathematical convenience. Popular approaches like principal component analysis (PCA) [35] and non-negative matrix factorization (NMF) [36, 37] impose orthogonality and non-negativity constraints on the factors, respectively. Returning to few-shot learning with a pre-trained deep neural network as the feature extractor, we encounter N < M problems, in which pattern recognition is known to suffer from the “curse of dimensionality”. Hence dimensionality reduction techniques are required. The most popular technique used hitherto in the literature is PCA, implemented via singular value decomposition (SVD) [38, 39]. Despite its popularity, PCA has a fundamental weakness in that it is a variance-preserving low-rank approximation technique, more suitable for data that is uni-modal and Gaussian distributed. In the case of classification problems, however, the feature space is necessarily multi-modal with at least as many modes as the number of classes in the problem.
The basic premise of this work is the need for dimensionality reduction in the feature space and that SVD ignores multi-modal data structure. We, for the first time, usher in and explore two alternatives—discriminant analysis (DA) subspace and NMF subspace—to SVD in few-shot learning on medical imaging with multi-modal structure in the data. The DA subspace introduces the well-known Fisher linear discriminant analysis (FDA) and its multi-dimensional extensions [40–42]. The NMF and the supervised NMF (SNMF) [43] (where class label information can be injected into the factorization loss function) subspaces focus on the part-based representation with sparsity. A detailed comparison between these subspace representations, including feature selection techniques [44], is conducted. Validating on 14 datasets spanning 11 medical classification tasks with four distinct imaging modalities, we achieve statistically significant improvements in classification accuracy in the subspaces compared to the original high-dimensional feature space, with persuasive results on DA and NMF subspaces as viable alternatives to SVD.
The remainder of this paper is organized as follows. In the next section, we mainly recall the subspace representation methods, i.e., SVD, DA and NMF subspaces. The few-shot learning methodology/scheme on subspace feature representations including the experimental settings in sufficient detail to facilitate reproduction of the work is provided in Section 3. In Section 4, we give succinct descriptions of the datasets used. Section 5 presents the key results of the experimental work. A further discussion is conducted in Section 6, followed by conclusion in Section 7. Some additional details regarding method derivations and extra results are provided in S1 File.
2 Subspace representation
2.1 Basic notations
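Given N samples $y_i \in \mathbb{R}^{M}$, 1 ≤ i ≤ N, we form a data matrix $Y = (y_1, y_2, \cdots, y_N)^\top \in \mathbb{R}^{N \times M}$, where M is the number of features of every sample. Suppose that these N samples belong to C different classes, namely Λj, and their cardinality |Λj| = Nj, 1 ≤ j ≤ C. Let $y_k^j$ represent the k-th sample in class Λj. Clearly, $\Lambda_j = \{y_k^j\}_{k=1}^{N_j}$, and $N = \sum_{j=1}^{C} N_j$. Let $\bar{y}$ and $\bar{y}_j$ respectively be the mean of the whole samples and the samples in class j, i.e.,
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \bar{y}_j = \frac{1}{N_j}\sum_{k=1}^{N_j} y_k^j, \quad 1 \le j \le C.$$
Let $S_{\Lambda_j}$ represent the intra-class scatter for class j, i.e.,
$$S_{\Lambda_j} = \sum_{k=1}^{N_j} (y_k^j - \bar{y}_j)(y_k^j - \bar{y}_j)^\top. \tag{1}$$
Then the inter- and intra-class scatters, denoted as SB and SW, respectively, read
$$S_B = \sum_{j=1}^{C} N_j (\bar{y}_j - \bar{y})(\bar{y}_j - \bar{y})^\top, \qquad S_W = \sum_{j=1}^{C} S_{\Lambda_j}. \tag{2}$$
Specifically, for the binary case, i.e., C = 2, we also name $\tilde{S}_B$ and $\tilde{S}_W$ as the inter- and intra-class scatters, i.e.,
$$\tilde{S}_B = \Delta \Delta^\top, \qquad \tilde{S}_W = \beta S_{\Lambda_1} + (1 - \beta) S_{\Lambda_2}, \tag{3}$$
where $\Delta = \bar{y}_1 - \bar{y}_2$ and β = (N2 − 1)/(N1 + N2 − 2).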
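As an illustration of the notation above, the following NumPy sketch computes the inter- and intra-class scatters from a data matrix and its labels. It assumes the standard size-weighted form of SB (each class mean weighted by Nj) and unnormalised class scatters; the function name `scatter_matrices` is hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Return (S_B, S_W) for data Y of shape (N, M) and class labels of shape (N,).

    Assumption: S_B weights each class by its size N_j, and class scatters
    are unnormalised sums of outer products.
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    mean_all = Y.mean(axis=0)              # mean of the whole samples
    M = Y.shape[1]
    S_B = np.zeros((M, M))
    S_W = np.zeros((M, M))
    for c in np.unique(labels):
        Y_c = Y[labels == c]               # samples of one class
        mean_c = Y_c.mean(axis=0)          # class mean
        centred = Y_c - mean_c
        S_W += centred.T @ centred         # intra-class scatter of this class
        diff = mean_c - mean_all
        S_B += Y_c.shape[0] * np.outer(diff, diff)
    return S_B, S_W
```

Under these assumptions the two scatters add up to the total scatter of the centred data, which gives a convenient sanity check, and SB has rank at most C − 1.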
2.2 Feature selection
Feature selection is the process of extracting a subset of relevant features by eliminating redundant or unnecessary information for model development. There are several types of feature selection techniques, including supervised [45], semi-supervised [46], and unsupervised methods [47]. For example, the Boruta algorithm [48], one of the supervised feature selection methods, selects features by shuffling features of the data and calculating the feature correlations based on classification loss. The approach has also been used to classify medical images [49].
2.3 Singular value decomposition
SVD is the most common type of matrix decomposition, which can decompose either a square or rectangular matrix. The SVD of the matrix Y can be represented as Y = UΣV⊤, where $U \in \mathbb{R}^{N \times N}$ and $V \in \mathbb{R}^{M \times M}$ are orthogonal matrices, and $\Sigma \in \mathbb{R}^{N \times M}$ is a diagonal matrix whose diagonal consists of singular values. The singular values are generally ordered and it is well known that in most real-world problems they reduce quickly to zero, i.e., typically the first 10% or even 1% of the largest singular values could account for more than 99% of the sum of all the singular values. Therefore, the singular vectors corresponding to the top p ≪ min{M, N} largest singular values compose the transformation matrix for the most representative subspace. Meanwhile, the variance-preserving property of SVD is extremely effective in data compression and widely employed in deep learning tasks, especially when the data is uni-modal. For example, SVD has been used to compress features taken at different layers to compare learning dynamics across layers as well as across models [50].
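A minimal sketch of building and applying an SVD subspace is given below. It assumes the PCA convention of centring the features with the training mean before decomposition, which the text does not state explicitly; the names `svd_subspace` and `project` are hypothetical.

```python
import numpy as np

def svd_subspace(Y_train, p):
    """Return the training mean, the top-p right singular vectors (M x p),
    and all singular values of the centred training data."""
    mean = Y_train.mean(axis=0)
    # Thin SVD; NumPy returns singular values in descending order.
    _, s, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
    return mean, Vt[:p].T, s

def project(Y, mean, V_p):
    """Map samples (rows of Y) to their p-dimensional subspace coordinates."""
    return (Y - mean) @ V_p
```

Test features would be projected with the mean and basis estimated on the training set only, so that no test information enters the subspace.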
2.4 Discriminant subspaces
It is usually possible to design logic based on the statistics of a design set that achieves a very high recognition rate if the original set of features is well chosen. Discriminant vectors for DA can reduce the error rate and solve the discrimination portion of the task [40, 51]. Since the discriminant vector transformation aims to reduce dimensionality while retaining discriminatory information, sophisticated pattern recognition techniques that were either computationally impractical or statistically insignificant in the original high-dimensional space could become feasible in the new low-dimensional space. The intuitive assumption is that features based on discrimination are better than those based on fitting or describing the data. In what follows, we present different approaches to obtaining discriminant vectors for multiclass and binary classification problems.
2.4.1 Multiclass classification problem.
The aim of the multiclass DA is to discover a linear transformation which lowers the dimensionality of an M-dimensional statistical model with C > 2 classes while keeping as much discriminant information in the lower-dimensional space as possible.
Let $d \in \mathbb{R}^{M}$ serve as the projection direction. In Fisher’s discriminant analysis [52], the Fisher criterion reads
$$R(d) = \frac{d^\top S_B d}{d^\top S_W d}, \tag{4}$$
whose maximisation can be addressed by solving
$$S_B d = \lambda S_W d, \tag{5}$$
where λ is the Lagrange multiplier [53]. This is also known as the generalized eigenvalue problem regarding SB and SW, and d is the eigenvector corresponding to the non-zero eigenvalue (λ) in this situation. Then the transformation matrix can be formed by stacking up the (C − 1) eigenvectors corresponding to the (C − 1) largest eigenvalues in Eq (5). When the number of samples N is small and/or the dimensionality of the data M is large, SW is generally singular in practice. This could be dealt with by adding a small perturbation to SW, e.g.,
$$\hat{S}_W = S_W + \delta I, \tag{6}$$
where I is the identity matrix and δ is a relatively small value (e.g., $5 \times 10^{-3}$) such that $\hat{S}_W$ is invertible. The discriminant directions can then be obtained by conducting the eigenvalue decomposition of $\hat{S}_W^{-1} S_B$ and finding the (C − 1) eigenvectors corresponding to the (C − 1) largest eigenvalues.
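The regularised generalized eigenvalue problem can be sketched in NumPy as follows. Rather than forming the inverse of the perturbed intra-class scatter explicitly, this sketch whitens with a Cholesky factor and solves an equivalent symmetric eigenproblem, which is a numerical design choice of this example and not necessarily what the authors do. The function name `fisher_directions` is hypothetical, and the default `delta` simply reuses the example value quoted in the text.

```python
import numpy as np

def fisher_directions(S_B, S_W, n_dirs, delta=5e-3):
    """Top n_dirs generalised eigenvectors of (S_B, S_W + delta*I), as columns.

    Returns (D, lam): unit-norm directions of shape (M, n_dirs) and their
    eigenvalues in descending order.
    """
    M = S_W.shape[0]
    S_reg = S_W + delta * np.eye(M)        # perturbed scatter, positive definite
    L = np.linalg.cholesky(S_reg)          # S_reg = L @ L.T
    tmp = np.linalg.solve(L, S_B)          # L^{-1} S_B
    C = np.linalg.solve(L, tmp.T)          # L^{-1} S_B L^{-T} (S_B is symmetric)
    C = 0.5 * (C + C.T)                    # remove round-off asymmetry
    w, U = np.linalg.eigh(C)               # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_dirs]   # indices of the largest eigenvalues
    D = np.linalg.solve(L.T, U[:, order])  # map back: d = L^{-T} u
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    return D, w[order]
```

For a C-class problem one would typically call this with `n_dirs = C - 1`, since SB has at most C − 1 non-zero eigenvalues.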
2.4.2 Binary classification problem.
Different from Fisher criterion given in Eq (4), which can only produce one discriminant direction in the binary classification scenario, the method proposed in [40] can discover more discriminant directions. It is optimal in the sense that a set of projection directions is determined under a variety of constraints, see details below.
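The Fisher criterion (cf. Eq (4)) for the binary classification problem, with $\tilde{S}_B$ and $\tilde{S}_W$ the binary inter- and intra-class scatters in Eq (3), reads
$$\tilde{R}(d) = \frac{d^\top \tilde{S}_B d}{d^\top \tilde{S}_W d} = \frac{(d^\top \Delta)^2}{d^\top \tilde{S}_W d}. \tag{7}$$
Note that $\tilde{R}(d)$ is independent of the magnitude of d. The first discriminant direction d1 is discovered by maximising $\tilde{R}(d)$, and then we have
$$d_1 = \alpha_1 \tilde{S}_W^{-1} \Delta, \tag{8}$$
where α1 (i.e., $\alpha_1 = 1 / \|\tilde{S}_W^{-1} \Delta\|_2$) is the normalising constant such that $\|d_1\|_2 = 1$ (and recall $\Delta = \bar{y}_1 - \bar{y}_2$ is the difference of the means of the two classes). The second discriminant direction d2 is required to maximise $\tilde{R}(d)$ in Eq (7) and be orthogonal to d1. It can be found by the method of Lagrange multipliers, i.e., finding the stationary points of
$$\tilde{R}(d_2) - \lambda\, d_2^\top d_1, \tag{9}$$
where λ is the Lagrange multiplier. We can then obtain
$$d_2 = \alpha_2 \left( \tilde{S}_W^{-1} - \frac{\Delta^\top \tilde{S}_W^{-2} \Delta}{\Delta^\top \tilde{S}_W^{-3} \Delta}\, \tilde{S}_W^{-2} \right) \Delta, \tag{10}$$
where α2 is the normalising constant such that $\|d_2\|_2 = 1$. See S1 Appendix in S1 File for the detailed derivation.
The above procedure can be extended to any number of directions (up to the number of features M) recursively as follows. The n-th discriminant direction dn is required to maximise $\tilde{R}(d)$ in Eq (7) and be orthogonal to dk, k = 1, 2, …, n − 1. It can be shown that
$$d_n = \alpha_n \tilde{S}_W^{-1} \left( \Delta - [d_1, \cdots, d_{n-1}]\, S_{n-1}^{-1}\, [1/\alpha_1, 0, \cdots, 0]^\top \right), \tag{11}$$
where αn is the normalising constant such that $\|d_n\|_2 = 1$ and the (i, j) entries of $S_{n-1} \in \mathbb{R}^{(n-1) \times (n-1)}$ are defined as
$$(S_{n-1})_{ij} = d_i^\top \tilde{S}_W^{-1} d_j, \quad 1 \le i, j \le n - 1. \tag{12}$$
The whole procedure of finding L discriminant vectors $\{d_n\}_{n=1}^{L}$ is summarised in Algorithm 1.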
Algorithm 1 LDA for binary classification
Require: $\{y_i\}_{i=1}^{N}$ and L ≤ M, i.e., the given samples and the number of discriminant vectors.
Compute $\Delta$ and $\tilde{S}_W$ in Eq (3);
Compute d1 using Eq (8) and S1 using Eq (12);
n = 1;
while n < L do
    n = n + 1;
    Compute dn using Eq (11);
    Compute Sn using Eq (12);
end while
Return $\{d_n\}_{n=1}^{L}$
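The recursion in Algorithm 1 can be sketched in NumPy as below. This is an illustrative reading of the procedure, not the authors' code: the function name `binary_discriminant_vectors` is hypothetical, the binary intra-class scatter is assumed to be the β-weighted sum of unnormalised class scatters, and the optional `delta` ridge term is an added assumption that keeps the scatter invertible when N < M.

```python
import numpy as np

def binary_discriminant_vectors(Y1, Y2, L, delta=0.0):
    """Return (D, gamma): L orthonormal discriminant vectors as columns of D
    and the value of the binary Fisher criterion at each of them.

    Y1, Y2: samples of the two classes, shapes (N1, M) and (N2, M).
    delta:  hypothetical ridge term; set delta > 0 if the scatter is singular.
    """
    N1, N2 = Y1.shape[0], Y2.shape[0]
    M = Y1.shape[1]
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    Delta = m1 - m2                                # difference of class means
    S1 = (Y1 - m1).T @ (Y1 - m1)                   # class scatters
    S2 = (Y2 - m2).T @ (Y2 - m2)
    beta = (N2 - 1) / (N1 + N2 - 2)
    S_W = beta * S1 + (1 - beta) * S2 + delta * np.eye(M)
    W_inv = np.linalg.inv(S_W)

    v = W_inv @ Delta
    alpha1 = 1.0 / np.linalg.norm(v)               # normalising constant of d_1
    dirs = [alpha1 * v]                            # first direction
    for n in range(2, L + 1):
        D = np.column_stack(dirs)                  # M x (n-1), previous directions
        S = D.T @ W_inv @ D                        # (n-1) x (n-1) matrix S_{n-1}
        e = np.zeros(n - 1)
        e[0] = 1.0 / alpha1
        v = W_inv @ (Delta - D @ np.linalg.solve(S, e))
        dirs.append(v / np.linalg.norm(v))         # n-th direction, unit norm
    D = np.column_stack(dirs)
    # Criterion value (d^T Delta)^2 / (d^T S_W d) for every column d of D.
    gamma = (D.T @ Delta) ** 2 / np.einsum("im,ij,jm->m", D, S_W, D)
    return D, gamma
```

Each new direction is orthogonal to all earlier ones by construction, so the columns of `D` can be used directly as a projection matrix via `Y @ D`.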
Similar to how each singular vector correlates to a singular value, each discriminant vector dn corresponds to a “discrim-value”, say γn, i.e., the value of the Fisher criterion in Eq (7) at dn:
$$\gamma_n = \frac{(d_n^\top \Delta)^2}{d_n^\top \tilde{S}_W d_n}. \tag{13}$$
The discriminant vectors $\{d_n\}_{n=1}^{L}$ are naturally ordered by their discriminant values, following γ1 ≥ γ2 ≥ ⋯ ≥ γL ≥ 0.
The DA subspace formed by $\{d_n\}_{n=1}^{L}$ offers considerable potential for feature extraction and dimensionality reduction in many fields like pattern recognition. For example, face recognition has been enhanced by LDA [54], outperforming PCA in many cases.
2.5 Non-negative matrix factorization
In the process of matrix factorization, reconstructing a low-rank approximation for the data matrix Y is of great importance. NMF is a technique dealing with Y ≥ 0 whose entries are all non-negative [55], with great achievements in many fields such as signal processing [56], biomedical engineering, pattern recognition and image processing [57]. The sparsity of the NMF subspace has also received extensive attention. In genomics, for example, the work in [58] factorized gene expression matrices across different experimental conditions, showing that the sparsity of NMF contributes to decreasing noise and extracting biologically meaningful features. The purpose of NMF is to find two non-negative and low-rank matrices, i.e., one base matrix $X \in \mathbb{R}^{p \times M}$ and one coefficient matrix $K \in \mathbb{R}^{N \times p}$, satisfying
$$Y \approx K X, \tag{14}$$
where p < min{M, N}. Let K = (k1, k2, ⋯, kN)⊤. We have $y_i^\top \approx k_i^\top X$. In other words, every sample yi can be represented by a linear combination of the rows of X with the components in ki serving as weights. Therefore, the rows of X are also known as basis vectors, which can project the data matrix Y into a low-dimensional subspace. The number of basis vectors p will affect the degree of approximation to the data matrix Y.
Finding K and X satisfying Eq (14) can be addressed by solving the following minimisation problem:
$$\min_{K \ge 0,\; X \ge 0} \; \| Y - K X \|_F^2, \tag{15}$$
where ‖ ⋅ ‖F is the Frobenius norm. To solve problem (15), a common technique is to update K and X alternately, i.e.,
$$K \leftarrow K \odot \frac{Y X^\top}{K X X^\top}, \qquad X \leftarrow X \odot \frac{K^\top Y}{K^\top K X}, \tag{16}$$
where ⊙ denotes the pointwise product and the fractions are likewise taken pointwise. For more algorithmic details please refer to e.g. [55].
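The alternating multiplicative updates can be sketched as follows. This is a minimal NumPy illustration of the standard scheme for the Frobenius objective; the function name `nmf`, the random initialisation, and the small constant `eps` guarding against division by zero are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def nmf(Y, p, n_iter=3000, eps=1e-10, seed=0):
    """Factorise a non-negative Y (N x M) as K @ X with K (N x p), X (p x M).

    eps is a hypothetical safeguard added to the denominators.
    """
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    K = rng.random((N, p)) + eps           # strictly positive initial factors
    X = rng.random((p, M)) + eps
    for _ in range(n_iter):
        K *= (Y @ X.T) / (K @ X @ X.T + eps)     # pointwise update of K
        X *= (K.T @ Y) / (K.T @ K @ X + eps)     # pointwise update of X
    return K, X
```

The rows of `K` then serve as the p-dimensional representations of the training samples. Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout.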
NMF is an unsupervised method that decomposes the data matrix without utilising the class label information. Regarding the binary classification problem, the SNMF (supervised NMF) proposed in [43] extends the standard unsupervised NMF approach by exploiting feature extraction and integrating the cost function of the classification method into NMF. In SNMF, the classification labels are incorporated into the algorithms to extract the specific data patterns relevant to the respective classes. The whole algorithm of SNMF is provided in S2 Appendix in S1 File.
3 Few-shot learning on subspace representations
We deploy few-shot learning techniques in investigating medical imaging particularly in the data scarcity scenario. We consider problems in which the feature space dimensionality is usually high in comparison to the number of images we have; hence subspace representations are sought. The adopted few-shot learning scheme on subspace feature representations and experimental settings are presented in what follows.
3.1 Framework
The schematic diagram of the few-shot learning scheme on different subspaces is shown in Fig 1. Firstly, a deep neural network (e.g. ResNet18) pre-trained on a large natural image classification problem is used to extract features of medical images in the given datasets (i.e., the green box in Fig 1). The extracted features are then projected to subspace representations (i.e., the blue box in Fig 1). In this paper, we consider the three methods (i.e., SVD, DA and NMF) described in Section 2 to achieve this. Finally, a classifier (e.g. the K-Nearest Neighbour (KNN) or Support Vector Machine (SVM)) is employed to perform few-shot learning, i.e., to predict the final classification results. Extensive exploration of the benefits of different subspace representations, together with suggestions and comparisons in the regime of few-shot learning in medical imaging, is conducted in Section 5.
Fig 1. From left to right: A deep neural network (e.g. ResNet18) pre-trained on a large natural image classification problem is exploited to extract features of medical images (i.e., inputs in the green box), and then the extracted features are projected to subspace representations (i.e., outputs in the blue box), followed by a classifier (e.g. KNN) delivering the classification results. The extracted features for individual images are visualised as dots with different colours representing different classes (i.e., the middle of the diagram).
3.2 Experimental settings
We explore 14 datasets covering 11 distinct diseases, with the number of classes ranging from 2 to 11; see Section 4 for more detail. The pre-trained deep model, ResNet18, is used as the source model in our experiments. Each input is normalised pixel-wise by subtracting its mean and dividing by its standard deviation, without data augmentation. The feature space consists of the features in the penultimate layer of the pre-trained ResNet18, extracted by PyTorch hooks [59], yielding a 512-dimensional feature vector for each image. The low-dimensional representations are then generated by the methods introduced in Section 2. The number of iterations for NMF and SNMF is set to 3000 to ensure convergence. The mean result of the KNN classifier over the selected values of K (1, 5, 10 and 15 nearest neighbours) is used to evaluate the final performance. In addition to KNN, we also implement SVM as the classifier for comparison. The detailed experimental setting and results of the SVM classifier are shown in S3 Appendix in S1 File. To quantify the uncertainty of the classification accuracy and produce more reliable quantitative results, we present averages and standard deviations across 10 random samplings of each dataset. In addition to the accuracy, the reconstruction error of NMF under different random initializations is reported to demonstrate its convergence. Moreover, we also compare our method with other well-known few-shot learning algorithms such as the prototypical network [60]. The experimental setup and results are presented in S3 Appendix in S1 File.
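The evaluation step, accuracy averaged over several neighbourhood sizes, might look like the NumPy sketch below. The Euclidean metric, the tie-breaking rule (smallest label wins), and the function names are assumptions of this example; the text does not specify them. Labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(Z_train, y_train, Z_test, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances between every test point and every training point.
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]        # indices of the k neighbours
    votes = y_train[nearest]                       # neighbour labels, (n_test, k)
    # bincount + argmax picks the most frequent label; ties go to the smallest.
    return np.array([np.bincount(v).argmax() for v in votes])

def mean_knn_accuracy(Z_train, y_train, Z_test, y_test, ks=(1, 5, 10, 15)):
    """Average test accuracy over the neighbourhood sizes in ks."""
    accs = [np.mean(knn_predict(Z_train, y_train, Z_test, k) == y_test)
            for k in ks]
    return float(np.mean(accs))
```

Repeating this over several random training-test partitions and reporting the mean and standard deviation of the returned values would mirror the uncertainty estimate described above.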
4 Data
A total of 14 different datasets covering a range of problems in diagnostics are employed for our empirical work. The number of classes ranges from 2 to 11 and the imaging modalities include X-rays, CT scans, MRI and microscopy. The datasets with MNIST within their name come from a benchmark family referred to as MedMNIST [61]. In order to illustrate the regime of few-shot learning, randomly sampled subsets of the whole individual datasets are used for our training and testing. The corresponding data split for each class in the training and test sets for all the datasets is presented in Table 1. It is worth noting that our intention is not to compare with previously published results which have used the whole individual datasets. For ease of reference, brief descriptions of these individual datasets together with our implementations are given below.
- 1) BreastCancer (IDC) data [62, 63] is a binary classification problem sampled from digitised whole slide histopathology images. The source of the data is 162 women diagnosed with Invasive Ductal Carcinoma (IDC), the most common phenotypic subtype in breast cancers. From these annotated images, 277,524 patches were segmented. An accuracy of 84.23% using the whole dataset is reported in [63].
- 2) BrainTumor data [64, 65] is a four-category problem, consisting of 7,022 human brain MRI images covering three types of tumours (i.e., glioma, meningioma and pituitary) and a control group.
- 3) CovidCT data [66] is a binary classification problem, which is of great interest due to the COVID-19 pandemic. It contains 349 CT scans that are positive for COVID-19 and 397 negatives that are normal or contain other types of diseases. Two-dimensional slices from the scans are used in the study.
- 4) DeepDRiD data [67] is a five-category problem. Diabetic retinopathy is a prevalent eyesight condition in eye care. With early detection and treatment, the majority of these disorders may be controlled or cured. In this dataset, a total of 2,000 regular fundus images were acquired using a Topcon digital fundus camera from 500 patients.
- 5) BloodMNIST data [68] is an eight-category problem, including a total of 17,092 images. It consists of individual normal cells, captured from individuals without infection, hematologic or oncologic disease and free of any pharmacologic treatment at the time of blood collection.
- 6) BreastMNIST data [69] is a binary classification problem, including a total of 780 breast ultrasound images. An accuracy of 94.62% is claimed in [70] in the computer-aided diagnostic (CAD) setting on the whole dataset. The grayscale images are replicated in order to match the pre-trained model.
- 7) DermaMNIST data [71, 72] is a multi-source dermatoscopic image collection of common pigmented skin lesions. It contains 10,015 dermatoscopic images, which are classified into seven diseases.
- 8) OCTMNIST data [73] is for retinal diseases, including a total of 109,309 valid optical coherence tomography images, with four diagnostic categories.
- 9) OrganAMNIST, OrganCMNIST and OrganSMNIST datasets [74] are eleven-category problems. They are benchmarks for segmenting liver tumours from 3D computed tomography images (LiTS). Organ labels were obtained using boundary box annotations of the 11 bodily organs studied, which are renamed from Axial, Coronal and Sagittal for simplicity. Grayscale images were converted into RGB images following the instructions in [61].
- 10) PathMNIST data [75] is based on the study of using colorectal cancer histology slides to predict survival, including a total of 107,180 images and nine different types of tissues. An accuracy of 94% was achieved in [75] by training a CNN using transfer learning on a set of 7,180 images from 25 CRC patients.
- 11) PneumoniaMNIST data [73] is to classify pneumonia into two categories, severe and mild. It consists of 5,856 paediatric chest X-ray images. The source images are grayscale, which are converted to RGB for training in the same manner as the OrganAMNIST dataset.
- 12) TissueMNIST data [76] is derived from the Broad Bioimage Benchmark Collection. It consists of 236,386 human kidney cortex cells, segmented and labelled into eight categories. An accuracy of 80.26% was achieved in [76] using a custom 3D CNN on the whole dataset.
5 Experimental results
In this section, we investigate the performance of the few-shot learning scheme described in Section 3 on subspace representations using SVD, DA and NMF. Note, importantly, that our main interest is to introduce DA and NMF as alternative subspace representations to SVD in the regime of few-shot learning in medical imaging. In addition to the comparison between the SVD, DA and NMF subspaces, we also compare them with other relevant feature selection, dimensionality reduction, and few-shot learning methods. For visual inspection, we visualise the subspace distributions of SVD, DA and NMF using t-SNE in Python (see the results in S3 Appendix in S1 File).
5.1 Discriminant versus principal component subspaces
We first compare DA and PCA. Table 2 shows the few-shot learning classification accuracy on the 14 datasets/problems, comparing the feature space in its original dimension from ResNet18 with the PCA and DA subspaces. The accuracy results are averaged over the K values of the KNN classifier, chosen to be 1, 5, 10 and 15. We note that with a single exception of the CovidCT dataset, principal component dimensionality reduction loses information about class separation, whereas the discriminant subspace representation maintains the separation extremely well, thereby showing significant improvement over the original feature space. In detail, in 11 of the 14 problems, the SVD subspace performs worse than the original feature space. In contrast, the DA subspace shows significant improvement over the corresponding SVD subspace in all the 14 problems; and in 13 of the 14 problems, the DA subspace shows significant improvement over the original feature space. Furthermore, a Z-test was also carried out, confirming that the results are statistically significant at P values smaller than $10^{-3}$.
We now evaluate the impact of the subspace dimension on the classification accuracy for DA and SVD. Fig 2(a) shows how the classification accuracy varies as the subspace dimension increases on the PneumoniaMNIST dataset (consistent results are observed for other datasets). In particular, ten different random partitions of the training-test set are used (which makes the results more credible), and dimensions from one to ten are investigated in Fig 2(a). We observe that the performance of both the DA and SVD methods increases monotonically with the number of dimensions, with the DA subspace consistently outperforming SVD. Given that the performance achieved using the full set of features is 70.43 ± 3.70 in Table 2, the increase for SVD is not sustainable beyond this point.
Fig 2. (a) The DA subspace taken at different dimensions consistently outperforms the SVD subspace (cf. Table 2 for the performance on the full 512-dimensional feature space). (b) The DA subspace performs excellently against PCA and the original feature space, irrespective of the choice of K in the classifier.
The effect of different neighbourhood sizes K of the KNN classifier is reported in Fig 2(b), where the eleven-class dataset OrganAMNIST (consistent results are observed for other datasets) is used. Specifically, the performance of the SVD and DA subspaces with dimension equal to ten is compared against the original feature space for K = 1, 5 and 10. Uncertainty in results is evaluated over 10 random partitions of the training-test set, with 550 and 165 images for training and test, respectively. Fig 2(b) shows substantial improvement in the DA subspace representation over both the original feature space and the SVD reduced subspace, irrespective of the choice of K in the KNN classifier.
Finally, we investigate the effect of the dataset size on the performance of the methods compared. Fig 3 shows the results for the DA and PCA subspaces and the original feature space on a small subset (i.e., 540 and 180 images for training and test, respectively) of the dataset as well as on the entire dataset (i.e., 70,974 and 3,051 images for training and test, respectively), where the nine-class dataset PathMNIST (consistent results are observed for other datasets) is used for illustration. The value K in the KNN classifier is set to 5. In Fig 3, we also evaluate the effect of the model pre-trained on ImageNet versus the model whose weights are randomly initialised. The findings reveal that the DA subspace always outperforms the SVD subspace and the original feature space, irrespective of the data size. In particular, although only about 0.7% of the entire dataset is used, the results achieved using the DA subspace are highly comparable to those obtained using the entire dataset, whereas the results of SVD fall short. This confirms that the DA subspace is more stable than the SVD subspace, providing a discriminative subspace well suited to classification problems. In passing, we also see that the pre-trained model performs better than the model with randomly initialised weights, which fits our expectations. More results, namely the comparison between DA and the manifold learning method Isomap (a non-linear dimensionality reduction process) on all the datasets, are given in S3 Appendix in S1 File.
Dataset PathMNIST with nine classes is used. The left and right three pairs of bars in the panel are the results of the pre-trained model and the model with randomly initialised weights, respectively. The results reveal that the DA subspace always outperforms the SVD subspace and the original feature space, irrespective of the data size. Moreover, the results achieved using the DA subspace on the small subset are highly comparable to those obtained by using the entire dataset, whereas the results of SVD fall short.
5.2 Non-negative matrix factorization subspace
The classification accuracy of the NMF subspace (including NMF and SNMF), compared with the SVD subspace and the original feature space, is shown in Tables 3 and 4 for the binary and multiclass problems, respectively. SNMF is limited to the binary class problem, and the dimension of the related subspaces is kept at 30. The results show that, in general, the subspace representations (either SVD or NMF) deliver better performance than the original feature space. SNMF marginally outperforms NMF in the binary classification tests, while NMF and SVD subspaces perform comparably; this trend is also preserved in the multiclass classification problems. This suggests that NMF can be a viable alternative to SVD, particularly when a sparse representation is of interest.
Unlike the dimension selected for the DA subspace, the dimension of the NMF/SVD subspaces is fixed at 30. This is mainly because NMF (including SNMF) approximates the original data by the product of two matrices and is therefore affected by the rank selected for the decomposition, whereas for the DA subspace the dimension is determined by the number of classes in multiclass problems. Our results show that the performance of NMF becomes stable only after reaching a specific dimension, which is similar to the selection of the number of eigenvectors in SVD. Detailed trends of the performance of the NMF and SVD subspaces on the 14 datasets as the dimension changes are presented in S3 Appendix in S1 File, including the comparison with the non-linear dimensionality reduction method Isomap in S3 Fig 8 in S1 File.
Additionally, we investigate the stability and uncertainty of NMF with respect to dataset size and to random NMF initialization at various dimensions. Fig 4 shows how the size of the dataset influences classification performance as the subspace dimension ranges from 5 to 65 on the BrainTumor dataset. Two training sets of 320 and 640 images are created for the SVD and NMF subspaces, represented by bars of different colours. On the larger dataset (640 images), SVD and NMF perform similarly (see the blue and purple bars). On the smaller dataset (320 images), the NMF subspace outperforms the SVD subspace (see the red and green bars). SVD suffers from dimension issues on the small dataset, since its performance gradually degrades rather than improves as the dimension grows (e.g. from 15 to 65). In contrast, the results of the NMF subspace are relatively stable across dimensions, with similar accuracy. Although NMF performs poorly at extremely low dimensions (such as 5), it improves as the dimension increases, which is consistent with the earlier observation that NMF becomes stable only after reaching a specific dimension. The uncertainty of NMF is evaluated by randomly initialising the NMF at each dimension. In Fig 5, the left and right panels show the reconstruction error and the classification performance over 20 random NMF initializations on the BrainTumor dataset. The reconstruction error decreases as the dimensionality increases, and the performance of NMF is quite stable across dimensions under random initialization.
Dataset BrainTumor with four classes is used. Uncertainty is evaluated over 5 random partitions of the training-test set, and two training sets of 320 and 640 images are created. The value K = 5 is used in the KNN classifier. The performance of NMF is stable for both training set sizes, whereas SVD suffers from dimension issues on the small dataset (with 320 images).
The left and right panels respectively show the reconstruction error and the classification performance with 20 random NMF initializations on the BrainTumor dataset, indicating that the performance of NMF is quite stable corresponding to different dimensions with random initialization except for the 5-dimensional subspace.
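The dependence of the NMF reconstruction error on the chosen rank and on the random initialization can be sketched as follows. This is a minimal illustration assuming scikit-learn's `NMF` with random initialization; the non-negative matrix `X`, its size, the ranks and the helper name `nmf_error_by_rank` are hypothetical choices and are not the BrainTumor features or the authors' code.

```python
import numpy as np
from sklearn.decomposition import NMF


def nmf_error_by_rank(X, ranks=(5, 15, 25), n_inits=3, max_iter=300):
    """Mean and std of the relative Frobenius reconstruction error per rank."""
    norm = np.linalg.norm(X)
    errors = {}
    for r in ranks:
        errs = []
        for seed in range(n_inits):
            # A different seed gives a different random starting point.
            model = NMF(n_components=r, init="random", random_state=seed,
                        max_iter=max_iter)
            W = model.fit_transform(X)   # subspace coordinates (n_samples x r)
            H = model.components_        # non-negative basis (r x n_features)
            errs.append(np.linalg.norm(X - W @ H) / norm)
        errors[r] = (float(np.mean(errs)), float(np.std(errs)))
    return errors


# Hypothetical non-negative features (e.g. post-ReLU activations), 320 x 128.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(320, 40))) @ np.abs(rng.normal(size=(40, 128)))
errors = nmf_error_by_rank(X)
```

The spread across seeds at each rank gives a rough measure of the sensitivity to initialization; in a classification setting, the rows of `W` would be passed to the KNN classifier in place of the original features.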
5.3 Role of the feature extractor
In the few-shot learning paradigm considered, the pre-trained source model serves as a feature extractor, mapping the medical images into a high-dimensional space. To explore the impact of the model parameters, we compare the classification accuracy obtained from the related representations (i.e., feature space, PCA, DA and NMF) for randomly initialised and pre-trained models. Fig 6 shows the performance of the pre-trained model and the average over ten randomly initialised models on all 14 datasets. ResNet18 is used as the base feature extractor, with different parameter settings, in this experiment. As expected, the features extracted by the pre-trained model retain good discriminant properties. Surprisingly, the performance of the features extracted by the randomly initialised model, and of the corresponding subspaces, is not significantly degraded, indicating that discriminative properties are also preserved in its extracted features. The DA results in the figure further illustrate this point and support the view that the subspace perspective provides a direction for solving few-shot learning on medical imaging.
Information extracted from the pre-trained source models helps in downstream medical tasks, although the fixed random transformations also retain discriminant information.
5.4 Boruta subspace
To investigate the performance of feature selection techniques in the few-shot learning framework, we compare the subspace extracted by the Boruta feature selection method with the dimensionality reduction methods (i.e., SVD, DA and NMF). We follow the Boruta method and extract the related features on the 14 medical datasets (see results in S3 Appendix in S1 File). Fig 7 presents the classification results comparing the Boruta feature selection method against DA and NMF. It shows that feature selection, such as the Boruta method, which only selects a subset of the 512 dimensions based on the voting results of a wrapper algorithm around a random forest, is generally not a good choice for the few-shot learning architecture we present. Rather than selecting features through a randomised procedure as Boruta does, we prefer to conditionally maintain the original attributes of the data (e.g. discriminability, sparsity and non-negativity) in the subspace. In addition, in terms of computation time, DA and NMF are dramatically faster than Boruta (which needs a high number of iterations), showing the efficiency of the introduced subspace representations.
6 Discussion
For few-shot learning with only hundreds of images, comparable in order of magnitude to the feature dimension (typically 512 or 1024 for popular models), dimensionality reduction is essential. While PCA/SVD is the popular method of dimensionality reduction, its limitations as a variance-preserving approximation suitable for uni-modal data need to be considered. We have addressed this by exploring DA and NMF as alternatives to SVD for few-shot learning in medical imaging.
The experimental results show that the subspace obtained by DA is more useful for classification problems than the variance-preserving dimensionality reduction PCA/SVD. DA performs well on multiple disease datasets and effectively distinguishes the classes of disease in the low-dimensional space. However, DA also has some limitations, e.g. the maximum dimension of its subspace is one less than the number of classes for multiclass problems. This limitation is related to the rank of the covariance generated by the dataset. Moreover, DA may not perform ideally in classification when the discriminative information lies in the variance rather than the mean.
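The dimension limit mentioned above can be checked numerically. In the standard formulation of discriminant analysis (assumed here, as this section does not restate the definitions), the between-class scatter matrix is S_B = Σ_c n_c (μ_c − μ)(μ_c − μ)ᵀ; because the weighted class-mean deviations sum to zero, its rank is at most C − 1 for C classes. The following minimal NumPy sketch uses hypothetical synthetic data and a hypothetical helper name `between_class_scatter`.

```python
import numpy as np


def between_class_scatter(X, y):
    """S_B = sum_c n_c (mu_c - mu)(mu_c - mu)^T over the classes in y."""
    mu = X.mean(axis=0)
    S_B = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        X_c = X[y == c]
        d = (X_c.mean(axis=0) - mu)[:, None]   # column vector mu_c - mu
        S_B += len(X_c) * (d @ d.T)
    return S_B


# Hypothetical data: 4 classes, 30 items each, 20-dimensional features.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 4, 30, 20
y = np.repeat(np.arange(n_classes), per_class)
class_shifts = 3.0 * rng.normal(size=(n_classes, dim))
X = rng.normal(size=(n_classes * per_class, dim)) + class_shifts[y]

rank = np.linalg.matrix_rank(between_class_scatter(X, y))
```

With four classes the rank is 3, so at most three discriminant directions are available regardless of the feature dimension.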
We also restricted our work on SNMF (supervised NMF) to binary classification problems, for which the derivation is readily available. For multiclass problems, further work will be necessary. This is mainly because NMF is an inherently unsupervised matrix factorization algorithm, and how to properly incorporate label signals and generate discriminative subspaces remains to be discussed. These restrictions, however, do not limit the scope of the conclusions we reach regarding the desirability of alternatives to the widely used SVD. Future work could focus on deriving solutions for these cases. Additionally, it would be interesting to explore automatic rank selection using information-theoretic concepts such as the minimum description length considered in [77].
The comparison between feature selection techniques (e.g. [44, 49]) and dimensionality reduction (i.e., SVD, DA and NMF) reveals that simply selecting some specific features is less effective than eliminating less relevant information via dimensionality reduction. Moreover, plain feature selection can be quite unstable and may also be time-consuming. In comparison, since our few-shot learning architecture uses a pre-trained network for feature extraction, it is quite efficient. Most of the time consumed by our few-shot learning architecture is spent on the dimensionality reduction and on classification with a simple classifier. Benefiting from the dimensionality reduction, the final classification step is also quite economical.
Finally, it is worth mentioning that in clinical settings the validation and accuracy evaluation of a technique developed for medical imaging are extremely challenging (which is also true for all related techniques). This goes far beyond the challenge of lacking data, since clinical settings may require the involvement of clinicians, hospitals, patients and even the government, all of which are difficult for individual academics or research groups to reach. A collective effort from all interested parties is essential to validate and evaluate the practical use of any new method in medical imaging. In this study, our primary focus is the evaluation of our approach across 14 publicly accessible medical datasets, accompanied by a thorough presentation of the experimental analysis. In the future, it would be of great interest to apply the approaches developed in this work to more medical datasets.
7 Conclusion
In this paper, we explored two different subspace representations, DA and NMF, of features learned from deep neural networks pre-trained on large computer vision datasets, adopted for few-shot learning on small medical imaging datasets. Our empirical work is carried out on 14 different datasets spanning 11 distinct diseases and four image acquisition modalities. Across these, we demonstrate the following: I) there is a consistent performance advantage from dimensionality reduction in few-shot learning on medical imaging; II) working with DA-derived subspaces gives significant performance gains over PCA/SVD-based variance-preserving dimensionality reduction, and these gains are statistically significant even at very low dimensions; and III) the NMF-based representation, including its supervised variant, is a viable alternative to SVD-based low-dimensional subspaces. NMF also shows a comparable advantage for part-based representation at moderately low dimensions. Moreover, DA is particularly effective in scenarios where maximizing class separability is crucial, such as classification tasks with significant class overlap. NMF, based on the non-negativity requirements for the input and output, is well suited to applications requiring interpretability and part-based representations, making it beneficial in fields like biomedical engineering and signal processing. Additionally, NMF can be used in both supervised and unsupervised settings, providing flexibility in its application depending on the availability of medical data. Overall, the developed few-shot learning framework with the newly introduced subspace representations is a very powerful approach for tackling multiclass classification problems in medical imaging. One important future avenue could be to extend the approaches developed in this work to other fields.
References
- 1. Krizhevsky Alex, Sutskever Ilya, and Hinton Geoffrey E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
- 2. Fukushima Kunihiko and Miyake Sei. Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition. In Competition and cooperation in neural nets, pages 267–285. Springer, 1982.
- 3. Pinto Nicolas, Cox David D, and DiCarlo James J. Why is real-world visual object recognition hard? PLoS computational biology, 4(1):e27, 2008. pmid:18225950
- 4. Gupta Abhishek, Anpalagan Alagan, Guan Ling, and Khwaja Ahmed Shaharyar. Deep learning for object detection and scene perception in self-driving cars: Survey, challenges, and open issues. Array, 10:100057, 2021.
- 5. Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J Lim, Abhinav Gupta, Li Fei-Fei, et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 3357–3364. IEEE, 2017.
- 6. Liu Xiaoxuan, Faes Livia, Kale Aditya U, Wagner Siegfried K, Fu Dun Jack, Bruynseels Alice, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The lancet digital health, 1(6):e271–e297, 2019. pmid:33323251
- 7. Pesapane Filippo, Codari Marina, and Sardanelli Francesco. Artificial intelligence in medical imaging: threat or opportunity? radiologists again at the forefront of innovation in medicine. European radiology experimental, 2(1):1–10, 2018. pmid:30353365
- 8. Castro Daniel C, Walker Ian, and Glocker Ben. Causality matters in medical imaging. Nature Communications, 11(1):1–10, 2020. pmid:32699250
- 9. Kendall Alex and Gal Yarin. What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems, 30, 2017.
- 10. Lundervold Alexander Selvikvåg and Lundervold Arvid. An overview of deep learning in medical imaging focusing on mri. Zeitschrift für Medizinische Physik, 29(2):102–127, 2019. Special Issue: Deep Learning in Medical Physics. pmid:30553609
- 11. Kim Mijung, Zuallaert Jasper, and Neve Wesley De. Few-shot learning using a small-sized dataset of high-resolution fundus images for glaucoma diagnosis. In Proceedings of the 2nd international workshop on multimedia for personal health and health care, pages 89–92, 2017.
- 12. Althnian Alhanoof, AlSaeed Duaa, Al-Baity Heyam, Samha Amani, Dris Alanoud Bin, Alzakari Najla, et al. Impact of dataset size on classification performance: An empirical evaluation in the medical domain. Applied Sciences, 11(2):796, 2021.
- 13. Raghu Maithra, Zhang Chiyuan, Kleinberg Jon, and Bengio Samy. Transfusion: Understanding transfer learning for medical imaging. Advances in neural information processing systems, 32, 2019.
- 14. Caroppo Andrea, Leone Alessandro, and Siciliano Pietro. Deep transfer learning approaches for bleeding detection in endoscopy images. Computerized Medical Imaging and Graphics, 88:101852, 2021. pmid:33493998
- 15. Zhang Dingwen, Han Junwei, Cheng Gong, and Yang Ming-Hsuan. Weakly supervised object localization and detection: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5866–5885, 2021.
- 16. Hsin-Ping Huang, Krishna C Puvvada, Ming Sun, and Chao Wang. Unsupervised and semi-supervised few-shot acoustic event classification. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 331–335. IEEE, 2021.
- 17. Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, and Jinhui Tang. Few-shot image recognition with knowledge transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 441–449, 2019.
- 18. Li Xiaoxu, Yang Xiaochen, Ma Zhanyu, and Xue Jing-Hao. Deep metric learning for few-shot image classification: A review of recent developments. Pattern Recognition, page 109381, 2023.
- 19. Argüeso David, Picon Artzai, Irusta Unai, Medela Alfonso, San-Emeterio Miguel G, et al. Few-shot learning approach for plant disease classification using images taken in the field. Computers and Electronics in Agriculture, 175:105542, 2020.
- 20. Quellec Gwenolé, Lamard Mathieu, Conze Pierre-Henri, Massin Pascale, and Cochener Béatrice. Automatic detection of rare pathologies in fundus photographs using few-shot learning. Medical image analysis, 61:101660, 2020. pmid:32028213
- 21. Wang Yaqing, Yao Quanming, Kwok James T, and Ni Lionel M. Generalizing from a few examples: A survey on few-shot learning. ACM computing surveys (csur), 53(3):1–34, 2020.
- 22. Das Debasmit and George Lee C. S. A two-stage approach to few-shot learning for image recognition. IEEE Transactions on Image Processing, 29:3336–3350, 2020.
- 23. Zhou Xiaokang, Liang Wei, Shimizu Shohei, Ma Jianhua, and Jin Qun. Siamese neural network based few-shot learning for anomaly detection in industrial cyber-physical systems. IEEE Transactions on Industrial Informatics, 17(8):5790–5798, 2021.
- 24. Tang Hao, Yuan Chengcheng, Li Zechao, and Tang Jinhui. Learning attention-guided pyramidal features for few-shot fine-grained recognition. Pattern Recognition, 130:108792, 2022.
- 25. Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, and Jinhui Tang. M3net: multi-view encoding, matching, and fusion for few-shot fine-grained action recognition. In Proceedings of the 31st ACM international conference on multimedia, pages 1719–1728, 2023.
- 26. Li Zechao, Tang Hao, Peng Zhimao, Qi Guo-Jun, and Tang Jinhui. Knowledge-guided semantic transfer network for few-shot image recognition. IEEE Transactions on Neural Networks and Learning Systems, 2023.
- 27. Zha Zican, Tang Hao, Sun Yunlian, and Tang Jinhui. Boosting few-shot fine-grained recognition with background suppression and foreground alignment. IEEE Transactions on Circuits and Systems for Video Technology, 2023.
- 28. Debasmit Das, JH Moon, and George Lee. Few-shot image recognition with manifolds. In International Symposium on Visual Computing, pages 3–14. Springer, 2020.
- 29. Yin Wanguang, Ma Zhengming, and Liu Quanying. Discriminative subspace learning via optimization on riemannian manifold. Pattern Recognition, 139:109450, 2023.
- 30. Tibshirani Robert. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
- 31. Markovsky Ivan. Low rank approximation: algorithms, implementation, applications, volume 906. Springer, 2012.
- 32. Shetta Omar, Niranjan Mahesan, and Dasmahapatra Srinandan. Convex multi-view clustering via robust low rank approximation with application to multi-omic data. IEEE/ACM transactions on computational biology and bioinformatics, 2021.
- 33. Markovsky Ivan and Niranjan Mahesan. Approximate low-rank factorization with structured factors. Computational Statistics & Data Analysis, 54(12):3411–3420, 2010.
- 34. Liang Zhizheng and Shi Pengfei. An analytical algorithm for generalized low-rank approximations of matrices. Pattern Recognition, 38(11):2213–2216, 2005.
- 35. Dimitris Papailiopoulos, Alexandros Dimakis, and Stavros Korokythakis. Sparse pca through low-rank approximations. In International Conference on Machine Learning, pages 747–755. PMLR, 2013.
- 36. Lee Daniel D and Seung H Sebastian. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999. pmid:10548103
- 37. Hedjam Rachid, Abdesselam Abdelhamid, and Melgani Farid. Nmf with feature relationship preservation penalty term for clustering problems. Pattern Recognition, 112:107814, 2021.
- 38. Wu Aming, Zhao Suqi, Deng Cheng, and Liu Wei. Generalized and discriminative few-shot object detection via svd-dictionary enhancement. Advances in Neural Information Processing Systems, 34, 2021.
- 39. He Zhang and Lili Liang. Res-svdnet: A metric learning method for few-shot image classification. In 2021 40th Chinese Control Conference (CCC), pages 7400–7405, 2021.
- 40. Foley Donald H. and Sammon John W. An optimal set of discriminant vectors. IEEE Transactions on computers, 100(3):281–289, 1975.
- 41. Song Fengxi, Liu Shuhai, and Yang Jingyu. Orthogonalized fisher discriminant. Pattern Recognition, 38(2):311–313, 2005.
- 42. Xu Yong, Yang Jing-Yu, and Jin Zhong. A novel method for fisher discriminant analysis. Pattern Recognition, 37(2):381–384, 2004.
- 43. Leuschner Johannes, Schmidt Maximilian, Fernsel Pascal, Lachmund Delf, Boskamp Tobias, and Maass Peter. Supervised non-negative matrix factorization methods for maldi imaging applications. Bioinformatics, 35(11):1940–1947, 2019. pmid:30395171
- 44. Li Zechao, Liu Jing, Tang Jinhui, and Lu Hanqing. Robust structured subspace learning for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10):2085–2098, 2015. pmid:26353186
- 45. Huang Samuel H. Supervised feature selection: A tutorial. Artif. Intell. Res., 4(2):22–37, 2015.
- 46. Li Zechao and Tang Jinhui. Semi-supervised local feature selection for data classification. Science China Information Sciences, 64(9):1–12, 2021.
- 47. Li Zechao and Tang Jinhui. Unsupervised feature selection via nonnegative spectral analysis and redundancy control. IEEE Transactions on Image Processing, 24(12):5343–5355, 2015. pmid:26394422
- 48. Kursa Miron B and Rudnicki Witold R. Feature selection with the boruta package. Journal of statistical software, 36:1–13, 2010.
- 49. Rong Tang and Xiaojun Zhang. Cart decision tree combined with boruta feature selection for medical data classification. In 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), pages 80–84. IEEE, 2020.
- 50. Raghu Maithra, Gilmer Justin, Yosinski Jason, and Sohl-Dickstein Jascha. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in neural information processing systems, 30, 2017.
- 51. Jing Xiao-Yuan, Zhang David, and Jin Zhong. Improvements on the uncorrelated optimal discriminant vectors. Pattern Recognition, 36(8):1921–1923, 2003.
- 52. Fisher Ronald A. The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2):179–188, 1936.
- 53. Boyd Stephen P and Vandenberghe Lieven. Convex Optimization. Cambridge university press, 2004.
- 54. Fatma Zohra Chelali, A Djeradi, and R Djeradi. Linear discriminant analysis for face recognition. In 2009 International Conference on Multimedia Computing and Systems, pages 1–10. IEEE, 2009.
- 55. Lee Daniel and Sebastian Seung H. Algorithms for non-negative matrix factorization. In Leen T., Dietterich T., and Tresp V., editors, Advances in Neural Information Processing Systems, volume 13. MIT Press, 2001.
- 56. Huang Yuwen, Yang Gongping, Wang Kuikui, Liu Haiying, and Yin Yilong. Robust multi-feature collective non-negative matrix factorization for ecg biometrics. Pattern Recognition, 123:108376, 2022.
- 57. Chen Zhikui, Jin Shan, Liu Runze, and Zhang Jianing. A deep non-negative matrix factorization model for big data representation learning. Frontiers in Neurorobotics, page 93, 2021. pmid:34354579
- 58. Brunet Jean-Philippe, Tamayo Pablo, Golub Todd R., and Mesirov Jill P. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164–4169, 2004. pmid:15016911
- 59. PyTorch. Forward and backward function hooks. PyTorch documentation.
- 60. Snell Jake, Swersky Kevin, and Zemel Richard. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30, 2017.
- 61. Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, et al. Medmnist v2: A large-scale lightweight benchmark for 2d and 3d biomedical image classification. arXiv preprint arXiv:2110.14795, 2021.
- 62. Janowczyk Andrew and Madabhushi Anant. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of pathology informatics, 7, 2016. pmid:27563488
- 63. Cruz-Roa Angel, Basavanhally Ajay, González Fabio, Gilmore Hannah, Feldman Michael, Ganesan Shridar, et al. Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In Medical Imaging 2014: Digital Pathology, volume 9041, page 904103. SPIE, 2014.
- 64. Jun Cheng. brain tumor dataset, Apr 2017.
- 65. Cheng Jun, Huang Wei, Cao Shuangliang, Yang Ru, Yang Wei, Yun Zhaoqiang, et al. Enhanced performance of brain tumor classification via tumor region augmentation and partition. PloS one, 10(10):e0140381, 2015. pmid:26447861
- 66. He Xuehai, Yang Xingyi, Zhang Shanghang, Zhao Jinyu, Zhang Yichen, Xing Eric, et al. Sample-efficient deep learning for covid-19 diagnosis based on ct scans. medrxiv, 2020.
- 67. The 1st diabetic retinopathy – classification of fundus images according to the severity level of diabetic retinopathy.
- 68. Acevedo Andrea, Merino Anna, Alférez Santiago, Molina Ángel, Boldú Laura, and Rodellar José. A dataset of microscopic peripheral blood cell images for development of automatic recognition systems. Data in Brief, 30, 2020. pmid:32346559
- 69. Al-Dhabyani Walid, Gomaa Mohammed, Khaled Hussien, and Fahmy Aly. Dataset of breast ultrasound images. Data in brief, 28:104863, 2020. pmid:31867417
- 70. Moon Woo Kyung, Lee Yan-Wei, Ke Hao-Hsiang, Lee Su Hyun, Huang Chiun-Sheng, and Chang Ruey-Feng. Computer-aided diagnosis of breast ultrasound images using ensemble learning from convolutional neural networks. Computer methods and programs in biomedicine, 190:105361, 2020. pmid:32007839
- 71. Tschandl Philipp, Rosendahl Cliff, and Kittler Harald. The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data, 5(1):1–9, 2018. pmid:30106392
- 72. Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368, 2019.
- 73. Kermany Daniel S, Goldbaum Michael, Cai Wenjia, Valentim Carolina CS, Liang Huiying, Baxter Sally L, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131, 2018. pmid:29474911
- 74. Patrick Bilic, Patrick Ferdinand Christ, Eugene Vorontsov, Grzegorz Chlebus, Hao Chen, Qi Dou, et al. The liver tumor segmentation benchmark (LiTS). arXiv preprint arXiv:1901.04056, 2019.
- 75. Kather Jakob Nikolas, Krisam Johannes, Charoentong Pornpimol, Luedde Tom, Herpel Esther, Weis Cleo-Aron, et al. Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS medicine, 16(1):e1002730, 2019. pmid:30677016
- 76. Woloshuk Andre, Khochare Suraj, Almulhim Aljohara F, McNutt Andrew T, Dean Dawson, Barwinska Daria, et al. In situ classification of cell types in human kidney tissue using 3d nuclear staining. Cytometry Part A, 99(7):707–721, 2021. pmid:33252180
- 77. Squires Steven, Prügel-Bennett Adam, and Niranjan Mahesan. Rank selection in nonnegative matrix factorization using minimum description length. Neural computation, 29(8):2164–2176, 2017. pmid:28562212