Detection of breast cancer by ATR-FTIR spectroscopy using artificial neural networks

Rock Christian Tomas; Anthony Jay Sayat; Andrea Nicole Atienza; Jannah Lianne Danganan; Ma. Rollene Ramos; Allan Fellizar; Kin Israel Notarte; Lara Mae Angeles; Ruth Bangaoil; Abegail Santillan; Pia Marie Albano

doi:10.1371/journal.pone.0262489

Abstract

In this study, three (3) neural networks (NN) were designed to discriminate between malignant (n = 78) and benign (n = 88) breast tumors using their respective attenuated total reflection Fourier transform infrared (ATR-FTIR) spectral data. A proposed NN-based sensitivity analysis was performed to determine the most significant IR regions that distinguished benign from malignant samples. The result of the NN-based sensitivity analysis was compared to the results obtained from FTIR visual peak identification. In training each NN model, a 10-fold cross-validation was performed and the performance metrics (area under the curve (AUC), accuracy (ACC), positive predictive value (PPV), specificity rate (SR), negative predictive value (NPV), and recall rate (RR)) were averaged for comparison. The NN models were compared to six (6) machine learning models for benchmarking: logistic regression (LR), Naïve Bayes (NB), decision trees (DT), random forest (RF), support vector machine (SVM), and linear discriminant analysis (LDA). The NN models outperformed the LR, NB, DT, RF, and LDA on all metrics, but surpassed the SVM only in accuracy, NPV, and SR. The best performance among the NN models was 90.48% ± 10.30% for AUC, 96.06% ± 7.07% for ACC, 92.18% ± 11.88% for PPV, 94.19% ± 10.57% for NPV, 89.04% ± 16.75% for SR, and 94.34% ± 10.54% for RR. Results from the proposed sensitivity analysis were consistent with the visual peak identification. However, unlike the FTIR visual peak identification method, the NN-based method identified the IR region associated with the C–OH group of carbohydrates as significant. IR regions associated with amino acids and amide proteins were also determined as possible sources of variability. In conclusion, the results show that ATR-FTIR via NN is a potential diagnostic tool. This study also suggests a potentially more specific method for determining relevant regions within a sample’s spectrum using NN.

Citation: Tomas RC, Sayat AJ, Atienza AN, Danganan JL, Ramos MR, Fellizar A, et al. (2022) Detection of breast cancer by ATR-FTIR spectroscopy using artificial neural networks. PLoS ONE 17(1): e0262489. https://doi.org/10.1371/journal.pone.0262489

Editor: David Mayerich, University of Houston, UNITED STATES

Received: June 17, 2021; Accepted: December 27, 2021; Published: January 26, 2022

Copyright: © 2022 Tomas et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper. Furthermore, we have uploaded and made our source code and minimal dataset available in GitHub. One may be able to access the files at https://github.com/rvtomas1/breast-cancer-ftir-codes-and-data.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Breast cancer remains the most prevalent cancer among women. Biennial mammography has been highly recommended for women 50 to 74 years old for early detection of this disease. The sensitivity of mammography increases to 92.7% when it is combined with magnetic resonance imaging (MRI), whereas combination with ultrasound (US) raises sensitivity to only 52%. Therefore, in high-risk women for whom supplemental screening is specified, MRI is recommended when possible [1]. Supplemental screening with US is an option to increase cancer detection in women with intermediate risk and dense breasts. The mammographic sensitivity for breast cancer in women with very dense breasts is 47.6%, which increases to 76.1% with US screening [2]. Suspicious lesions detected during mammograms are usually biopsied to confirm or rule out breast cancer. The most common form of biopsy is the core needle biopsy (CNB), which involves the removal of a portion of a tumor for histologic evaluation. The remainder is removed later, after a definitive diagnosis of cancer. However, since tissue samples are collected by three to nine passes with a biopsy needle, the incisional surgical procedures are associated with an elevated incidence of lymph node metastasis and higher local recurrence rates [3]. Carcinoembryonic antigen (CEA) can also be analysed to screen for breast cancer. However, it lacks disease sensitivity and specificity, and hence cannot be used for screening a subpopulation at high risk of malignancy or a general asymptomatic population, or for independently diagnosing cancer. CA 15–3, a soluble form of the transmembrane protein Mucin1 (MUC1), is reported to be overexpressed in malignant breast tumors. It has been suggested that CA 15–3 and CEA can be considered complementary in detecting recurrence of breast cancer. However, their sensitivity is low and independent of the majority of the prognostic parameters that may be considered before relapse [4].

The potential of using infrared spectroscopy, in particular Fourier transform infrared (FTIR) spectroscopy, has been gaining popularity for cancer diagnostics over the last few years. The distinctive spectral properties associated with the changes in chemical composition and structure of biomolecules can be recognized by FTIR spectroscopy, making it a potential diagnostic tool [5]. Hence, when cells or tissues undergo transformation from normal to cancerous, changes in the physico-chemical structures and properties in a variety of their biomolecules can be simultaneously and indiscriminately probed by FTIR spectroscopy [6, 7]. FTIR, as compared to traditional microscopic examination of hematoxylin and eosin (H&E) stained biopsies, is more rapid, cost-effective, and objective since reading is based on changes in biochemical properties instead of morphology [7]. It eliminates the possibility of intra- and inter-observer variability, which is often the problem with H&E staining. Moreover, this technology does not make use of dyes and other contrasting agents which may interfere or affect the reading. Hence, it provides more accurate and reproducible results.

The application of artificial intelligence (AI) in cancer diagnosis is no longer new; numerous studies have already applied it to the most prevalent cancers, such as lung cancer [8–10], thyroid cancer [11], ovarian cancer [12, 13], and breast cancer [14–17]. Most studies make use of image-based data such as MRI images, computed tomography (CT) scans, positron emission tomography (PET) scans, X-rays, and images of H&E-stained biopsies, the last of which are the gold standard [18]. The most successful AI implemented in these studies involves artificial neural networks (NN), in particular convolutional neural networks (CNN), owing to their proven effectiveness in processing images. Furthermore, the underlying architecture of a CNN makes it straightforward to create visual representations that highlight the site of malignancy within a medical image. What limits image-based AI diagnostics, however, is their heavy reliance on the detection of a visible abnormality within the scanned region. This implies that a patient may already be at an advanced stage of malignancy before detection is possible. Moreover, because of dyes and other contrasting agents, an AI model trained using the procedures of one laboratory may be difficult to apply in other laboratories if protocols and procedures are not well standardized.

The appeal of using FTIR data in AI is that they come in smaller files and are easier to process than images, while still providing adequate information about the samples [19]. Hence, data storage costs may be reduced significantly, and models utilizing such data become easier and faster to train, which reduces training costs. Moreover, FTIR data may be able to predict the onset of cancer even before evident morphological changes [5], addressing the limitation of image-based AI diagnostics. However, since NNs are essentially black boxes, the process underlying their decision-making is inherently unknown, which makes them less appealing to use in a clinical setting. Furthermore, FTIR data are less intuitive to interpret than images, even with the assistance of AI visualizations. In this study, a method was formulated to address this limitation by providing a novel process for determining the most prevalent biomarkers as seen by trained artificial NNs, hence providing a basis for the decision-making process. The proposed method is a modification of NN perturbation-based sensitivity analysis [20–22], which probes an NN’s sensitivity towards changes in an input variable.

Hence, this study showed the potential of artificial neural networks (ANN) in accurately diagnosing breast cancer through infrared spectroscopy. Specifically, it designed multiple ANNs to diagnose malignancy from breast tissues using ATR-FTIR data. The classification performance of the NN models was compared to that of six (6) of the most widely used machine learning models. Moreover, this study also proposed a novel method for determining the IR regions which may be significant in determining breast cancer malignancy, as seen by an NN design. This proposed method may serve as a baseline process in analysing spectral data and, more importantly, provide new insights and directions for pathologists and medical practitioners.

Materials and methods

Ethical clearance

Ethical clearance was obtained from the Institutional Review Board (IRB) of the University of Santo Tomas Hospital (USTH) in Manila, Philippines (Ref. No.: IRB-2018-07-135-IS) and Mariano Marcos Memorial Hospital and Medical Center (MMMH-MC) in Ilocos Norte, Philippines (Ref. No.: MMMHMC-RERC-15-006). Written informed consent from the participants or their legal guardians was waived by the respective ethics review boards since the study was restricted to the use of archived formalin-fixed paraffin-embedded (FFPE) breast tissues and did not involve additional procedures or pose risk of harm to subjects. All methods were carried out in accordance with the Declaration of Helsinki and its later amendments.

Study population and sample preparation

Two hundred (200) FFPE breast biopsies obtained from 192 adult patients seen at USTH and MMMH between January 2016 and December 2016 were included in the study. The samples were diagnosed by the resident pathologists of the hospital study sites as either benign (n = 91) or malignant (n = 101) based on microscopic examination of H&E-stained biopsies. The malignant samples were further subclassified as invasive ductal carcinoma, residual ductal carcinoma-in-situ, or invasive lobular carcinoma; and the benign samples as fibroadenoma, fibrocystic disease, benign fibroadipose tissue, fibrocollagenous cyst wall, or intraductal papillomas.

The FFPE tissues were uniformly cut at 5-μm thickness using a microtome (Leica Biosystems, Germany) and three (3) adjacent tissue sections were mounted on glass slides. The two (2) outer sections were stained with H&E for re-evaluation by a third-party pathologist who was blinded to the original diagnosis. The pathologist was instructed to classify the biopsy sample as either benign or malignant and, if the sample was malignant, to mark the location of the cancer cells to serve as a guide in the ATR-FTIR analysis [23]. The inner or middle tissue section was deparaffinized with xylol, dehydrated with alcohol, rinsed in distilled water, dried overnight, and subjected to ATR-FTIR analysis [24, 25].

Only the samples for which the resident pathologists of the hospital study sites and the third-party pathologist gave the same diagnosis were considered for further analysis. As a result, out of the 200 archival samples, only 166 (n = 78 malignant; n = 88 benign) were taken for ATR-FTIR processing. Furthermore, each of the 166 samples corresponded to exactly one patient, to satisfy a one-to-one correspondence between patients and specimens [11].

ATR-FTIR spectral analysis

An FTIR mid-infrared spectrometer equipped with a platinum ATR single reflection diamond sampling module (Bruker Optics, Germany) was used to obtain spectra of the breast samples. A performance qualification (PQ) test using OPUS 8.0 software’s fully automated validation program was initially performed to ensure quality and accuracy of spectral data. The deparaffinized breast tissue sections were positioned directly in contact with the ATR diamond crystal’s surface (2 mm x 2 mm) and the mid-IR region of 4000 cm^-1 to 600 cm^-1 was passed to and from the ATR accessory. Spectra were generated at a spectral resolution of 4 cm^-1 and 48 scans were co-added to obtain an adequate signal-to-noise ratio [26–28], which was further supported by the software’s validation program as “acceptable”. Prior to scanning each tissue sample, the background spectrum was recorded, and this spectrum was systematically subtracted by the software to routinely eliminate atmospheric effects. The malignant samples were scanned along the area containing the cancer cells, while the benign samples were scanned at random spots throughout their entire tissue section. The spectral data associated with a benign or a malignant tissue sample were obtained by computing the median spectrum of its respective 48 scans.

Characterization and pre-processing of spectral data

The spectral data set, X_SD, consisted of N = 166 spectral vectors x_i ∈ ℝ^ℒ, comprising 78 malignant FTIR spectra, X_M ⊂ X_SD, and 88 benign FTIR spectra, X_B ⊂ X_SD, where ℒ = 462 denotes the length of each vector. An element x_i(j) of a vector x_i corresponds to an absorbance reading within the fingerprint region of 1800 cm^-1 to 850 cm^-1 at approximately 2 cm^-1 steps. Furthermore, X_SD can also be characterized as a matrix of dimension ℝ^{N×ℒ}.

All obtained spectral data were internally normalized using z-score normalization, which is the recommended method of normalization for the FNN designs [29, 30]. The normalization was done to eliminate bias from y-value discrepancies among the IR samples. Here, normalization was done per spectral vector x_i using the equation

$$\hat{x}_i = \frac{x_i - \mathrm{mean}(x_i)}{\mathrm{std}(x_i)} \qquad (1)$$

where mean() and std() denote the mean and the standard deviation of the elements of the vector x_i. The implemented normalization scales the elements of x_i to have an overall mean of 0 and a standard deviation of 1. X_SD also underwent baseline correction in the OPUS 8.0 software using the “rubber band method” with 64 baseline points. This was done to approximate a polynomial fit based on the minima of the y-values of each vector x_i. The fitted polynomial was then subtracted from each x_i to extract the baseline-corrected spectrum [25, 31–34]. Finally, the corrected spectrum was scaled within the fingerprint region, from 1800 cm^-1 to 850 cm^-1 [32, 35]. Other than baseline correction using the rubber band method, no further user intervention was done to assess the spectral data. To visualize the average spectrum of benign and malignant breast tissues, their respective median values for each wavenumber were plotted.
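The per-spectrum z-score normalization of Eq 1 can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function name `zscore_normalize` and the example readings are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator std() denotes.

```python
import statistics

def zscore_normalize(spectrum):
    """Scale one spectrum to mean 0 and standard deviation 1 (Eq 1)."""
    mu = statistics.fmean(spectrum)
    # Assumption: population standard deviation. The paper does not say
    # whether std() is the population or the sample estimate.
    sigma = statistics.pstdev(spectrum)
    if sigma == 0:
        raise ValueError("a constant spectrum cannot be z-scored")
    return [(a - mu) / sigma for a in spectrum]

# Hypothetical absorbance readings, for illustration only.
example = [0.12, 0.30, 0.45, 0.20, 0.08]
normalized = zscore_normalize(example)
```

Because each spectrum is scaled by its own mean and standard deviation, offsets and gain differences between samples are removed before the spectra reach a classifier.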

Principal component analysis (PCA)

Principal component analysis (PCA) was performed to visualize the distribution of benign and malignant samples over the two most dominant principal components (F₁ and F₂). The process of translating X_SD to the reduced variable space, X_PCA (X_SD → X_PCA), is given by the equation

$$X_{PCA} = \left(X_{SD} - \bar{x}\right)\,[\,v_1 \;\; v_2\,] \qquad (2)$$

where X_PCA ∈ ℝ^{N × 2} is the reduced sample space, x̄ is the mean absorbance of X_SD, and v_1 and v_2 are the eigenvectors corresponding to the two largest eigenvalues of the covariance matrix S of X_SD. A PCA biplot of the malignant and the benign samples was drawn along the F₁ and F₂ axes to visualize the sample distribution.

Classifiers

Three (3) feed forward neural networks (FNN) of different layer sizes were designed in the study. To benchmark the NN models, six of the most widely used machine learning models were also created, in particular, linear discriminant analysis (LDA), support vector machine (SVM), logistic regression (LR), decision tree (DT), random forest (RF), and Naïve Bayes (NB). The following subsections discuss in detail the design of each model.

Cross-validation of models.

A 10-fold cross-validation procedure was used to evaluate all models, where 70% of the spectral data set X_SD was used for the training set S_TR ⊂ X_SD, and the remaining 30% was equally partitioned into the validation set S_V ⊂ X_SD (15%) and the test set S_TS ⊂ X_SD (15%). The cross-validation procedure was repeated over 50 trials (T) to ensure stability of results [36]. For each trial, the elements of the sets S_TR, S_V, and S_TS were reselected from X_SD randomly. The three sets were mutually disjoint and together made up X_SD (S_TR ∪ S_V ∪ S_TS = X_SD). Moreover, the ratio of malignant to benign samples was preserved for each set. To evaluate the performance of each model, the metrics area under the curve (AUC), accuracy (ACC), positive predictive value (PPV), negative predictive value (NPV), recall rate (RR), and specificity rate (SR) were obtained. The overall mean M_m and the overall standard deviation Σ_m of a metric m over the 50 trials were obtained using Eqs 3.1 and 3.2, where the overall mean is

$$M_m = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{1}{N}\sum_{n=1}^{N} m_{n,t}\right) \qquad (3.1)$$

in which m_n,t is the value of the metric m for a trial t ∈ ℕ ≤ T and a fold n ∈ ℕ ≤ N, with N = 10 folds. Eq 3.2 gives Σ_m in terms of the N-fold mean of the metric m for each trial, which is equal to the bracketed term of Eq 3.1.
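One random, class-ratio-preserving 70/15/15 partition and the threshold-based metrics named above can be sketched in Python as follows. The helper names (`stratified_split`, `binary_metrics`) are hypothetical, the rounding rule for set sizes is an assumption, and AUC is omitted because it requires ranked output probabilities rather than hard labels.

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=None):
    """Split sample indices into train/validation/test, preserving class ratios."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Assumption: set sizes are rounded per class.
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def binary_metrics(y_true, y_pred, positive=1):
    """ACC, PPV, NPV, RR and SR, with `positive` as the malignant label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "ACC": _ratio(tp + tn, len(pairs)),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "RR": _ratio(tp, tp + fn),   # recall (sensitivity)
        "SR": _ratio(tn, tn + fp),   # specificity
    }

# 78 malignant (1) and 88 benign (0) samples, as in the data set.
labels = [1] * 78 + [0] * 88
train_idx, val_idx, test_idx = stratified_split(labels, seed=0)
```

Calling `stratified_split` with a different seed for every trial mimics the random reselection of S_TR, S_V, and S_TS across the 50 trials.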

All models were designed and implemented using MATLAB R2020b on a machine with an Intel i7-6700 3.40 GHz CPU, an Nvidia GeForce GTX 1050 Ti GPU, and 16 GB of RAM.

Feedforward neural networks (FNN).

Three (3) feed forward neural networks (FNN) were designed in the study. Each neural network has an input layer consisting of ℒ nodes, corresponding to the length of each spectral vector input, and an output layer consisting of 2 nodes, which correspond to the sample’s diagnosis of being benign or malignant. The networks differ from one another in the number of hidden layers (n = 2, n = 4, n = 8). For all FNNs, scaled exponential linear units (SELU) were used as activation functions within the hidden layers [37], and the softmax function was used for the output layer.
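The forward pass of such a network (SELU hidden layers followed by a two-node softmax output) can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' model: the helper names are hypothetical, the hidden width of 10 is an arbitrary choice, and the standard deviation used for the Gaussian weight initialization is assumed.

```python
import math
import random

# SELU constants from the self-normalizing network formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(z):
    """Scaled exponential linear unit."""
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1.0))

def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Affine layer: one weight row and one bias per output node."""
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Gaussian weights and zero biases; the 1/sqrt(n_in) scale is an assumption.
    std = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_fnn(n_inputs, n_hidden_layers, width, seed=0):
    """Return a list of (weights, biases) pairs; the last pair is the output layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [width] * n_hidden_layers + [2]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(network, x):
    """Return [p_class0, p_class1] for one spectral vector x."""
    *hidden, output = network
    for weights, biases in hidden:
        x = [selu(z) for z in dense(x, weights, biases)]
    return softmax(dense(x, *output))

# Two hidden layers (n = 2) on a 462-point spectrum; width 10 is hypothetical.
net = build_fnn(n_inputs=462, n_hidden_layers=2, width=10)
probs = forward(net, [0.0] * 462)
```

Changing `n_hidden_layers` to 4 or 8 gives the other two depths considered in the study.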

Linear discriminant analysis (LDA).

Here, S_TR was used to design an LDA model f_LDA(x), which returns the probability of a spectral vector input being benign or malignant. S_TS was used to measure the metric performance of the model. Before constructing a linear separator among the samples, principal component analysis (PCA) was performed to reduce the dimension of the data from 462 to two variables, F₁ and F₂. The reduction of dimensional space via PCA (X_SD → X_PCA) follows the discussion of Eq 2.

The LDA model was constructed following Fisher’s criterion [38], where the probability density function describing the likelihood of a sample being malignant or benign is given by the function f_LDA(x) [11] (Eq 4.1), in which p_M(x) and p_B(x) are normal probability density functions describing the probability distributions of the malignant and the benign samples, respectively. The normal curves are defined as

$$p_c(x) = \frac{1}{\sigma_c\sqrt{2\pi}}\exp\left(-\frac{(\Gamma - \mu_c)^2}{2\sigma_c^2}\right), \qquad c \in \{M, B\} \qquad (4.2)$$

where Γ is the projection of X_PCA onto w_min, while μ_c and σ_c are the class mean and standard deviation, respectively.

In evaluating an element x of the test set S_TS, the ℒ × 2 matrix obtained from the training set via Eq 2 was first used to translate x from a dimension of ℝ^ℒ to ℝ² before evaluation using Eq 4.2.

Support vector machine (SVM).

The designed SVM is a linear SVM with input x ∈ ℝ^ℒ. The SVM was designed from the elements of the training set S_TR by considering an unconstrained Lagrange optimization problem given by Eqs 5.1 to 5.3 [39, 40], where Eq 5.1 corresponds to the equation for maximizing the separation of the support vector elements, Eq 5.2 is the conditional function for clustering the malignant and the benign classes, and L_min(w,b) in Eq 5.3 is the Lagrangian L of Eqs 5.1 and 5.2 in the variables w and b. Here, w is the weight vector of ℝ^ℒ dimension, b is the bias of ℝ¹ dimension, and α is the SVM’s α-matrix, in which the elements α_i ≠ 0 ∀ i ∈ ℕ ≤ N_TR correspond to the SVM’s support vectors. To determine suboptimal values for w, b, and α, stochastic gradient descent (SGD) was implemented using the gradients given in Eqs 5.4 to 5.6. To optimize the model, a grid search was performed over a series of learning rates ℓ from 1 to 5×10⁻⁵, each trained for 1000 epochs. The validation set accuracy was the optimization metric used to rank one model against another. The process was repeated for 50 trials to ensure stability. The average validation accuracy of a model over the 50 trials determined the overall metric of the model for a considered learning rate, ℓ_i. The ℓ_i which yielded the highest overall validation accuracy was taken as the optimal learning rate to train and test the SVM model. The output probability of the model for benign and malignant cases, p_SVM(x), was computed using Platt’s method [41].

Logistic regression (LR).

The designed LR model is an ℒ-input classifier with an output probability p_LR(x) quantifying the likelihood of an input x to be malignant. p_LR(x) is defined as

$$p_{LR}(x) = \frac{1}{1 + e^{-(w \cdot x + b)}} \qquad (6.1)$$

where w ∈ ℝ^ℒ and b ∈ ℝ¹ are the weights and bias characterizing the LR model. In training the model, SGD was used to minimize the loss over the training set. The considered loss function was the binary cross-entropy loss function given by

$$L_{BCE} = -\frac{1}{N_{TR}}\sum_{i=1}^{N_{TR}}\left[\,y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right] \qquad (6.2)$$

where p(y_i) denotes the probability of obtaining a malignant diagnosis given a theoretical output of y_i, with y_i = 1 for malignant cases and y_i = 0 for benign cases. To optimize the model, a grid search was performed with a design similar to that performed for the SVM.
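As a minimal sketch of the logistic model and its binary cross-entropy loss, the Python functions below compute the malignant probability, the mean loss over a batch, and a single SGD update. The function names, the toy sample, and the learning rate are hypothetical; averaging the loss over the samples is an assumption about normalization.

```python
import math

def predict_proba(x, w, b):
    """Logistic output: probability that input x is malignant."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy; targets are 1 (malignant) or 0 (benign)."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(targets)

def sgd_step(x, y, w, b, lr):
    """One stochastic gradient step on a single sample (x, y)."""
    # For the logistic model, d(BCE)/dz simplifies to (p - y).
    err = predict_proba(x, w, b) - y
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return new_w, b - lr * err

# Toy two-feature malignant sample, for illustration only.
x_toy, y_toy = [1.0, -0.5], 1
w0, b0 = [0.0, 0.0], 0.0
p_before = predict_proba(x_toy, w0, b0)
w1, b1 = sgd_step(x_toy, y_toy, w0, b0, lr=0.1)
p_after = predict_proba(x_toy, w1, b1)
```

After one update on a malignant sample, the predicted malignant probability for that sample rises above its initial value of 0.5, which is the behaviour SGD on the cross-entropy loss should show.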

Decision tree (DT) and random forest (RF).

The classification and regression trees (CART) algorithm was used to generate DTs with binary splits. Gini’s diversity index was used to find the best input Φ_j | j ∈ ℕ ≤ ℒ for splitting the training set at each iteration of branching, where Φ_j is the j^th wavenumber from the fingerprint region 1800 cm^-1 to 850 cm^-1. Gini’s diversity index [42] is given by

$$GINI(j)_t = 1 - \sum_{m=1}^{2} P(m|t)^2, \qquad P(m|t) = \frac{N_{m,t}}{N_t} \qquad (7.1)$$

where the upper bound of the summation reflects the binary nature of the splitting considered. Furthermore, N_m,t is the total number of samples of class m separated by the split value at node t, N_t is the total number of elements at node t, and P(m|t) is the probability of class m (malignant or benign) occurring at node t. Since the inputs are continuous variables, the best value of separation was identified by considering the value with the least GINI(j)_t metric [43]. The branching was recursively performed for each newly created node until the validation set accuracy decreased.
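The search for the best split value of a single continuous input can be sketched as follows. The helper names are hypothetical, candidate thresholds are taken as midpoints between consecutive sorted values, and the two child nodes are scored by their size-weighted Gini index; both choices follow common CART practice and are assumptions about the authors' implementation.

```python
def gini(labels):
    """Gini diversity index of one node: 1 - sum over classes of P(m|t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split of one input."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no threshold can separate equal values
        thr = (v1 + v2) / 2.0
        left = [y for v, y in pairs if v <= thr]
        right = [y for v, y in pairs if v > thr]
        # Assumption: children are weighted by their share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

# Hypothetical absorbances at one wavenumber; 0 = benign, 1 = malignant.
absorbance = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
diagnosis = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(absorbance, diagnosis)
```

Repeating `best_split` over all ℒ wavenumbers and keeping the input with the lowest score gives the split used at a node.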

The designed RF was built from trees created following the previously discussed CART procedure. The diagnosis of the RF was determined as the prevailing diagnosis made by its constituent bags of DTs. To determine the optimum number of trees N_RF for the RF, a grid search from 3 to 100 trees was performed. The validation set accuracy was the optimization criterion of the search. Each simulation was repeated 50 times for each iteration n | n ∈ {3 ≤ ℕ ≤ 100} to ensure stability. The average accuracy over the 50 trials served as the final performance metric of the RF for an N_RF equal to n. The final RF was the design with the highest average accuracy.

Naïve Bayes (NB).

The designed NB is a two-class classifier of n | n ∈ ℕ ≤ ℒ inputs. For each j^th input, x(j), the best value for separating its elements into two sub-classes was determined. The algorithm for finding this value is the same as that of the DT and RF designs, where Gini’s index was used (Eq 7.1). The predictive value p_NB(x) of the NB is defined as the probability of a sample being malignant, given an input x. p_NB(x) is given by Eq 8.1, where the numerator corresponds to the total probability of an input x occurring, given the n inputs considered, with each x(j) value classified as class m_j by the determined separation for malignant cases. On the other hand, the denominator is the total probability of the set of m_j from x occurring at all. The NB classifier outputs a malignant diagnosis when p_NB > 50%; otherwise, the diagnosis is benign. To determine the optimal n-value for the classifier, the number of inputs was increased from 3 to ℒ, where the inputs with the least GINI(j)_t value were considered first. The optimization was terminated at the n-value where the validation accuracy of the model started to decrease. Each iteration of n was repeated for 50 trials, and the average validation accuracy over the 50 trials was the optimization criterion. The final NB was the design with the highest average validation accuracy.

Identification of dominant spectral components

To identify the most significant wavenumbers which influenced a sample’s diagnosis via the NN models, a novel sensitivity analysis was performed based on the optimized FNNs (n = 2, n = 4, n = 8). A visual peak analysis of the obtained spectral data was also performed prior to the sensitivity analysis, for comparison with the significant wavenumbers identified by the NN.

Visual peak analysis.

Significant peaks in the fingerprint region were identified through visual inspection of X_SD. A test of normality (Shapiro-Wilk test) and a test of homogeneity of variance were performed for the identified peaks. Since all data followed a non-normal distribution, they were subjected to the Mann-Whitney U test to assess whether the absorbance peaks of malignant samples were significantly different (p-value < 0.05) from those of the benign samples. Statistical analyses were performed using MATLAB R2020b.

Sensitivity analysis of neural network.

A modified neural network committee-based (NNC) sensitivity analysis was considered using the input perturbation algorithm. To simplify the NNC sensitivity analysis, the committee of NNs was designed and trained following the design architecture of the optimized FNNs [20–22].

For the analysis, an experimental set S_EXP ⊂ X_SD was used to train and analyse the FNN design. S_EXP comprised 70% of the elements of X_SD, selected at random, with the malignant and benign samples equally proportioned. For each selected input x_j, a perturbation Δx_j from –50% to 50% of x̄_j at 5% steps was added to x_j, and the mean square error (MSE) of the output was tabulated, where x̄_j is the mean value of the j^th input over S_EXP [21]. The MSE for the j^th input variable at its k^th perturbation step Δx_j,k (MSE_j,k) is calculated using Eq 9.1, where N_EXP is the total number of elements in S_EXP, O_j,k is the output vector of the model for Δx_j,k, and the ideal output vector is the target output for a benign or a malignant sample. The overall response of the network for the j^th input, MSE_j, was computed by averaging MSE_j,k over all K perturbation steps Δx_j,k:

$$MSE_j = \frac{1}{K}\sum_{k=1}^{K} MSE_{j,k} \qquad (9.2)$$

To ensure stability of the performed sensitivity analysis, the process was repeated for a committee of 50 NNs (i.e., 50 trials). The overall response of the j^th input was computed as the average of MSE_j over the 50 trials.
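The perturbation procedure described above can be sketched in Python for any model that maps an input vector to a pair of class probabilities. The names are hypothetical and the toy model merely stands in for a trained FNN. Two details are assumptions: the squared error is averaged over the two output nodes, and the unperturbed (0%) step is included among the 21 steps.

```python
import math

def perturbation_steps(low=-0.50, high=0.50, step=0.05):
    """Fractions of the input mean to add: -50% to +50% in 5% steps."""
    n = round((high - low) / step)
    return [low + k * step for k in range(n + 1)]

def mse(output, ideal):
    # Assumption: squared error averaged over the output nodes.
    return sum((o - t) ** 2 for o, t in zip(output, ideal)) / len(output)

def input_sensitivity(model, samples, ideals):
    """Return MSE_j, the step-averaged output error, for every input j."""
    n_inputs = len(samples[0])
    means = [sum(s[j] for s in samples) / len(samples) for j in range(n_inputs)]
    responses = []
    for j in range(n_inputs):
        per_step = []
        for frac in perturbation_steps():
            delta = frac * means[j]
            errors = []
            for x, ideal in zip(samples, ideals):
                perturbed = list(x)
                perturbed[j] += delta  # perturb only the j-th input
                errors.append(mse(model(perturbed), ideal))
            per_step.append(sum(errors) / len(errors))   # MSE_j,k
        responses.append(sum(per_step) / len(per_step))  # MSE_j
    return responses

def committee_sensitivity(models, samples, ideals):
    """Average MSE_j over a committee of models (50 NNs in the study)."""
    per_model = [input_sensitivity(m, samples, ideals) for m in models]
    return [sum(col) / len(col) for col in zip(*per_model)]

def toy_model(x):
    """Hypothetical stand-in for a trained FNN; depends on input 0 only."""
    p = 1.0 / (1.0 + math.exp(-4.0 * (x[0] - 1.0)))
    return [1.0 - p, p]  # [benign, malignant]

toy_samples = [[0.5, 1.0, 2.0], [1.5, 1.2, 1.8]]
toy_ideals = [[1.0, 0.0], [0.0, 1.0]]
responses = input_sensitivity(toy_model, toy_samples, toy_ideals)
```

In this toy case only the first input changes the output, so its response exceeds the shared baseline error of the two inputs that the model ignores.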

To visualize the perturbation response of the considered FNN, the overall response of each input (MSE_j averaged over the committee of 50 NNs) was plotted for all j ∈ ℕ ≤ ℒ. The sensitivity analysis was performed for each FNN (n = 2, n = 4, n = 8).

Motivation and theory.

The diagnostic ability of often-used machine learning methods such as SVM, LDA, and PCA relies greatly on the variation among and between data within a data set X. This variation is often quantified using a covariance matrix, S. By obtaining the eigenvectors associated with S, the characteristic form of the data may be easily represented. However, this approach becomes ineffective if the variation among the data set elements X_j ∈ X becomes significantly small (but not so small as to imply repeated data). In such a scenario, S → 0, and the eigenvector solutions obtained from S become trivial. This makes it difficult to distinguish important variables which may prove significant in determining an accurate diagnosis. For artificial NNs, the weights determine the correlation of the input parameters with the outputs under a zero-bias condition. Here, the magnitude of a weight determines the magnitude of the correlation, while the sign of the weight determines the direction of influence [44]. This simplistic model does not, however, explain the contribution of biases and activation functions to a network’s decision-making process.

The proposed model of this study probes the significance of an input parameter based on the magnitude of change at the output for given ranges of input perturbations Δx_j,k. The magnitude of influence of an input variable is given by MSE_j,k, which takes values in the range of 0 to 1 since the outputs are probabilities. In obtaining MSE_j,k, the malignant and the benign samples were treated as a single set, since analyzing them separately would not provide an overall determination of the significant variables across both classifications. Furthermore, since the spectral data set X_SD was normalized for each set element, MSE_j,k was expected to vary less between samples across all input variables. This assumption makes it justifiable to take the overall average, MSE_j, as the overall measure of an input response for a single neural network. The final metric for the influence of a given variable was taken to be the overall MSE magnitude, that is, the average of MSE_j over multiple similar NNs. In this process, it was assumed that each NN had similar input responses, since each followed the same architecture, training, and optimization. Lastly, in determining the most significant input variables, the overall MSE magnitudes were not ranked, in contrast to usual sensitivity analyses [20–22, 45, 46]. Since the functional groups and vibrational modes associated with the input variables are usually presented as ranges within the IR spectrum, it was more appropriate to identify and discuss significant peaks from the plot of the overall MSE magnitude against wavenumber rather than from a ranked list.

Overall, this study proposed that input variables associated with comparatively high MSE correspond to wavenumbers that are important in the NN’s diagnosis. Since the NN models were assumed to be the prevailing models and are highly accurate, the determined wavenumbers may thus provide insights into the associated changes in chemical composition and structure of biomolecules in cancerous breast tissues. The overall method implemented in the study is summarized in Fig 1.

Fig 1. Experimental design process flowchart.

The figure shows the experimental design implemented in the study, from the acquisition of breast tissue samples, to the acquisition, processing and analysis of spectral data.

https://doi.org/10.1371/journal.pone.0262489.g001

Results

Samples

The clinical characteristics of the samples were retrieved from medical records and histopathology reports of the hospital study sites (Table 1). Among the malignant samples, the majority were invasive ductal carcinoma. Meanwhile, the benign breast samples were mainly fibroadenoma and fibrocystic change (Table 1). The above classifications were based on microscopic examination of H&E-stained specimens and immunohistochemical staining (if needed or available) following the current WHO classification.

Table 1. Clinical data of the patients with breast lesions^*.

https://doi.org/10.1371/journal.pone.0262489.t001

The variation between benign and malignant samples is shown in Fig 2. From the PCA plot, 90.28% of the variability was associated with the first principal component F₁, while only 5.12% was associated with the second principal component F₂. Most of the benign samples were scattered along the negative domain of the F₁ axis while malignant samples were evenly scattered. Both sample classes followed a parabolic distribution across the determined principal axes. Overall, the PCA biplot suggests that the benign and malignant breast samples were highly similar in characteristics.

Fig 2. PCA biplot showing data points of malignant and benign samples.

The red points denote malignant samples while blue points denote benign samples plotted across the two most dominant components (F₁ = 90.28% and F₂ = 5.21%). The vectors show the wavenumbers associated with peak absorbance, where those highlighted in green were identified as significant wavenumbers in discriminating benign from malignant samples.

https://doi.org/10.1371/journal.pone.0262489.g002

Feedforward neural network designs

The NN input layer consisted of 462 nodes, which corresponded to the IR absorbance of each sample at wavenumbers from 1800 cm^-1 to 850 cm^-1. Three (3) FNN models were designed with varying numbers of hidden layers (n = 2, n = 4, n = 8). The number of neurons per FNN hidden layer was kept constant for each repetition. The general NN architectures are summarized in Table 2.

Table 2. Feedforward neural network architecture.

https://doi.org/10.1371/journal.pone.0262489.t002

Gaussian random initialization was used for the weights, while a zero-value initialization was used for the biases. Moreover, SELU activation was used for all neuron activation functions except for the respective output layers, in which the softmax activation function was used. All NNs were trained through backpropagation via AdaGrad stochastic gradient descent (SGD) [47] over 1000 epochs. The binary cross-entropy function was used as the cost function for all NN designs during the training process. To avoid over-fitting, a dropout of 90% was used for each feedforward hidden layer as recommended for SELU activation [29, 37].
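To make the architecture concrete, here is a minimal pure-Python sketch of a forward pass through SELU hidden layers with a two-class softmax output, followed by the cross-entropy cost for a single sample. The toy layer sizes, the standard deviation of the Gaussian initialization, and all function names are illustrative assumptions; dropout and the AdaGrad update are omitted for brevity.

```python
import math
import random

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(z: float) -> float:
    """Scaled exponential linear unit."""
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1.0))


def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng, std=0.05):
    """Gaussian random weights and zero biases (std is an assumed value)."""
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(x, hidden_layers, output_layer):
    """SELU hidden layers, then softmax over (benign, malignant)."""
    h = x
    for layer in hidden_layers:
        h = [selu(z) for z in dense(h, layer)]
    return softmax(dense(h, output_layer))


def cross_entropy(probs, true_class):
    """Cross-entropy cost for one sample given its true class index."""
    return -math.log(probs[true_class])


rng = random.Random(0)
n_inputs, width, depth = 8, 5, 2          # toy sizes; the study used 462 inputs
hidden = [init_layer(n_inputs if i == 0 else width, width, rng)
          for i in range(depth)]
output = init_layer(width, 2, rng)
spectrum = [rng.random() for _ in range(n_inputs)]
probs = forward(spectrum, hidden, output)
loss = cross_entropy(probs, true_class=1)
```

Setting `depth` to 2, 4, or 8 reproduces the layer counts of the three designs, with the same `width` reused for every hidden layer.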

Neural network optimization

Pre-training, a grid search was performed to determine the optimal learning rate and the layer width. To limit the search space of the performed grid search, the explored learning rates L, and the considered layer width N_W, were limited to 10 and 20 elements, respectively, where and N_W = {10, 15, 20, 25, 30, 40, 50, 60, 80, 100, 120, 140, 160, 180, 200, 250, 300, 350, 400, 462}. For every combination of hyperparameters, each NN model was trained over 1000 epochs using the training set via a 10-fold cross-validation procedure over 50 trials. The best hyperparameter combination for each model was identified as the hyperparameter combination with the highest average validation accuracy.
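The search over learning rate and layer width can be sketched as a plain nested loop with k-fold cross-validation. In the sketch below, `LAYER_WIDTHS` is the set listed above, whereas `LEARNING_RATES` is a placeholder set of 10 values chosen only for illustration, and `toy_score` is a hypothetical stand-in for "train a network and return its validation accuracy". The sample count passed in is likewise only an example.

```python
import itertools
import math
import random
from statistics import mean

# Layer widths as listed in the text (20 elements).
LAYER_WIDTHS = [10, 15, 20, 25, 30, 40, 50, 60, 80, 100,
                120, 140, 160, 180, 200, 250, 300, 350, 400, 462]
# ASSUMPTION: placeholder learning rates, not the values used in the study.
LEARNING_RATES = [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5]


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search(train_and_score, n_samples, k=10, trials=1, seed=0):
    """Return the (learning_rate, width) pair with the best mean validation score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for lr, width in itertools.product(LEARNING_RATES, LAYER_WIDTHS):
        scores = []
        for _ in range(trials):
            folds = k_fold_indices(n_samples, k, rng)
            for i, val_idx in enumerate(folds):
                train_idx = [j for f, fold in enumerate(folds) if f != i
                             for j in fold]
                scores.append(train_and_score(lr, width, train_idx, val_idx))
        avg = mean(scores)
        if avg > best_score:
            best_params, best_score = (lr, width), avg
    return best_params, best_score


def toy_score(lr, width, train_idx, val_idx):
    """Hypothetical scorer that peaks at lr = 0.01 and width = 350."""
    return 1.0 - 0.1 * abs(math.log10(lr) + 2) - abs(width - 350) / 1000.0


n_combinations = len(LEARNING_RATES) * len(LAYER_WIDTHS)
best_params, best_score = grid_search(toy_score, n_samples=166, k=10, trials=1)
```

With 10 learning rates and 20 widths the grid holds 200 combinations, so a 10-fold procedure repeated over 50 trials means 500 training runs per combination and model, which is why restricting the search space matters.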

The result of the grid search for each NN design (Fig 3) shows that all models peaked at learning rates of about 0.005 to 0.1, with FNN2 showing the most stable performance across this range and FNN8 the least stable. Regardless of the model, learning rates above 0.1 gave very poor validation accuracy, which may be due to divergence and large weight oscillations. Meanwhile, the validation accuracy stagnated at learning rates below 0.005, which may be due to very small changes in the NN’s weights and biases. The optimized learning rate was 0.01 for all models, while the number of neurons per layer that achieved the best performance was 350, 400, and 300 for FNN2, FNN4, and FNN8, respectively.

Fig 3.

A. Grid search surface plot of FNN2. The plot shows the grid search surface plot for optimizing the FNN2 model for learning rate and number of neurons per hidden layer. Validation accuracy peaked at a learning rate of 0.01 and 350 neurons per hidden layer. Low performance at high learning rates (blue region) may be due to divergence and high parameter oscillations, while stagnation of performance at low learning rates (green region) may be due to insufficient training time. B. Grid search surface plot of FNN4. The plot shows the grid search surface plot for optimizing the FNN4 model for learning rate and number of neurons per hidden layer. Validation accuracy peaked at a learning rate of 0.01 and 400 neurons per hidden layer. The same behavior in the low and high learning rate regions, observed in the FNN2 surface plot, is also evident here. C. Grid search surface plot of FNN8. The plot shows the grid search surface plot for optimizing the FNN8 model for learning rate and number of neurons per hidden layer. Validation accuracy peaked at a learning rate of 0.01 and 300 neurons per hidden layer. Among the FNN surface plots, FNN8 showed the most unstable response in validation accuracy.

https://doi.org/10.1371/journal.pone.0262489.g003

Diagnostic performance of models

Diagnostic performance of the NN models and the other machine learning models (NB, DT, RF, LR, LDA, SVM) was determined by comparison with the gold standard, which is the microscopic examination of H&E-stained tissues by pathologists. In terms of ACC, NPV and RR, all the NN models significantly surpassed all the other machine learning models (Tables 3 and 4). However, none of the models outperformed the SVM model in AUC (95.72% ± 4.94%). Among the NN models, the best AUC was achieved by FNN4 (90.48% ± 10.30%), followed by FNN8 (90.35% ± 10.10%) and then FNN2 (90.05% ± 9.95%). The relatively low AUC among NN models may be attributed to a lack of training time. In terms of PPV, the NN model performances did not significantly differ from that of the best benchmark model, which was SVM. As to NPV, the NN models surpassed the best benchmark model by an average of 4.42% for FNN2, 3.16% for FNN4 and 2.08% for FNN8. As to SR, the NN models performed significantly worse than SVM, but outperformed the other benchmark models. The NN models outperformed SVM in terms of RR by an average of 2.94% for FNN2, 1.23% for FNN4 and 0.17% for FNN8. Note, however, that the RR value for FNN8 did not significantly differ from that of the SVM. The observed increase in metric performance from FNN8 to FNN2 may be due to a lack of training time for the deeper models. Overall, the significantly high ACC of the FNN models suggests that they are viable classifiers for distinguishing malignant from benign samples using FTIR data. The results are summarized in Tables 3 and 4.
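For reference, the metrics compared above follow directly from the confusion-matrix counts of each validation fold. The short Python sketch below computes them and then summarizes them across folds as a mean and standard deviation. It assumes that malignant is the positive class and that the sample standard deviation is used; the fold counts are hypothetical and are not taken from Table 3.

```python
from statistics import mean, stdev


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the reported metrics, treating malignant as the positive class."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
        "SR": tn / (tn + fp),                    # specificity rate
        "RR": tp / (tp + fn),                    # recall rate (sensitivity)
    }


def summarize_folds(fold_counts):
    """Mean and sample standard deviation of each metric across folds."""
    per_fold = [diagnostic_metrics(*counts) for counts in fold_counts]
    return {name: (mean(m[name] for m in per_fold),
                   stdev(m[name] for m in per_fold))
            for name in per_fold[0]}


# Hypothetical (tp, fp, tn, fn) counts for three validation folds.
example_folds = [(7, 1, 8, 1), (8, 0, 8, 1), (6, 2, 7, 2)]
summary = summarize_folds(example_folds)
single = diagnostic_metrics(7, 1, 8, 1)
```

PPV and SR both fall as false positives rise, while NPV and RR both fall as false negatives rise, which is why the two pairs tend to move together in the comparison with SVM.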

Table 3. Diagnostic performance of the models.

https://doi.org/10.1371/journal.pone.0262489.t003

Table 4. Test of significance of neural network performance metrics relative to SVM.

https://doi.org/10.1371/journal.pone.0262489.t004

Visual peak analysis of data

The fingerprint spectral region (1800 cm^-1 to 850 cm^-1) showed that the absorbance spectra of the malignant and benign tissues were significantly distinct from each other, specifically at bands 1452 cm^-1/1452 cm^-1, 1399 cm^-1/1401 cm^-1, 1337 cm^-1/1337 cm^-1, 1279 cm^-1/1279 cm^-1, and 1236 cm^-1/1236 cm^-1, which represent lipids, DNA, RNA and phospholipids (Table 5). Other peaks tested, in particular 1632 cm^-1/1634 cm^-1, 1539 cm^-1/1540 cm^-1, 1160 cm^-1/1160 cm^-1, 1032 cm^-1/1030 cm^-1, and 880 cm^-1/878 cm^-1, were not distinct between the two sample classes. These bands represent amide I proteins, amide II proteins, carbohydrates, glycogen, and phosphorylated proteins. It is worth mentioning that the tissues analyzed had uniform thickness (5 μm) achieved by sectioning with a microtome. The median absorbance spectra of benign and malignant breast tissue samples are shown in Fig 4.

Fig 4. Median ATR-FTIR absorbance spectra of malignant (n = 78) and benign (n = 88) breast tissue samples.

The figure shows the median FTIR spectra of malignant and benign breast tissue samples and their corresponding peaks identified via visual analysis. The plot shows nearly identical absorbance for benign and malignant samples at wavenumbers associated with the amide proteins. Benign tissue samples, relative to malignant tissue samples, show increased absorbance within the region associated with lipids and nucleic acids and decreased absorbance within the region associated with carbohydrates, glycogen, and phosphorylated proteins.

https://doi.org/10.1371/journal.pone.0262489.g004

A test of homogeneity showed that the characteristic IR absorbance peaks of the malignant cases were significantly different (p<0.05) from those of the benign samples. The visually identified peak positions and absorbances in the fingerprint IR region (1800 cm^-1 to 850 cm^-1) that could significantly differentiate the malignant from benign samples are summarized in Table 5. Their corresponding functional group, vibrational mode, and molecular source assignments are also listed in the same table. It was observed that peak absorbances representing lipids, DNA, RNA and phospholipids were significantly decreased in most malignant tissues.

Table 5. Comparison of the spectrum variables (peak positions and normalized absorbances) of malignant and benign breast samples in the fingerprint IR region (1800 cm^-1 to 850 cm^-1) via visual peak identification.

https://doi.org/10.1371/journal.pone.0262489.t005

The results of the performed test of significance are further supported by Fig 2, where wavenumbers associated with significant peak absorbances were projected more closely to F₁ than those which were not. This implies that lipids, DNA, RNA and phospholipids are responsible for the high variability among the samples, suggesting further that these biomolecules vary strongly between the malignant and benign classes.

Significant peaks identified by neural networks

The input response produced by each NN design is summarized in Fig 5. As shown, the responses of the networks are highly similar to one another. For all NN designs, the IR region associated with the functional groups, in particular within ~960 cm^-1 to ~1050 cm^-1, showed the greatest input response within the considered spectrum. Meanwhile, the least response was evident within ~1250 cm^-1 to ~1320 cm^-1.

Fig 5. Input response of neural networks from the designed NN-based sensitivity analysis.

The line plots show the input response of each neural network design to a change in absorbance value at each wavenumber. A high percent contribution magnitude implies a high response to a change at that particular wavenumber, which may therefore serve as a marker for distinguishing malignant from benign samples. As evident from the figure, the responses of the networks are nearly the same, varying only slightly in contribution magnitude.

https://doi.org/10.1371/journal.pone.0262489.g005

Using the input response of the NNs from the neural network committee-based sensitivity analysis, the significant peaks were identified from the fingerprint spectral region (1800 cm^-1 to 850 cm^-1) as shown in Fig 5. Information on the protein content, including its secondary structures such as amide I and amide II, is observed in the region between 1800 cm^-1 and 1500 cm^-1 [48, 49]. The bands found at ~1635 cm^-1 are associated with the amide I protein that arises from the C=O stretching vibrations of the amide groups of the protein backbone [49, 50]. The band at 1540 cm^-1 results from N–H bending in the amide II groups, which is associated with aromatic amino acids. The bands at 1454 cm^-1 and 1393 cm^-1 are associated with the CH₂, CH₃ deformation modes mainly from proteins and lipids [49, 51]. The peak at 1393 cm^-1 also results from COO⁻ symmetric stretching of amino acids [49]. The region of 1300 cm^-1–800 cm^-1 corresponds to the variations of functional groups that are present in proteins, nucleic acids, carbohydrates and phospholipids such as PO₂⁻, C–O, and C–C [48, 52]. The band at 1238 cm^-1 is in the range of sugar-phosphate chain vibrations, which is related to PO₂⁻ asymmetric stretching of nucleic acids [52]. Furthermore, this is also the range for amide III (1299 cm^-1–1200 cm^-1), with its C–N stretching, N–H bending, C=O stretching, and O=C–N bending [53, 54]. The vibrations of the C–O group from glycogen are observed at the 1077 cm^-1 peak [51]. The band at 1030 cm^-1–1045 cm^-1 is due to C–O stretching and C–O bending of the C–OH groups of carbohydrates such as glucose and fructose [50, 54]. The band at 990 cm^-1 is related to phosphorylation of proteins and the ribose-phosphate chain [50], while ~962 cm^-1 is associated with symmetric phosphate stretching modes from phosphate diester groups in nucleic acids and phospholipids [51].
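The band assignments listed above lend themselves to a small lookup table. The Python sketch below encodes them as wavenumber ranges and returns the nearest assignment for a queried wavenumber. The matching tolerance of 8 cm^-1 and the function name `assign_band` are assumptions introduced for illustration only.

```python
from typing import List, Optional, Tuple

# (low cm^-1, high cm^-1, assignment), taken from the band descriptions in the text.
BAND_ASSIGNMENTS: List[Tuple[float, float, str]] = [
    (1635, 1635, "amide I: C=O stretching of the protein backbone"),
    (1540, 1540, "amide II: N-H bending"),
    (1454, 1454, "CH2/CH3 deformation, mainly proteins and lipids"),
    (1393, 1393, "CH2/CH3 deformation; COO- symmetric stretching of amino acids"),
    (1238, 1238, "PO2- asymmetric stretching of nucleic acids; amide III range"),
    (1077, 1077, "C-O vibrations of glycogen"),
    (1030, 1045, "C-O stretching and bending of C-OH groups of carbohydrates"),
    (990, 990, "phosphorylation of proteins; ribose-phosphate chain"),
    (962, 962, "symmetric phosphate stretching: nucleic acids and phospholipids"),
]


def assign_band(wavenumber: float, tolerance: float = 8.0) -> Optional[str]:
    """Return the nearest listed assignment within `tolerance` cm^-1, else None."""
    best_label, best_distance = None, tolerance
    for low, high, label in BAND_ASSIGNMENTS:
        if low <= wavenumber <= high:
            distance = 0.0
        else:
            distance = min(abs(wavenumber - low), abs(wavenumber - high))
        if distance <= best_distance:
            best_label, best_distance = label, distance
    return best_label


carbohydrate_hit = assign_band(1036)   # inside the 1030-1045 cm^-1 band
amide_hit = assign_band(1632)          # within tolerance of ~1635 cm^-1
no_hit = assign_band(1700)             # no listed band nearby
```

A lookup of this kind only labels a wavenumber; whether the labelled band actually differs between classes still has to come from the statistical test or the sensitivity curve.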

Discussion

FTIR spectroscopy is a prospective novel diagnostic tool that is used to distinguish cancer samples from normal ones at high sensitivity, specificity, and accuracy [55, 56]. Considering the molecular complexity of biological specimens, chemometric techniques such as the principal component analysis (PCA) and artificial neural networks (ANN) that combine statistical and mathematical algorithms are utilized to generate chemo-physical evidence from spectral data [55].

The advent of computers with enhanced processing capabilities and memory capacity has led to the rise of computer-aided diagnosis (CAD), which combines algorithms or methods from pattern recognition and digital image processing [56]. Meanwhile, scientists have been drawn to the potential application of FTIR spectroscopy in the clinical setting to improve accuracy and reproducibility of cancer diagnosis, while omitting the need for complex and time-consuming clinical processing of tissue biopsy samples [55]. This is best exemplified by the study of Großerueschkamp et al., who combined FTIR imaging and a novel trained random forest (RF) classifier for the automated marker-free histopathological annotation of lung tumor classes and subtypes of adenocarcinoma without further treatment of tissue samples. This study yielded greater reproducibility and an accuracy of 97% for the annotation of lung tumor classes and 95% for the identification of prognostic adenocarcinoma subtypes [57]. Subsequently, FTIR reduced intra- and inter-operator variability through its objectivity, reproducibility, and improved accuracy over current methodologies for cancer diagnosis [55]. This also permits the standardization of spectral measurement and analysis, which is necessary for the construction of FTIR spectral databases with highly specific spectroscopic markers for the various stages and grades of different cancer types applicable to clinical settings [58]. Additionally, easy and objective data interpretation can be done by non-spectroscopists by incorporating powerful algorithms for automatic data analysis of large data sets [55].

The designed NNs exhibited superior accuracy (>90%) relative to the best benchmark model (SVM). These metrics suggest that they are excellent classifiers not only for distinguishing malignant from benign breast tumors using ATR-FTIR data, but also in general. Overall, the FNN2 model obtained larger metric values than the other NN models. The decrease in the performance metrics, in particular accuracy, NPV, and RR, as the number of layers increased suggests that the deeper models may have lacked training time. Note, however, that an opposite trend is evident for the SR and PPV metrics, implying that the designed architectures may become increasingly more accurate in detecting malignant rather than benign samples as the model gets deeper. This observation presents a trade-off in the model’s capability to confirm truly malignant versus truly benign samples. It further implies that if a more accurate positive screening test is needed, then deeper models may be used. Conversely, for more accurate negative screening tests, a shallower model may be preferable.

Considering the metric comparison performed, the PPV of the designed NNs did not differ significantly from that of SVM, making them comparably competent in this respect. However, the significantly higher NPV of the designed NN models makes them superior classifiers in terms of identifying benign samples as truly non-malignant. By definition, the significance of the SR and the RR parallels that of the PPV and NPV, respectively. This characteristic makes the designed NN models more practical to use in situations where diagnosing non-malignant patients as malignant becomes very costly. While an incorrect diagnosis is detrimental for any patient, for financially constrained individuals an accurate diagnosis of non-malignancy may be of more importance, since a false diagnosis of malignancy exposes the individual to the financial burden of chemotherapy and a probable decline in health, which necessitates further costs. In developing countries such as the Philippines, the use of a highly specific diagnostic tool such as the NNs designed in this study may prove more beneficial for patients undergoing cancer diagnosis. Regardless of the use, however, the models show their potential as highly specific tools to assist pathologists and medical practitioners in the field.

The designed neural networks that were used to analyze the FTIR spectra were able to identify significantly decreased peak absorbances characteristic of lipids, nucleic acids, and phospholipids in malignant tissues, which were similarly evident in the performed visual analysis (Table 5). Breast cancer is often characterized by the stimulated production of de novo lipids, which are essential for cell growth, proliferation, and oxidative stress resistance. The triacylglyceride storage in lipid droplets has been suggested to work as a fuel source after re-oxygenation during intermittent hypoxia, whereas fatty acids promote redox balance supporting a high glycolytic rate in malignant tissues. Lipids also form the structural basis of paracrine hormones and growth factors which stimulate tumor growth, neovascularization, invasion, and metastatic spread [59]. The decrease in lipids and phospholipids may reflect the utilization of these biomolecules as a source of nutrition and energy, which prevents their accumulation in cancer cells during cancer progression [60]. A significant difference in the absorptive peak was also apparent in the DNA/RNA spectral region, with the malignant breast samples showing significantly lower peak absorbance than benign samples. This is contrary to the findings of Lazaro-Pacheco et al., wherein a higher contribution of nucleic acid bands was identified in cancerous samples in comparison with normal breast tissues in different spectral regions. They argued that a high concentration of these biomolecules is expected since there is increased cellular content in response to abnormal proliferation [61]. However, in a study involving ovarian cancer, the RNA/DNA absorption peaks were significantly lower in malignant tissues than in borderline and benign ovarian tissues [62]. The lowered peak absorbance among malignant samples may be due to fragment transfer of tumor DNA or cell-free RNA from the cancerous area to the bloodstream, consequently decreasing the nucleic acid content at the primary tumor area [63].

Interestingly, there was no distinctive difference between malignant and benign breast tissues in the absorptive peaks of carbohydrates as perceived by the visual analysis performed (Table 5). Relatedly, the P1160 wavenumber vector that is associated with carbohydrates was projected significantly far from F₁, implying further that this biomolecule is not a likely cause of variability (Fig 2). However, the sensitivity analysis indicated otherwise, since the highest peak was evident across the IR region associated with the C–OH group of carbohydrates (Fig 5). Studies have shown that assessing glycogen levels is a good differentiation marker between malignant and benign tissues, with malignant samples generally consuming more glycogen to sustain survival during prolonged hypoxia and glucose deprivation as well as to sustain metastasis [64]. This ability of the sensitivity analysis to recognize carbohydrates as a differentiating factor, which was not detected by visual peak analysis alone, supports the proposed method as a more discerning way to identify important spectral biomarkers. Possibly, the breast cancer cells had already catabolized their glycogen stores as well as their subsequent by-products such as glucose for survival in a nutrient-deprived environment [23], and hence became relatively indistinguishable by the usual visual peak analysis. Given the proposed method’s superior ability, the study suggests that the identified peaks within the higher-wavenumber IR region (~1400 cm^-1 to ~1800 cm^-1) be given attention, particularly those associated with CH₂, CH₃ deformation modes and amide protein stretches and bends.

A variety of biological materials including blood, solid tissues, urine, and sputum have been studied using FTIR spectroscopy to develop better alternatives for cancer diagnosis and management. In the clinical setting, blood and tissue samples remain more widely used than other specimens for diagnosing disease [55]. In less developed countries, the use of FFPE samples can provide technical ease and economic advantage for longitudinal tissue specimen storage, as they can be easily retrieved from accredited repositories for further analysis [65]. Compared to immunohistochemical and molecular assays, FTIR can be a cheaper alternative for detecting biochemical markers in pathologic FFPE specimens based on unique vibrational patterns [60]. With the introduction of machine learning, FTIR spectroscopy in clinical diagnostic settings can reduce intra- and inter-operator variability and improve accuracy and reproducibility of cancer diagnosis, while omitting the need for complex and time-consuming processing of clinical samples [56].

The generation of NN models from the FTIR fingerprint of benign and malignant FFPE breast tissues led to the identification of significant wavenumbers apart from those at peak absorbances, which can be used to discriminate malignant from benign tissues. Interestingly, unique peak absorbances distinctive of lipids, nucleic acids, and phospholipids were identified, showing that these biomolecules were significantly decreased in malignant tissues as compared to benign samples, and can, therefore, be used as biochemical fingerprints to aid in cancer diagnosis.

While the current study shows that NN models from FTIR spectra can be used as an adjunct tool for diagnosing breast cancer, additional clinical studies should be made to bring this technology into the clinical setting. Due to financial constraints, this study was conducted using only a basic type of FTIR spectrometer with limited spatial resolution. To acquire more comprehensive spectral data, additional FFPE samples may be analyzed using an FTIR coupled with an infrared microscope to detect vibrational motions of molecules within very restricted regions. Other spectroscopic techniques such as Raman spectroscopy can also be used to further probe molecular vibrations to aid in the characterization and discrimination of tissue types [66]. The creation of a spectral database and the generation of novel powerful algorithms for automatic data analysis of large data sets is another prospect to accelerate point-of-care decisions and improve therapeutic management for breast cancer patients [67]. Further studies also show that an alternative sample to tissues could be blood plasma, since the use of plasma is cheaper, less invasive, and easier to process [68, 69]. Through the integration of AI and FTIR, spectral biomarkers in plasma samples may be identified to monitor treatment response, which is already being investigated by the research team.

In summary, the present study generated NN models that led to the identification of unique infrared absorption features in the lipid, nucleic acid, phospholipid, and carbohydrate regions that could effectively discriminate malignant from benign breast tissues. To the researchers’ knowledge, this is the first study to have used several machine learning tools to identify malignant breast tissues based on FTIR spectral data.

Acknowledgments

We thank Mr. Patrick Jun Paul Lawan and Ms. Ericka Hidalgo for their technical assistance, and Ralph Christian Tomas for his scientific input.

References

1. Mainiero MB, Lourenco A, Mahoney MC, Newell MS, Bailey L, Barke LD, et al. ACR Appropriateness Criteria Breast Cancer Screening. J Am Coll Radiol [Internet]. 2013;10:11–4. Available from: https://doi.org/10.1016/j.jacr.2016.09.021 pmid:23290667
2. Brem RF, Lenihan MJ, Lieberman J, Torrente J. American women having dense breast tissue. AJR Am J Roentgenol. 2015;204(2):234–40. pmid:25615743
3. Mathenge EG, Dean CA, Clements D, Vaghar-kashani A, Giacomantonio M, Malueth B, et al. Core Needle Biopsy of Breast Cancer Tumors Increases Distant Metastases in a Mouse Model. Neoplasia [Internet]. 2014;16(11):950–60. Available from: https://doi.org/10.1016/j.neo.2014.09.004 pmid:25425969
4. Kabel AM. Tumor markers of breast cancer: New prospectives. J Oncol Sci [Internet]. 2017;3(1):5–11. Available from: https://doi.org/10.1016/j.jons.2017.01.001
5. Bunaciu AA, Aboul-Enein HY, Fleschin S. FTIR Spectrophotometric Methods Used for Antioxidant Activity Assay in Medicinal Plants. Appl Spectrosc Rev [Internet]. 2012;47(4):245–55. Available from: http://www.tandfonline.com/doi/abs/10.1080/05704928.2011.645260
6. Yang D, Castro DJ, el-Sayed IH, el-Sayed MA, Saxton RE, Zhang NY. A Fourier-transform infrared spectroscopic comparison of cultured human fibroblast and fibrosarcoma cells: a new method for detection of malignancies. J Clin Laser Med Surg [Internet]. 1995;13(2):55–9. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10150573 pmid:10150573
7. Siqueira LFS, Lima KMG. A decade (2004–2014) of FTIR prostate cancer spectroscopy studies: An overview of recent advancements. Trends Anal Chem [Internet]. 2016;82:208–21. Available from: https://doi.org/10.1016/j.trac.2016.05.028
8. Bhalerao RY, Jani HP, Gaitonde RK, Raut V. A novel approach for detection of Lung Cancer using Digital Image Processing and Convolution Neural Networks. 2019 5th International Conference on Advanced Computing and Communication Systems, ICACCS 2019. 2019;577–83.
9. Rossetto AM, Zhou W. Deep Learning for Categorization of Lung Cancer CT Images. Proceedings—2017 IEEE 2nd International Conference on Connected Health: Applications, Systems and Engineering Technologies, CHASE 2017. 2017;272–3.
10. Kido S, Hirano Y, Hashimoto N. Detection and classification of lung abnormalities by use of convolutional neural network (CNN) and regions with CNN features (R-CNN). 2018 International Workshop on Advanced Image Technology, IWAIT 2018. 2018;1–4.
11. Santillan A, Tomas RC, Bangaoil R, Lopez R, Gomez MH, Fellizar A, et al. Discrimination of malignant from benign thyroid lesions through neural networks using FTIR signals obtained from tissues. Anal Bioanal Chem. 2021;413(8):2163–80. pmid:33569645
12. Kaur B, Mann KS, Grewal MK. Ovarian cancer stage based detection on convolutional neural network. Proceedings of the 2nd International Conference on Communication and Electronics Systems, ICCES 2017. 2018;2018-Janua(Icces):855–9.
13. Rahman MA, Muniyandi RC, Islam KT, Rahman MM. Ovarian Cancer Classification Accuracy Analysis Using 15-Neuron Artificial Neural Networks Model. 2019 IEEE Student Conference on Research and Development, SCOReD 2019. 2019;33–8.
14. Zou L, Yu S, Meng T, Zhang Z, Liang X, Xie Y. A Technical Review of Convolutional Neural Network-Based Mammographic Breast Cancer Diagnosis. Computational and Mathematical Methods in Medicine. 2019;2019(Dm). pmid:31019547
15. Zuluaga-Gomez J, Masry Z Al, Benaggoune K, Meraghni S, Zerhouni N. A CNN-based methodology for breast cancer diagnosis using thermal images. 2019;0–2.
16. Ragab DA, Sharkas M, Marshall S, Ren J. Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ. 2019;2019(1):1–23. pmid:30713814
17. Mishra A, Cheng H. Advanced CNN Architectures. 2017.
18. Vert JP. Artificial intelligence and cancer genomics. Healthcare and Artificial Intelligence. 2020;1(February):165–74.
19. Barth A. Infrared spectroscopy of proteins. Biochim Biophys Acta—Bioenerg. 2007;1767(9):1073–101. pmid:17692815
20. Zeng X, Yeung DS. A quantified sensitivity measure for multilayer perceptron to input perturbation. Neural Computation. 2003;15(1):183–212. pmid:12590825
21. Cao M, Qiao P. Neural network committee-based sensitivity analysis strategy for geotechnical engineering problems. Neural Computing and Applications. 2008;17(5–6):509–19.
22. Sysoev A, Ciurlia A, Sheglevatych R, Blyumin S. Sensitivity analysis of neural network models: Applying methods of analysis of finite fluctuations. Periodica polytechnica Electrical engineering and computer science. 2019;63(4):306–11.
23. Bangaoil R, Santillan A, Angeles LM, Abanilla L, Lim A, Ramos MC, et al. ATR-FTIR spectroscopy as adjunct method to the microscopic examination of hematoxylin and eosin-stained tissues in diagnosing lung cancer. PLoS One [Internet]. 2020;15(5):e0233626. Available from: https://doi.org/10.1371/journal.pone.0233626 pmid:32469931
24. Podshyvalov A, Sahu RK, Mark S, Kantarovich K, Guterman H, Goldstein J, et al. Distinction of cervical cancer biopsies by use of infrared microspectroscopy and probabilistic neural networks. Appl Opt [Internet]. 2005;44(18):3725. Available from: https://www.osapublishing.org/abstract.cfm?URI=ao-44-18-3725 pmid:15989047
25. Salman A, Shufan E, Sahu RK, Mordechai S, Sebbag G. Insights on colorectal cancer relapse by infrared microscopy from anastomosis tissues: Further analysis. Vib Spectrosc. 2016;83:17–25.
26. Zhang X, Xu Y, Zhang Y, Wang L, Hou C, Zhou X, et al. Intraoperative detection of thyroid carcinoma by fourier transform infrared spectrometry. J of Surg Res. 2011;171(2):650–6. pmid:20828740
27. Wu M, Zhang W, Tian P, Ling X, Xu Z. Intraoperative diagnosis of thyroid diseases by fourier transform infrared spectroscopy based on support vector machine. Int J Clin Exp Med. 2016;9(2):2351–8.
28. Bhosale JS. High signal-to-noise Fourier transform spectroscopy with light emitting diode sources. Rev Sci Instrum. 2011;82(9). pmid:21974569
29. Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-normalizing neural networks. In: 31st Conference on Neural Information Processing Systems (NIPS 2017). 2017.
30. Costa F, Marques A, Arnaud-Fassetta G, Alonso J, Martins I, Guerra C. Self-Normalizing Neural Networks Günter. European Continental Hydrosystems under Changing Water Policy. 2013;(Nips):99–112.
31. Bogomolny E, Argov S, Mordechai S, Huleihel M. Monitoring of viral cancer progression using FTIR microscopy: A comparative study of intact cells and tissues. Biochim Biophys Acta. 2008;1780(9):1038–46. pmid:18588944
32. Ghimire H, Venkataramani M, Bian Z, Liu Y, Perera AGU. ATR-FTIR spectral discrimination between normal and tumorous mouse models of lymphoma and melanoma from serum samples. Sci Rep. 2017;7(1). pmid:29209060
33. Liu H, Su Q, Sheng D, Zheng W, Wang X. Comparison of red blood cells from gastric cancer patients and healthy persons using FTIR spectroscopy. J Mol Struct. 2017;1130:33–7.
34. Wang X, Shen X, Sheng D, Chen X, Liu X. FTIR spectroscopic comparison of serum from lung cancer patients and healthy persons. Spectrochim Acta—Part A Mol Biomol Spectrosc [Internet]. 2014;122:193–7. Available from: https://doi.org/10.1016/j.saa.2013.11.049 pmid:24316532
35. Lewis PD, Lewis KE, Ghosal R, Bayliss S, Lloyd AJ, Wills J, et al. Evaluation of FTIR Spectroscopy as a diagnostic tool for lung cancer using sputum. BMC Cancer [Internet]. 2010;10(1):640. Available from: http://bmccancer.biomedcentral.com/articles/10.1186/1471-2407-10-640 pmid:21092279
36. Philipp M, Rusch T, Hornik K, Strobl C. Measuring the Stability of Results From Supervised Statistical Learning. Journal of Computational and Graphical Statistics. 2018;27(4):685–700.
37. Costa F, Marques A, Arnaud-Fassetta G, Alonso J, Martins I, Guerra C. Self-Normalizing Neural Networks Günter. In: 31st Conf Neural Inf Process Syst (NIPS). 2017. p. 99–112.
38. Gu Q, Li Z, Han J. Linear discriminant dimensionality reduction. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2011;6911 LNAI(PART 1):549–64.
39. Cristianini N, Schölkopf B. Support vector machines and kernel methods: The new generation of learning machines. AI Magazine. 2002;23(3):31–41.
40. Murty MN, Raghava R. Linear support vector machines. SpringerBriefs in Computer Science. 2016;(9783319410623):41–56.
41. Platt JC. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. 2000;(June):1–11.
42. Gehrke J. Classification and Regression Trees. Encyclopedia of Data Warehousing and Mining. 2011;246–80.
43. Reza M, Miri S, Javidan R. A Hybrid Data Mining Approach for Intrusion Detection on Imbalanced NSL-KDD Dataset. International Journal of Advanced Computer Science and Applications. 2016;7(6):1–33.
44. Guha R, Stanton DT, Jurs PC. Interpreting computational neural network quantitative structure-activity relationship models: A detailed interpretation of the weights and biases. Journal of Chemical Information and Modeling. 2005;45(4):1109–21. pmid:16045306
45. Montavon G, Samek W, Müller KR. Methods for interpreting and understanding deep neural networks. Digital Signal Processing: A Review Journal. 2018;73:1–15.
46. Zhang Z, Beck MW, Winkler DA, Huang B, Sibanda W, Goyal H. Opening the black box of neural networks: methods for interpreting neural network models in clinical applications. Annals of Translational Medicine. 2018;6(11):216–216. pmid:30023379
47. Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12:2121–59.
48. Sitnikova VE, Kotkova MA, Nosenko TN, Kotkova TN, Martynova DM, Uspenskaya MV. Breast cancer detection by ATR-FTIR spectroscopy of blood serum and multivariate data-analysis. Talanta. 2020;214(October 2019):120857. pmid:32278436
49. Kar S, Katti DR, Katti KS. Fourier transform infrared spectroscopy based spectral biomarkers of metastasized breast cancer progression. Spectrochim Acta Part A Mol Biomol Spectrosc. 2019;208:85–96. pmid:30292907
50. Blat A, Wiercigroch E, Smeda M, Wislocka A, Chlopicki S, Malek K. Fourier transform infrared spectroscopic signature of blood plasma in the progression of breast cancer with simultaneous metastasis to lungs. Journal of Biophotonics. 2019;12(10):1–11. pmid:31265171
51. Depciuch J, Stanek-Widera A, Skrzypiec D, Lange D, Biskup-Frużyńska M, Kiper K, et al. Spectroscopic identification of benign (follicular adenoma) and cancerous lesions (follicular thyroid carcinoma) in thyroid tissues. J Pharm Biomed Anal. 2019;170:321–6. pmid:30954022
52. Zelig U, Barlev E, Bar O, Gross I, Flomen F, Mordechai S, et al. Early detection of breast cancer using total biochemical analysis of peripheral blood components: a preliminary study. BMC Cancer [Internet]. 2015;15(1):408. Available from: http://bmccancer.biomedcentral.com/articles/10.1186/s12885-015-1414-7 pmid:25975566
53. Elshemey WM, Ismail AM, Elbialy NS. Molecular-Level Characterization of Normal, Benign, and Malignant Breast Tissues Using FTIR Spectroscopy. Journal of Medical and Biological Engineering. 2016;36(3):369–78.
54. Ferreira ICC, Aguiar EMG, Silva ATF, Santos LLD, Cardoso-Sousa L, Araújo TG, et al. Attenuated Total Reflection-Fourier Transform Infrared (ATR-FTIR) Spectroscopy Analysis of Saliva for Breast Cancer Diagnosis. J Oncol. 2020;2020:4343590. pmid:32104176
55. Su KY, Lee WL. Fourier transform infrared spectroscopy as a cancer screening and diagnostic tool: A review and prospects. Cancers. 2020;12(115):1–19. pmid:31906324
56. Jothi AA, Rajam MA. A survey on automated cancer diagnosis from histopathology images. Artificial Intelligence Review. 2017;48(1):31–81.
57. Großerueschkamp F, Kallenbach-Thieltges A, Behrens T, Brüning T, Altmayer M, Stamatis G, et al. Marker-free automated histopathological annotation of lung tumour subtypes by FTIR imaging. Analyst. 2015;140(7):2114–20. pmid:25529256
58. Baker M, Hussain S, Lovergne L, Untereiner V, Hughes C, Lukaszewski R, et al. Developing and understanding biofluid vibrational spectroscopy: A critical review. Chemical Society Reviews. 2016;45(7):1803–18. pmid:26612430
59. Santos CR, Schulze A. Lipid metabolism in cancer. FEBS Journal. 2012;279(15):2610–23. pmid:22621751
60. Bénard A, Desmedt C, Durbecq V, Rouas G, Larsimont D, Sotiriou C, et al. Discrimination between healthy and tumor tissues on formalin-fixed paraffin-embedded breast cancer samples using IR imaging. Journal of Spectroscopy. 2010;24:67–72.
61. Lazaro-Pacheco D, Shaaban A, Baldwin G, Titiloye N, Rehman S, Rehman I. Deciphering the structural and chemical composition of breast cancer using FTIR spectroscopy. Applied Spectroscopy Reviews. 2014;6:29–32.
62. Li L, Wu J, Yang L, Wang H, Xu Y, Shen K. Fourier transform infrared spectroscopy: An innovative method for the diagnosis of ovarian cancer. Cancer Management and Research. 2021;13:2389–99.
63. Zaporozhchenko IA, Ponomaryova AA, Rykova EY, Laktionov PP. The potential of circulating cell-free RNA as a cancer biomarker: challenges and opportunities. Expert Review of Molecular Diagnostics. 2018;18(2):133–45. pmid:29307231
64. Zois CE, Harris AL. Glycogen metabolism has a key role in the cancer microenvironment and provides new targets for cancer therapy. J Mol Med. 2016;94(2):137–54. pmid:26882899
65. Zhang P, Lehmann BD, Shyr Y, Guo Y. The Utilization of Formalin Fixed-Paraffin-Embedded Specimens in High Throughput Genomic Studies. Int J Genomics. 2017;Article ID. pmid:28246590
66. Auner GW, Koya K, Huang C, Broadbent B, Trexler M, Auner Z, et al. Applications of Raman spectroscopy in cancer diagnosis. Cancer and Metastasis Reviews. 2018;37(4):691–717. pmid:30569241
67. Hughes C, Baker MJ. Can mid-infrared biomedical spectroscopy of cells, fluids and tissue aid improvements in cancer survival? A patient paradigm. Analyst. 2016;141(2):467–75. pmid:26501136
68. Meany DL, Sokoll LJ, Chan DW. Early detection of cancer: Immunoassays for plasma tumor markers. Expert Opinion on Medical Diagnostics. 2009;3(6):597–605. pmid:19966928
69. Park J, Shin Y, Kim TH, Kim DH, Lee A. Plasma metabolites as possible biomarkers for diagnosis of breast cancer. PLoS ONE. 2019;14(12):1–12. pmid:31794572
