Detection of breast cancer by ATR-FTIR spectroscopy using artificial neural networks

In this study, three (3) neural networks (NN) were designed to discriminate between malignant (n = 78) and benign (n = 88) breast tumors using their respective attenuated total reflection Fourier transform infrared (ATR-FTIR) spectral data. A proposed NN-based sensitivity analysis was performed to determine the most significant IR regions that distinguished benign from malignant samples. The result of the NN-based sensitivity analysis was compared to the obtained results from FTIR visual peak identification. In training each NN models, a 10-fold cross validation was performed and the performance metrics–area under the curve (AUC), accuracy, positive predictive value (PPV), specificity rate (SR), negative predictive value (NPV), and recall rate (RR)–were averaged for comparison. The NN models were compared to six (6) machine learning models–logistic regression (LR), Naïve Bayes (NB), decision trees (DT), random forest (RF), support vector machine (SVM) and linear discriminant analysis (LDA)–for benchmarking. The NN models were able to outperform the LR, NB, DT, RF, and LDA for all metrics; while only surpassing the SVM in accuracy, NPV and SR. The best performance metric among the NN models was 90.48% ± 10.30% for AUC, 96.06% ± 7.07% for ACC, 92.18 ± 11.88% for PPV, 94.19 ± 10.57% for NPV, 89.04% ± 16.75% for SR, and 94.34% ± 10.54% for RR. Results from the proposed sensitivity analysis were consistent with the visual peak identification. However, unlike the FTIR visual peak identification method, the NN-based method identified the IR region associated with C–OH C–OH group carbohydrates as significant. IR regions associated with amino acids and amide proteins were also determined as possible sources of variability. In conclusion, results show that ATR-FTIR via NN is a potential diagnostic tool. This study also suggests a possible more specific method in determining relevant regions within a sample’s spectrum using NN.


Introduction
Breast cancer remains the most prevalent cancer among women. Biennial mammography has been highly recommended for women 50 to 74 years old for early detection of this disease. Sensitivity of mammography has been increased to 92.7% when combined with magnetic resonance imaging (MRI). Meanwhile, combination with ultrasound (US) can increase sensitivity to only 52%. Therefore, in high-risk women for whom supplemental screening is specified, MRI is recommended when possible [1]. Supplemental screening with US for women with intermediate risk and dense breasts is an option to increase cancer detection. The mammographic sensitivity for breast cancer in women with very dense breasts is 47.6% and increased to 76.1% with US screening [2]. Suspicious lesions detected during mammograms are usually biopsied to confirm or rule out breast cancer. The most common form of biopsy is the core needle biopsy (CNB), which involves the removal of a portion of a tumor for histologic evaluation. The remainder is removed later after a definitive diagnosis of cancer. However, since tissue samples are collected by three to nine passes with a biopsy needle, the incisional surgical procedures become associated with elevated incidence of lymph node metastasis and higher local recurrence rates [3]. CEA can also be analysed to screen for breast cancer. However, it lacks disease sensitivity and specificity, hence cannot be used for screening a subpopulation with high risk for malignancies, a general asymptomatic population, or for independently diagnosing cancer. CA 15-3, which are soluble forms of the transmembrane protein Mucin1 (MUC1), is said to be overexpressed in malignant breast tumors. It was suggested that CA 15-3 and CEA can be considered complementary in detecting recurrence of breast cancer. However, their sensitivity is low and independent of the majority of the prognostic parameters that may be considered before relapse [4].
The potential of using infrared spectroscopy, in particular Fourier transform infrared (FTIR) spectroscopy, has been gaining popularity for cancer diagnostics over the last few years. The distinctive spectral properties associated with the changes in chemical composition and structure of biomolecules can be recognized by FTIR spectroscopy, making it a potential diagnostic tool [5]. Hence, when cells or tissues undergo transformation from normal to cancerous, changes in the physico-chemical structures and properties in a variety of their biomolecules can be simultaneously and indiscriminately probed by FTIR spectroscopy [6,7]. FTIR, as compared to traditional microscopic examination of hematoxylin and eosin (H&E) stained biopsies, is more rapid, cost-effective, and objective since reading is based on changes in biochemical properties instead of morphology [7]. It eliminates the possibility of intra-and inter-observer variability, which is often the problem with H&E staining. Moreover, this technology does not make use of dyes and other contrasting agents which may interfere or affect the reading. Hence, it provides more accurate and reproducible results.
The application of artificial intelligence (AI) in cancer diagnosis is no longer new; numerous studies have already been applied to address the most prevalent cancers such as lung cancer [8][9][10], thyroid cancer [11], ovarian cancer [12,13] and breast cancer [14][15][16][17]. Most studies make use of image-based data such as MRI images, computed tomography (CT) scans, positron emission tomography (PET) scans, X-rays, and H&E-stained biopsy images, which is the gold standard [18]. The most successful AI implemented in these studies involves artificial neural networks (NN), in particular, convolutional neural networks (CNN) due to their proven effectiveness in processing images. Furthermore, the underlying architecture of a CNN makes it easily feasible to create visual representation, highlighting the site of malignancy within a medical image. What limits image-based AI diagnostics, however, is that they heavily rely on the detection of a visible abnormality within the scanned region. This implies that a patient may already be at an advanced stage of malignancy before possible detection.
Moreover, the presence of dyes and other contrasting agents may make it difficult to apply in other laboratories an AI model trained using the procedures in one laboratory, if protocols and procedures are not well standardized.
The appeal of using FTIR data in AI is that they come in less file size and are easier to process than images, while still providing sufficiently adequate information for samples [19]. Hence, data storage costs may be minimized significantly and models utilizing such data become easier and faster to train, hence minimizing training costs. Moreover, the use of FTIR data may be able to predict the onset of cancer even before evident morphological changes [5], hence addressing the limitation of image-based AI diagnostics. However, since NNs are essentially black boxes, the underlying process involving its decision-making is inherently unknown; thus making them less appealing to use in a clinical setting. Furthermore, FTIR data are less intuitive to interpret than images, even with the assistance of AI visualizations; making them difficult to interpret. In this study, a method was formulated to address this limitation by providing a novel process of determining the most prevalent biomarkers as seen by trained artificial NNs; hence providing a basis on decision-making process. The proposed method is a modification of NN perturbation-based sensitivity analysis [20][21][22], which probes a NN's sensitivity towards changes in an input variable.
Hence, this study showed the potential of artificial neural networks (ANN) in accurately diagnosing breast cancer through infrared spectroscopy. Specifically, it designed multiple ANN to diagnose malignancy from breast tissues using ATR-FTIR data. The classification performance of the NN models were compared to six (6) most widely-used machine learning models. Moreover, this study also proposed a novel method for determining the IR regions which may be significant in determining breast cancer malignancy, as seen by a NN design. This proposed method may serve as a baseline process in analysing spectral data, and more importantly provide new insights and directions for pathologists and medical practitioners. . Written informed consent from the participants or their legal guardians have been waived by the respective ethics review boards since the study was restricted to the use of archived formalin fixed paraffin embedded (FFPE) breast tissues and did not involve additional procedures nor pose risk of harm to subjects. All methods were carried out in accordance with the Declaration of Helsinki and its later amendments.

Study population and sample preparation
Two hundred (200) FFPE breast biopsies obtained from 192 adult patients seen at USTH and MMMH between January 2016 to December 2016 were included in the study. The samples were diagnosed by the resident pathologists of the hospital study sites as either benign (n = 91) or malignant (n = 101) based on microscopic examination of H&E-stained biopsies. The malignant samples were further subclassified as invasive ductal carcinoma, residual ductal carcinoma-in-situ, or invasive lobular carcinoma; and the benign samples as fibroadenoma, fibrocystic disease, benign fibroadipose tissue, fibrocollagenous cyst wall, or intraductal papillomas.
The FFPE tissues were uniformly cut at 5-μm thickness using a microtome (Leica Biosystems, Germany) and three (3) adjacent tissue sections were mounted on glass slides. The two (2) outer sections were stained with H&E for re-evaluation by a third-party pathologist who was blinded of the original diagnosis. The pathologist was instructed to classify the biopsy sample as either benign or malignant and to mark the location of the cancer cells if the sample was malignant, to serve as guide in the ATR-FTIR analysis [23]. The inner or middle tissue section was deparaffinized with xylol, dehydrated with alcohol, rinsed in distilled water, dried overnight, and subjected to ATR-FTIR analysis [24,25].
Only the samples with similar diagnosis by the resident pathologists of the hospital study sites and the third-party pathologist were considered for further analysis. In this case, out of the 200 archival samples, only 166 (n = 78 benign; n = 88 malignant) were taken for ATR-FTIR processing. Furthermore, each of the 166 samples corresponded to one patient each to satisfy a one-is-to-one correspondence between patients and specimens [11].

ATR-FTIR spectral analysis
An FTIR mid-infrared spectrometer equipped with a platinum ATR single reflection diamond sampling module (Bruker Optics, Germany) was used to obtain spectra of the breast samples. A performance qualification (PQ) test using OPUS 8.0 software's fully automated validation program was initially performed to ensure quality and accuracy of spectral data. The deparaffinized breast tissue sections were positioned directly in contact with the ATR diamond crystal's surface (2 mm x 2 mm) and the mid-IR region of 4000 cm -1 to 600 cm -1 was passed to and from the ATR accessory. Spectra were generated at a spatial resolution of 4 cm -1 and an average of 48 scans was co-added to obtain an adequate signal-to-noise ratio [26][27][28], which was further supported by the software's validation program as "acceptable". Prior to scanning each tissue sample, the background spectrum was recorded, and this spectrum was systematically subtracted by the software to routinely eliminate atmospheric effects. The malignant samples were scanned along the area containing the cancer cells, while the benign samples were scanned at random spots throughout their entire tissue section. The spectral data associated with a benign or a malignant tissue sample was obtained by computing for the median spectrum of their respective 48 scans.

Characterization and pre-processing of spectral data
The spectral data set, X SD , consisted of N | N = 166 spectral vectors x !ðiÞ SD 2 R L 8 i 2 N � N, comprising of 78 malignant FTIR data, X ðMÞ SD � X SD , and 88 benign FTIR data, X ðBÞ SD � X SD , where L | L = 462 denotes the length of each vector. An element of a vector x ! SD , x ! SD ðjÞ8 j 2 N � L corresponds to an absorbance reading within the fingerprint region of 1800 cm -1 to 850 cm -1 at 2 cm -1 steps. Furthermore, X SD can also be characterized as a matrix of R N×L dimension.
All obtained spectral data were internally normalized using z-score normalization, which is the recommended method of normalization for the FNN designs [29,30]. The normalization was done to eliminate bias from y-value discrepancies among the IR samples. Here, normalization was done per x !ðiÞ SD using the equation where the mean() and the std() notations denote the mean and the standard deviation of the elements of the vector x !ðiÞ SD . The implemented normalization scales the elements of x !ðiÞ SD to have an overall mean of 0, and a standard deviation of 1. X SD also underwent baseline correction via OPUS 8.0 software via "rubber band method" with 64 baseline points. This was done to approximate a polynomial fit based on the minima of y-values of each element vector x ! SD 2 X SD . The fitted polynomial was then deducted for all x ! SD to extract the baseline corrected spectrum [25,[31][32][33][34]. Finally, the corrected spectrum was scaled within the fingerprint region, from 1800 cm -1 to 850 cm -1 [32,35]. Other than baseline correction using rubber band method, no further user intervention was done to assess the spectral data. To visualize the average spectrum of benign and malignant breast tissues, their respective median values for each wavenumber was plotted.

Principal component analysis (PCA)
Principal component analysis (PCA) was performed to visualize the distribution of benign and malignant samples over two of the PCA's most dominant components (F 1 and F 2 ). The process of translating X SD to the reduced variable space, X PCA (X SD ! X PCA ) is given by the equation where X PCA 2 R N × 2 is the reduced sample space, X SD is the mean absorbance value of

Classifiers
Three (3) feed forward neural networks (FNN) of different layer sizes were designed in the study. To benchmark the NN models, six of the most widely used machine learning models were also created, in particular, linear discriminant analysis (LDA), support vector machine (SVM), logistic regression (LR), decision tree (DT), random forest (RF), and Naïve Bayes (NB). The following subsections discuss in detail the design of each model. Cross-validation of models. A 10-fold cross-validation procedure was used to evaluate all models; where 70% of the spectral data set X SD , were used for the training set S TR � X SD , and the remaining 30% were equally partitioned for the validation set S V � X SD (15%) and the test set S TS � X SD (15%). The cross-validation procedure was repeated over 50 trials (T) to ensure stability of results [36]. For each trial, the elements of the sets S TR , S V , and S TS were reselected from X SD randomly. The sets satisfied the criteria S ðiÞ Moreover, the ratio of malignant and benign samples was preserved for each set. To evaluate the performance of each model, the metrics area under the curve (AUC), accuracy (ACC), positive predictive value (PPV), negative predictive value (NPV), recall rate (RR) and specificity rate (SR) were obtained. The overall mean and standard deviation of the metrics over the 50 trials were obtained using the formulas ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi X N 1 ðm n;t À m n;t Þ 2 N À 1 s Þ ð3:2Þ in which M m and ∑ m are the overall mean and the overall standard deviation of a metric m, where m n,t is the metric value of a metric m, for a trial t 2 N �T and a fold n 2 N � 10. Moreover, the variable m n;t in Eq 3.2 is the N-fold mean of a metric m, which is also equal to 1 N X N 1 m n;t from Eq 3.1.
All models were designed and implemented using MATLAB R2020b on an Intel i7-6700 3.40 GHz CPU and an Nvidia GeForce GTX 1050 Ti GPU over a 16 GB RAM.

Feedforward neural networks (FNN). Three (3) feed forward neural networks (FNN)
were designed in the study. Each neural network has an input layer N ! i , consisting of L nodes corresponding to the length of each spectral vector input x ! SD , and an output layer N ! o , consisting of 2 nodes which correspond to the sample's diagnosis of being benign or malignant. A neural network varies in hidden layer size (n = 2, n = 4, n = 8) with respect to the other. For all FNNs, scaled exponential linear units (SELU) were used as activation functions within the hidden layers [37], and softmax function for the output layer.
Linear discriminant analysis (LDA). Here, S TR was used to design an LDA model f LDA (x) which returns the probability of a spectral vector input x :¼ x ! SD from being benign or malignant. S TS was used to measure the metric performance of the model. Before constructing a linear separator among the samples, principal component analysis (PCA) was performed to reduce the dimension of data from 462 to two variables, which are F 1 and F 2 . The reduction of dimensional space via PCA (X SD ! X PCA ) follows the discussion from Eq 2.
The LDA model was constructed following Fisher's criterion [38] where the probability density function describing the likelihood of a sample from being malignant or benign is given by the function f LDA (x) [11] f LDA ðxÞ ¼ softmaxð½p M ðxÞ; p B ðxÞ�Þ ð4:1Þ in which p M (x) and p B (x) are normal probability density functions describing the probability distribution of malignant and the benign samples, respectively. The normal curves are defined as where Γ is the projection of X PCA onto w min , while μ and σ are the class mean and standard deviation, respectively.
In evaluating an element of the test set x !ðTSÞ from the training set via Eq 2 was first used to translate x !ðTSÞ SD from a dimension of R L to R 2 before evaluation using Eq 4.2.
Support vector machine (SVM). The designed SVM is a linear SVM of input x :¼ x ! SD . The SVM was designed from the elements of the training set by considering an unconstrained Langrange optimization problem given by the equations [39,40]  was implemented by considering the gradients To optimize the model, a grid search was performed from a series of learning rates ℓ from 1 to 5×10 −5 over 1000 epochs. The validation set accuracy was considered as the optimization metric which determined the superiority of one model from the other. The process was repeated for 50 trials to ensure stability. The average validation accuracy of a model over the 50 trials determined the overall metric of the model for a considered learning rate, ℓ i . The ℓ i which constituted the highest overall validation accuracy was considered as optimal learning rate to train and test the SVM model. The output probability diagnosis of the model for benign and malignant cases, p SVM (x), was computed using Platt's method [41]. Logistic regression (LR). The designed LR model is a L-input classifier with an output probability p LR (x) quantifying the likelihood of an input where w 2 R L and b 2 R 1 are the weights and bias characterizing the LR model. In training the model, SGD was used to minimize loss over the training set. The considered loss function was the binary cross-entropy loss function given by where p(y i ) denotes the probability of obtaining a malignant diagnosis given a theoretical output of y i ; where y i = 1 for malignant cases, and y i = 0 for benign. To optimize the model, a grid search was performed with a design similar to that performed for the SVM. Decision tree (DT) and random forest (RF). The classification and regression trees (CART) algorithm was used to generate DTs of binary splits. The Gini's diversity index was used to find the best input F j | j 2 N � L for splitting the training set for each iteration of branching; where F j is the j th wavenumber from the fingerprint region 1800 cm -1 to 850 cm -1 . Gini's diversity index [42] is given by where the maximum bound for the summation operation denotes the binary characteristic of the splitting considered. Furthermore, N mj b F is the total number of class m separated by the N t is the total number of elements in F ! j at node t and P(m|t) is the probability of class m for being either malignant or benign from happening at node t. Since the elements in F ! j are continuous variables, the best value of separation b F was identified from F ! j by considering the element having the least GINI(j) t metric [43]. The branching was recursively performed for each newly created node until the performance in the validation set accuracy decreased.
The designed RF utilized the creation of trees following the previously discussed. The diagnosis of the RF was determined as the prevailing diagnosis made by its constituent bags of DTs. To determine the optimum number of trees N RF for the RF, a grid search from 3 to 100 trees was performed. The validation set accuracy was considered as the optimization criteria of the search. Each simulation was repeated over 50 times for each iteration n|n 2 {3 � N � 100} to ensure stability. The average accuracy over the 50 trials served as the final performance metric of the RF for an N RF equal to n. The final RF constituted the design with the highest average accuracy.
Naïve bayes (NB). The designed NB is a classifier of two classes of n|n 2 N � L inputs.
For each j th input, F ! j , the best b F value of separating the elements of F ! j between two sub-classes was determined. The algorithm for finding b F is the same as that of the DT and RF designs where the Gini's index was used (Eq 7.1). The predictive value f NB (x) of the NB is defined as the probability of a sample for being malignant, given an input where the numerator corresponds to the total probability of an input x from happening, given n-inputs considered, with an x(j) value classified as class m j for a determined separation b F j for malignant cases. On the other hand, the denominator is the total probability of the set of m j from x for ever happening. The NB classifier outputs a malignant diagnosis when p NB > 50%; otherwise, the diagnosis is benign. In order to determine the optimal n-value for the classifier, the number of inputs was increased from 3 to L, where the inputs of the least GINI(j) t value were considered first. The optimization was terminated at the n-value where the validation accuracy of the model started to decrease. Each iteration of n was repeated for 50 trials, where the average validation accuracy from the 50 trials was the considered optimization metric criterion. The final NB constituted to the design with the highest average validation accuracy.

Identification of dominant spectral components
In order to identify the most significant wavenumbers which influenced a sample's diagnosis via the NN models, a novel sensitivity analysis was performed based on the optimized FNNs (n = 2, n = 4, n = 8). It must be noted that a visual peak analysis of the obtained spectral data was also performed prior to sensitivity analysis to compare the identified significant wavenumbers from the NN.
Visual peak analysis. Significant peaks in the fingerprint region were identified through visual inspection of X SD . Test of normal distribution using Shapiro-Wilk test and variance of homogeneity were performed for the identified peaks. Since all data followed a non-normal distribution, they were subjected to Mann-Whitney U test to assess if the absorbance peaks of malignant samples were significantly different (p-value <0.05) from the benign samples. Statistical analyses were performed using MATLAB 2020b.
Sensitivity analysis of neural network. A modified neural network committee-based (NNC) sensitivity analysis was considered using the input perturbation algorithm. In order to simplify the NNC sensitivity analysis, the committee of NNs were designed and trained following the design architecture of the optimized FNNs [20][21][22].
For the analysis, an experimental set S EXP � X SD was used to train and analyse the FNN design. S EXP comprises of 70% randomly-selected elements from X SD . The elements of S EXP was randomly selected from X SD . Moreover, the quantity of malignant and benign samples from S EXP were equally proportioned. For each selected input x ! SD ðjÞ8 j 2 N � L, a perturbation Δx j from -50% to 50% of x ! SD ðjÞ at 5% steps was added to the x ! SD ðjÞ, and the mean square error (MSE) of the output was tabulated; where x ! SD ðjÞ is the mean value of S EXP for the j th input [21]. The MSE for the j th input variable at its k th step perturbation Δx j,k (MSE j,k ) is calculated using the formula ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi O j;k ðiÞ 2 À b OðiÞ where N EXP is the total number of elements in S EXP , O j,k is the output vector of the model for 0i for a benign sample, and b O ¼ h0; 1i for a malignant one. The overall response of the network MSE j for the j th input was computed by averaging MSE j,k for all Δx j,k k-steps given by To ensure stability of the performed sensitivity analysis, the process was repeated for a committee of 50 NN (i.e., 50 trials). The overall response of the j th input, was computed as the average of the j th input response over the 50 trials MSE j .
To visualize the perturbation response of the considered FNN, MSE j was plotted for all j 2 N � L. The sensitivity analysis was performed for each FNN (n = 2, n = 4, n = 8).
Motivation and theory. The diagnostic ability of often-used machine learning models such as SVM, LDA, and PCA greatly relies on the variation among and between data within a data set X. This variation is often quantified using a covariance matrix, S 2 R N 2 . By obtaining the eigenvectors associated with S, the characteristic form of the data may be easily represented. However, this approach becomes ineffective if the variation among the data set elements X j 2 X becomes significantly small (but not infinitesimally small to imply repetitive data). In such a scenario: S ! 0, the obtained eigenvector solutions, e ! j j j 2 N � N, become trivial where je ! j j ! 0 8 j 2 N � N . This makes it difficult to distinguish important variables which may prove significant in determining an accurate diagnosis. For artificial NNs, the weights determine the correlation of the input parameters toward the outputs under 0-bias condition. Here, the magnitude of a weight determines the magnitude of the correlation, while the sign of the weight determines the direction of influence [44]. This simplistic model does not however explain the contribution of biases and activation functions in a network's decision-making process.
The proposed model of this study probes the significance of an input parameter x ! SD ðjÞ based on the magnitude of change at the output for given ranges of input perturbations Δx j,k . The magnitude of influence of an input variable is given by the MSE j,k , which is a magnitude value at the range of 0 to 1 since the output are probabilities. In obtaining MSE j,k , X ðMÞ SD and X ðBÞ SD were assumed as a single set since analyzing them separately would not provide an overall determination of the significant variables considering both classification. Furthermore, since the spectral data set X SD was normalized for each set element x !ðiÞ SD , MSE j,k was expected to be less varied between samples for all x !ðiÞ SD 2 X SD across all input variables. This assumption makes it justifiable to denote the overall average, MSE j , as the overall measure of an input response for a single neural network. The overall MSE magnitude, MSE j , was assumed as the final metric to measure the influence of a given variable, which is the average MSE j measure considering multiple similar NNs. In such process, it was assumed that each NN had similar input responses since each followed the same architecture, training, and optimization. Lastly, in determining the most significant input variables, MSE j was no longer ranked in contrast to usual sensitivity analyses [20-22, 45, 46]. Since the functional groups and vibrational modes associated with the input variables are usually presented in ranges within the IR spectrum, it was more appropriate to identify and discuss significant peaks from the plot of MSE j rather than in a ranked form.
Overall, this study proposed that input variables associated with comparatively high MSE constitute to significant wavenumbers important in the NN's diagnosis. Since the NN models were assumed as the prevailing models and are highly accurate, the determined wavenumbers may thus provide insights in the associated changes in chemical composition and structure of biomolecules in cancerous breast tissues. The overall method implemented in the study is summarized in Fig 1.

Samples
The clinical characteristics of the samples were retrieved from medical records and histopathology reports of the hospital study sites (Table 1). Among the malignant samples, majority were invasive ductal carcinoma. Meanwhile, the benign breast samples were mainly fibroadenoma and fibrocystic change ( Table 1). The above classifications were based on microscopic examination of H&E-stained specimens and immunohistochemical staining (if needed or available) following the current WHO classification.
The variation between benign and malignant samples is shown in Fig 2. From the PCA plot, 90.28% of the variability was associated with the first principal component F 1 , while only 5.12% was associated with the second principal component F 2 . Most of the benign samples were scattered along the negative domain of the F 1 axis while malignant samples were evenly scattered. Both sample classes followed a parabolic distribution across the determined principal axes. Overall, the PCA biplot suggests that the benign and malignant breast samples were highly similar in characteristics.

Feedforward neural network designs
The NN input layer consisted of 462 nodes which corresponded to the defined IR absorbance of each sample in frequencies between 1800 cm -1 to 850 cm -1 . Three (3) FNN models were designed with varying layer sizes (n = 2, n = 4, n = 8). The quantity of neurons per FNN hidden layer was kept constant for each repetition. The general NN architectures are summarized in Table 2.
Gaussian random initialization was assumed for weight initialization, while a zero-value initialization was used for the biases. Moreover, SELU activation was used for all neuron activation functions except for the respective output layers in which the softmax activation function was used. All NN were trained through backpropagation via AdaGrad stochastic gradient decent (SGD) [47] over 1000 epochs. The binary cross-entropy function was considered as the cost function for all NN designs during the training process. To avoid over-fitting, a dropout of 90% was used for each feed forward hidden layers as recommended for SELU activation [29,37].

Neural network optimization
Pre-training, a grid search was performed to determine the optimal learning rate and the layer width. To limit the search space of the performed grid search, the explored learning rates L,    (Fig 3) shows that for all models, the designs peaked at a learning rate of about 0.005 to 0.1, with the FNN2 having the most stable deviation of performance across its range and the FNN8 as the most unstable. Regardless of the model, it can further be seen that at learning rates above 0.1, the models exhibited very poor performance in validation accuracy which may be due to divergence and large weight oscillations. Meanwhile, the validation accuracy stagnated at learning rates of below 0.005, which may be due to very small changes in the NN's weights and biases. The optimized learning rate for each model was identified to be all equal to 0.01, while the number of neurons per layer of each model to achieve best performance were 350, 400, and 300 for FNN2, FNN4, and FNN8, respectively.

Diagnostic performance of models
Diagnostic performance of the NN models and the other machine learning models (NB, DT, RF, LR, LDA, SVM) was determined by comparison with the gold standard, which is the microscopic examination of H&E-stained tissues by pathologists. In terms of ACC, NPV and RR, all the NN models were able to significantly surpass all the other machine learning models (Tables 3 and 4). However, none of the models was able to outperform the SVM model as to AUC (95.72% ± 4.94%). Among the NN models, the best AUC was achieved by FNN4 (90.48% ± 10.30%) followed by FNN8 (90.35% ± 10.10%) then FNN2 (90.05% ± 9.95%). The relatively low AUC among NN models may be attributed to a lack of training time. In terms of PPV, the NN model performances did not significantly differ from the best benchmark model, which was SVM. As to NPV, the NN models were able to surpass the performance of the best benchmark model by an average of 4.42% for FNN2, 3.16% for FNN4 and 2.08% for FNN8. As to SR, the models performed significantly less than SVM, but were able to outperform the other benchmark models. The models were able to outperform SVM in terms of RR by an average of 2.94% for FNN2, 1.23% for FNN4 and 0.17% for FNN8. Note, however, that the RR value for FNN8 was not significantly differed from that of the SVM. The observed increase in metric performance from FNN8 to FNN2 may be due to a lack in training time for the deeper models. Overall, the significantly high ACC value of the FNN models may prove them viable classifiers in distinguishing malignant from benign samples using FTIR data. The results are summarized in Tables 3 and 4.

Visual peak analysis of data
The fingerprint spectral region (1800 cm -1 to 850 cm -1 ) showed that the absorbance spectra of the malignant and benign tissues were significantly distinct from each other, specifically at bands 1452cm -1 /1452 cm -1 , 1399 cm -1 /1401 cm -1 , 1337 cm -1 /1337 cm -1 , 1279 cm -1 /1279 cm -1 , 1236 cm -1 /1236 cm -1 , which represent lipids, DNA, RNA and phospholipids ( Table 2). Other peaks tested, in particular 1632cm -1 /1634 cm -1 , 1539cm -1 /1540 cm -1 , 1160cm -1 /1160 cm -1 , 1032cm -1 /1030 cm -1 , and 880cm -1 /878 cm -1 , were found as non-distinct among samples. These bands represent amide I proteins, amide II proteins, carbohydrates, glycogen, and phosphorylated proteins. It is worth mentioning that the tissues analyzed had uniform thickness (5μm) achieved by sectioning with a microtome. The median absorbance spectra of benign and malignant breast tissue samples are shown in Fig 4. Test of homogeneity showed that the characteristic IR absorbance peaks of the malignant cases were significantly different (p<0.05) from the benign samples. The visually identified peak positions and absorbances in the fingerprint IR region (1800 cm -1 to 850 cm -1 ) that could significantly differentiate the malignant from benign samples are summarized in Table 5. Their corresponding functional group, vibrational mode, and molecular source assignments

PLOS ONE
are also listed in the aforementioned table. It was observed that peak absorbances representing lipids, DNA, RNA and phospholipids were significantly decreased in most malignant tissues.
The results of the performed test of significance is further backed-up in Fig 2, where wavenumbers associated with significant peak absorbances were more closely projected to F 1 than those which were not. This implies that lipids, DNA, RNA and phospholipids are responsible for the high variability among the samples, suggesting further that these biomolecules highly varies between malignant and benign classes.

Significant peaks identified by neural networks
The input response produced by each NN design is summarized in Fig 5. As shown, the response of each network is highly similar to one another. For all NN designs, the IR region associated with the functional groups, in particular within~960 cm -1 to~1050 cm -1 , showed the greatest input response within the considered spectrum. Meanwhile, the least response was evident within~1250 cm -1 to~1320 cm -1 .
Using the input response of NN from neural network committee-based sensitivity analysis, the significant peaks were identified from the fingerprint spectral region (1800 cm -1 to 850 cm -1 ) as shown in Fig 5. The information on the protein content, including its secondary structures such as amide I and amide II, are observed in the region between 1800 cm -1 to 1500 cm -1 [48,49]. The bands found at~1635 cm -1 are associated with the amide I protein that arises from the C = O stretching vibrations of the amide groups of the protein backbone [49,50]. The bands at 1540 cm -1 results to N-H bending in the amide II groups, which is associated with aromatic  amino acids. The bands 1454 cm -1 and 1393 cm -1 are associated with the CH 2, CH 3 deformation modes mainly from proteins and lipids [49,51]. Peak position 1393 cm -1 resulted from COO −symmetric stretching of amino acids [49]. The region of 1300 cm -1 -800 cm -1 corresponds to the variations of functional groups that are present in proteins, nucleic acids, carbohydrates and phospholipids such as PO 2 -, C-O, and C-C [48,52]. The band at 1238 cm -1 is at the range of sugar-phosphate chain vibrations which is related to PO 2 − asymmetric stretching of nucleic acids [52]. Furthermore, this is also the range for Amide III (1299 cm -1 -1200 cm -1 ) for its C-N stretching, N-H bending, C = O stretching, and O = C-N bending [53,54]. The vibrations of C-O group from glycogen are observed in peak 1077 cm -1 [51]. The band 1030 cm -1 -1045 cm -1 is due to C-O stretching and C-O bending of the C-OH groups of carbohydrates such as glucose, fructose [50,54]. The band at 990 cm -1 is related to phosphorylation of proteins and ribose-phosphate chain [50] while~962 cm -1 is associated with symmetric phosphate stretching modes from phosphate diester groups in nucleic acids and phospholipids [51].

Discussion
FTIR spectroscopy is a prospective novel diagnostic tool that is used to distinguish cancer samples from normal ones at high sensitivity, specificity, and accuracy [55,56]. Considering the molecular complexity of biological specimens, chemometric techniques such as the principal component analysis (PCA) and artificial neural networks (ANN) that combine statistical and mathematical algorithms are utilized to generate chemo-physical evidence from spectral data [55]. The advent of computers with enhanced processing capabilities and enhanced memory capacity have led in the rise of computer-aided diagnosis (CAD), which combines algorithms or methods from pattern recognition and digital image processing [56]. Meanwhile, scientists have been drawn to the potential application of FTIR spectroscopy in the clinical setting to improve accuracy and reproducibility of cancer diagnosis, while omitting the need for complex and time-consuming clinical processing of tissue biopsy samples [55]. This is best exemplified by the study of Großerueschkamp, et al., wherein they combined FTIR imaging and a novel trained random forest (RF) classifier for the automated marker-free histopathological annotation of lung tumor classes and subtypes of adenocarcinoma without further treatment of tissue samples. This study yielded greater reproducibility and accuracy of 97% for the annotation of lung tumor classes and 95% for the identification of prognostic adenocarcinoma subtypes [57]. Subsequently, FTIR reduced intra-and inter-operator variability through its objectivity, reproducibility, and improved accuracy over current methodologies for cancer diagnosis [55]. This also permits the standardization of spectral measurement and analysis, which is necessary for the construction of FTIR spectral databases with highly specific spectroscopic markers for the various stages and grades of different cancer types applicable to the clinical settings [58]. Additionally, an easy and objective data interpretation can be done by non-spectroscopists by incorporating powerful algorithms for automatic data analysis of large data sets [55]. The designed NNs exhibited superior accuracy (>90%) relative to the best benchmark model (SVM). These metrics prove them not only as excellent classifiers in distinguishing malignant from benign breast cancers using ATR-FTIR data, but also excellent classifiers in general. Overall, the FNN2 model was able to obtain larger metric values relative to the other NN models. The decrease in the performance metrics, in particular, accuracy, NPV, and RR of the models as a function of the layer quantity makes it evident that the deeper models may have lacked training time. Do note, however, that an opposite trend is evident for the SR and PPV metrics, implying that the designed architectures may approach a classifier that becomes increasingly more accurate in detecting malignant rather than benign samples as the model gets deeper. This observation presents a trade-off between the model's capability in confirming truly malignant from benign samples. This, further, implies that if a more accurate positive screening test is more in-demand, then deeper models may be assumed. Conversely, for more accurate negative screening tests, a less deep model may be more necessary.
Considering the metric comparison performed, the non-significant PPV metric of the designed NNs make them equally competent to SVM. However, the significantly higher NPV metric of the designed NN models make them more superior classifiers in terms of identifying benign samples as truly exhibiting non-malignancy. The significance of the SR and the RR are parallel to the significance of the PPV and NPV by definition, respectively. This characteristic makes the designed NN models more practical to use in situations where diagnosing non- malignant patients as malignant becomes very costly. While administering an incorrect diagnosis is very detrimental for a patient in general, for financially non-capable individuals, an accurate diagnosis of non-malignancy may be of more importance since a false diagnosis of malignancy risks the individual of financial burden in chemotherapy, and a probable decline in health which further necessitates added costs. In developing countries such as the Philippines, the use of highly specific diagnostic tool such as the designed NNs in this study may prove more beneficial for patients undergoing cancer diagnosis. Regardless of the use, however, the models show their potential as highly specific tools to assist pathologists and medical practitioners in the field.
The designed neural networks that were used to analyze the FTIR spectra were able to identify significantly decreased peak absorbances characteristic of lipids, nucleic acids, and phospholipids in malignant tissues, which were similarly evident in the performed visual analysis ( Table 5). Breast cancer is often characterized by the stimulated production of novo lipids which are essential for cell growth, proliferation, and oxidative stress resistance. The triacylglyceride storage in lipid droplets has been suggested to work as fuel source after re-oxygenation during intermittent hypoxia, whereas fatty acids promote redox balance supporting a high-glycolytic rate in malignant tissues. Lipids also form the structural basis of paracrine hormones and growth factors which stimulate tumor growth, neovascularization, invasion, and metastatic spread [59]. The decrease in lipids and phospholipids may reflect the utilization of these biomolecules for nutrition and energy source; and thus, prevent there accumulation in cancer cells during cancer progression [60]. A significant difference in the absorptive peak was also apparent in the DNA/RNA spectral region, with the malignant breast samples showing significantly lower peak absorbance than benign samples. This is in contrary to the findings of Lazaro-Pacheco et al. wherein higher contribution of nucleic acid bands was identified in cancerous samples in comparison with normal breast tissues in different spectral regions. They argued that high concentration of these biomolecules is expected since there is increased cellular content in response to an abnormal proliferation [61]. However, in a study involving ovarian cancer, the RNA/ DNA absorption peaks were significantly lower in malignant tissues than in borderline and benign ovarian tissues [62]. The lowered peak absorbance among malignant samples may be due to fragment transfer of tumor DNA or cell-free RNA from the cancerous area to the bloodstream; thus, consequently decreasing the nucleic acid content at the primary tumor area [63].
Interestingly, there was no distinctive difference between malignant and benign breast tissues in the absorptive peaks of carbohydrates as perceived by the visual analysis performed (Table 5). In relation, the P1160 wavenumber vector that is associated with carbohydrates was projected significantly far from F 1 , implying further that this biomolecule is not a possible cause of variability (Fig 2). However, the sensitivity analysis stated otherwise since the highest peak was evident across the IR region associated with the C-OH group of carbohydrates ( Fig  5). Studies have shown that assessing glycogen levels is a good differentiation marker between malignant and benign tissues, with malignant samples generally consuming more glycogen to sustain survival during prolonged hypoxia and glucose deprivation as well as to sustain metastasis [64]. This ability of the sensitivity analysis to recognize the carbohydrates as differentiating factor, which in contrast was not detected by mere visual peak analysis, further proves the proposed method as a more discerning method to identify important spectral biomarkers. Possibly, the breast cancer cells could have already catabolized their glycogen stores as well as their subsequent by-products such as glucose for survival in nutrient-deprived environment [23]; hence, became relatively indistinguishable by the usual visual peak analysis. Given the proposed method's superior ability, the study suggests that the identified peaks within the higher IR wavelength region (~1400 cm -1 to~1800 cm -1 ) be given attention, particularly those associated with CH 2, CH 3 deformation modes and amide protein stretch and bends.
A variety of biological materials including blood, solid tissues, urine, and sputum have been studied using FTIR spectroscopy to develop better alternatives for cancer diagnosis and management. In the clinical setting, blood and tissue samples remain to be widely used as opposed to other specimens for diagnosing disease [55]. In less developed countries, the use of FFPE samples can provide technical ease and economic advantage for longitudinal tissue specimen storage as they can be easily retrievable from accredited repositories for further analysis [65]. Compared to immunohistochemical and molecular assays, FTIR can be a cheaper alternative for detecting biochemical markers in pathologic FFPE specimens based on unique vibrational patterns [60]. With the introduction of machine learning, FTIR spectroscopy in clinical diagnostic settings can reduce intra-and inter-operator variability and improve accuracy and reproducibility of cancer diagnosis, while omitting the need for complex and time-consuming clinical processing of clinical samples [56].
The generation of NN models from the FTIR fingerprint of benign and malignant FFPE breast tissues led to the identification of significant wavenumbers apart from those at peak absorbances, which can be used to discriminate malignant from benign tissues. Interestingly, unique peak absorbances distinctive of lipids, nucleic acids, and phospholipids were identified, showing that these biomolecules were significantly decreased in malignant tissues as compared to benign samples, and can, therefore, be used as biochemical fingerprints to aid in cancer diagnosis.
While the current study shows that NN models from FTIR spectra can be used as an adjunct tool for diagnosing breast cancer, additional clinical studies should be made to bring this technology into the clinical setting. Due to financial constraint, this study was conducted using only the basic type of FTIR spectrometer with limited spatial resolution. To acquire a comprehensive spectral data, additional FFPE samples may be analyzed using an FTIR coupled with an infrared microscope to detect vibrational motions of molecules within very restricted regions. Other spectroscopic techniques such as the Raman spectroscopy can also be used to further probe molecular vibrations to aid in the characterization and discrimination of tissue types [66]. The creation of spectral database and the generation of novel powerful algorithms for automatic data analysis of large data sets is another prospect to accelerate point-of-care decisions and improve therapeutic management for breast cancer patients [67]. Further studies also show that an alternative sample to tissues could be blood plasma, since the use of plasma is cheaper, less invasive, and easier to process [68,69]. Through the integration of AI and FTIR, spectral biomarkers in plasma samples may be identified to monitor treatment response; a study which is already being investigated by the research team.
In summary, the present study generated NN models that led to the identification of unique infrared spectrum of absorption in the lipid, nucleic acid, phospholipid, and carbohydrates regions that could effectively discriminate malignant from benign breast tissues. To the researchers' knowledge, this is the first study to have used several machine learning tools to identify malignant breast tissues based on FTIR spectral data.