Albumin and hemoglobin adducts of estrogen quinone as biomarkers for early detection of breast cancer

Cumulative estrogen concentration is an important determinant of the risk of developing breast cancer. Estrogen carcinogenesis is attributed to the combination of receptor-driven mitogenesis and DNA damage induced by quinonoid metabolites of estrogen. The present study focused on developing an improved breast cancer prediction model using estrogen quinone-protein adduct concentrations. Blood samples from 152 breast cancer patients and 71 healthy women were collected, and albumin (Alb) and hemoglobin (Hb) adducts of estrogen-3,4-quinone and estrogen-2,3-quinone were extracted and evaluated as potential biomarkers of breast cancer. A multilayer perceptron (MLP) was used as the predictor model, and the resulting prediction of breast cancer was more accurate than that of other existing detection methods. An MLP using the logarithm of the concentrations of the estrogen quinone-derived adducts (four input nodes, 10 hidden nodes, and one output node) was used to predict breast cancer risk with accuracy close to 100% and area under the curve (AUC) close to one. The AUC value of one showed that the two data sets were separable. We conclude that Alb and Hb adducts of estrogen quinones are promising biomarkers for the early detection of breast cancer.


Introduction
For more than half a century, early detection of breast cancer has been an important issue in cancer research. Pioneered by Egan [1] and then advanced by Wolberg & Mangasarian [2,3], mammography is now the de facto technique for diagnosing the development of breast cancer and identifying the locations where a biopsy should be conducted. However, this technique relies heavily on visual inspection or computer-aided pattern recognition techniques to recognize cancer cells. If cancer cells have not yet developed, diagnosis relies on biomarkers such as single nucleotide polymorphisms (SNPs) [4,5], gene expression profiles [6], estrogen/progesterone receptors [7], DNA adducts [8], serum proteins [9,10], and albumin and hemoglobin adducts of estrogen quinone [11,12].
In Taiwan, the onset of breast cancer tends to occur at a younger age than in western countries [30]. More than 50% of women diagnosed with breast cancers in Taiwan were found to be premenopausal [31] in contrast to approximately 25% of those in western populations. The incidence rate of breast cancer in Taiwanese women born after the 1960s is shifting toward that in Caucasian Americans. Environmental and dietary risk factors have been implicated in contributing to the increase in breast cancer in Taiwanese women. Development of biomarkers to identify individuals at high risk of developing breast cancer is a necessity. In our previous investigation [11,12], we demonstrated that there are elevated levels of both the Alb and Hb adducts of E2-2,3-Q and E2-3,4-Q in breast cancer patients compared with those in healthy women. One of the unique features of this finding is that the ratio of Alb-E2-3,4-Q adducts to Alb-E2-2,3-Q adducts was 2:1 in breast cancer patients but 1:2 in healthy women, whereas the ratio of Hb-E2-3,4-Q adducts to Hb-E2-2,3-Q adducts in both breast cancer patients and healthy women was 2:1.
These recent findings suggest that Alb and Hb adducts of estrogen quinone could be used for early detection of breast cancer. A decision model could be developed based upon the concentration of the estrogen-quinone adducts. Visual inspection of scatter plots of the log-concentration values of E2-3,4-Q adducts versus E2-2,3-Q adducts of both Alb and Hb showed that the decision boundary of the model is unlikely to be a hyperplane. This suggested that a linear decision model such as logistic regression is not suitable; instead, a non-linear decision model such as a multilayer perceptron would be more appropriate. In this report, we describe how a multilayer perceptron (MLP) with four input nodes, one hidden layer, and one output node could be applied to develop a decision model for breast cancer detection.

Data
The study population was recruited in a suburban medical center in central Taiwan. Women with breast cancer and healthy female subjects were recruited between May 2009 and May 2012. All the subjects provided sufficient venous blood for protein adduct analyses and completed questionnaires regarding age, body mass index, occupation, disease history, cigarette smoking, alcohol consumption, and dietary habits. Of those recruited, 152 breast cancer patients (BCP) and 71 healthy controls (HC) without any history of cancer were enrolled in the study. None of the enrolled individuals had a history of cancer, alcohol use, smoking, or chemotherapy. The age range of the BCP group was from 16 to 79 and the age range of the HC group was from 23 to 69. Mean age was 39.3 for HC and 50.8 for BCP. Of those recruited, 84 BCP and 58 HC were premenopausal. The Institutional Review Board of the Changhua Christian Hospital reviewed and approved this study (CCH IRB No. 081219). Each subject provided her written informed consent before participating in the study.
For each subject, the Alb adducts of E2-2,3-Q and E2-3,4-Q were analyzed from the serum following the procedure outlined in [11] and the Hb adducts of E2-2,3-Q and E2-3,4-Q were extracted from the red blood cells following the procedure outlined in [12]. All cysteinyl adducts arising from estrogen quinones were assayed using the procedure described previously [11]. Briefly, after bringing protein samples to complete dryness, estrogen quinone-derived adducts were cleaved after reaction with trifluoroacetic acid and methane sulfonic acid and analyzed via gas chromatography and mass spectrometry (S1 Data).
First, models using Alb adducts or Hb adducts alone were unable to achieve 100% accuracy because both groups of data overlapped. Second, the concentration of Alb adducts of E2-3,4-Q was higher than that of E2-2,3-Q adducts in cancer patients with a ratio of approximately 2:1, whereas a ratio of 0.5 was observed in healthy controls, consistent with the finding in [31]. On the other hand, the levels of Hb adducts of E2-3,4-Q were higher than those of E2-2,3-Q adducts in both cancer patients and controls, with a ratio of approximately 2:1.
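As a quick arithmetic check on these ratios, assuming natural logarithms are used for the log-concentrations, a 2:1 or 1:2 ratio corresponds to a fixed offset between the two log-concentrations. The symbols $C_{3,4}$ and $C_{2,3}$ are shorthand introduced here for the concentrations of the E2-3,4-Q and E2-2,3-Q adducts of the same protein.

$$\ln\frac{C_{3,4}}{C_{2,3}} = \ln C_{3,4} - \ln C_{2,3} \approx \ln 2 \approx 0.69 \quad (\text{ratio } 2{:}1), \qquad \ln\frac{C_{3,4}}{C_{2,3}} \approx \ln 0.5 \approx -0.69 \quad (\text{ratio } 1{:}2).$$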

Model
To construct the MLP model, we assumed that the training set $D = \{(x_k, y_k)\}_{k=1}^{223}$ was the set of blood samples obtained from the 223 subjects. Here, $x_k \in \mathbb{R}^4$ was the $k$th sample input and $y_k \in \{0, 1\}$ was the diagnostic result: $y_k = 1$ if the $k$th subject had cancer, and $y_k = 0$ otherwise. To model the risk function, we applied an MLP with four input nodes, 10 hidden nodes, and one output node. The inputs to the MLP were the logarithm values of the following adduct concentrations.
$$x_{k1} = \log(\text{Conc. of Hb adducts of E2-3,4-Q}),$$
$$x_{k2} = \log(\text{Conc. of Hb adducts of E2-2,3-Q}),$$
$$x_{k3} = \log(\text{Conc. of Alb adducts of E2-3,4-Q}),$$
$$x_{k4} = \log(\text{Conc. of Alb adducts of E2-2,3-Q}).$$
Let $w$ be the parametric vector collecting $d$, $c_0$, and $a_j$, $c_j$ for $j = 1, \ldots, 10$, where $d \in \mathbb{R}^{10}$ and $c_0 \in \mathbb{R}$ are the weight vector and bias associated with the output node, respectively, and $a_j \in \mathbb{R}^4$ and $c_j$ are the weight vector and bias associated with the $j$th hidden node, respectively. The output of the MLP, $f(x_k, w)$, with input $x_k$ is thus given by
$$f(x_k, w) = g\Big(\sum_{j=1}^{10} d_j\, h(a_j^T x_k + c_j) + c_0\Big),$$
where $h$ and $g$ denote the activation functions of the hidden nodes and the output node. To obtain the parametric vector $w$, we applied gradient descent, minimizing the objective function given by
$$V(w) = \frac{1}{223} \sum_{k=1}^{223} \big(y_k - f(x_k, w)\big)^2 + \alpha \|w\|^2,$$
where the summation term is the mean squared error and the last term is the weight decay. The weight vector $w$ is thus updated recursively by the following equation.
$$w \leftarrow w - \mu \nabla_w V(w), \qquad (3)$$
where $\mu$ is the step size and $\nabla_w V(w)$ is the gradient vector. In our study, $\alpha = 0.0004$, $\mu = 0.02$, and the total number of updates in (3) was 50,000. After the training was completed, the MLP was used to classify the samples. Let $y_k^{\mathrm{predict}}$ be the class label of the $k$th sample.
$$y_k^{\mathrm{predict}} = \begin{cases} 1 & \text{if } f(x_k, w) \ge t, \\ 0 & \text{otherwise,} \end{cases}$$
where $t \in [0, 1]$ is called the decision threshold. The $k$th subject was classified as a cancer patient if $y_k^{\mathrm{predict}} = 1$. Otherwise, the $k$th subject was classified as healthy. The training error rate was defined as the number of misclassified samples divided by the number of training samples. Clearly, the error rate varies with the value of $t$. This value is conventionally set to 0.5.
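The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic sigmoid is assumed for both the hidden and output activations, the random initialization is arbitrary, and the weight decay is taken as $\alpha$ times the squared norm of all weights and biases. The function names `forward`, `objective`, and `train` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, A, c, d, c0):
    """Return hidden activations H (n x hidden) and MLP outputs f (n,)."""
    H = sigmoid(X @ A.T + c)   # A: (hidden, 4) hidden weights, c: hidden biases
    f = sigmoid(H @ d + c0)    # d: output weights, c0: output bias
    return H, f

def objective(X, y, A, c, d, c0, alpha):
    """Mean squared error plus weight decay (assumed form: alpha * ||w||^2)."""
    _, f = forward(X, A, c, d, c0)
    decay = alpha * (np.sum(A**2) + np.sum(c**2) + np.sum(d**2) + c0**2)
    return np.mean((y - f) ** 2) + decay

def train(X, y, n_hidden=10, alpha=0.0004, mu=0.02, n_updates=50000, seed=0):
    """Full-batch gradient descent on the objective; returns (A, c, d, c0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = rng.normal(scale=0.5, size=(n_hidden, p))  # assumed initialization
    c = np.zeros(n_hidden)
    d = rng.normal(scale=0.5, size=n_hidden)
    c0 = 0.0
    for _ in range(n_updates):
        H, f = forward(X, A, c, d, c0)
        # Gradient of the MSE term w.r.t. the output pre-activation.
        delta_out = (2.0 / n) * (f - y) * f * (1.0 - f)
        # Back-propagate to the hidden pre-activations.
        delta_hid = np.outer(delta_out, d) * H * (1.0 - H)
        grad_d = H.T @ delta_out + 2.0 * alpha * d
        grad_c0 = delta_out.sum() + 2.0 * alpha * c0
        grad_A = delta_hid.T @ X + 2.0 * alpha * A
        grad_c = delta_hid.sum(axis=0) + 2.0 * alpha * c
        # Gradient-descent update with step size mu.
        A -= mu * grad_A
        c -= mu * grad_c
        d -= mu * grad_d
        c0 -= mu * grad_c0
    return A, c, d, c0
```

With `X` holding the four log-concentrations per subject and `y` the 0/1 diagnostic labels, `train(X, y)` returns the fitted parameters, and `forward(X, *params)[1]` gives the outputs that are later compared with the decision threshold.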
Rather than fixing $t$ at 0.5, we selected it from the training data. We evaluated every value of $t$ from 0 to 1 in steps of 0.01 and recorded the training error rate for each value. The value with the minimum training error rate was selected and denoted $t_{\mathrm{opt}}$. As shown in the analysis, a range of values could give the same minimum training error rate; if the MLP gave the same minimum training error rate for all $t \in [t_{\min}, t_{\max}]$, we set $t_{\mathrm{opt}} = (t_{\min} + t_{\max})/2$.
Fig 2 shows the results of a typical training that used all four adducts of estrogen quinone. The results shown in the top panels indicate that the mean squared error (MSE), the error rate (i.e., the misclassification rate), and the parameters converged. Both the MSE and the error rate converged to zero. The bottom right panel shows that changing the threshold value $t$ from 0 to 1 resulted in an area under the curve (AUC) value of one. This indicated that the two sets of data were indeed separable.
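The threshold search and the AUC computation can be illustrated as follows. This is a sketch under assumptions: a sample is classified as cancer when its MLP output is at least $t$, the minimizing thresholds are assumed to form one contiguous range, and the AUC is obtained by trapezoidal integration of the ROC points traced by the same threshold grid. The function names are hypothetical.

```python
import numpy as np

def optimal_threshold(scores, labels, n_grid=101):
    """Sweep t over 0, 0.01, ..., 1 and return (t_opt, minimum error rate).

    t_opt is the midpoint between the smallest and largest thresholds that
    reach the minimum error rate (assumes they form one contiguous range).
    """
    thresholds = np.linspace(0.0, 1.0, n_grid)
    preds = scores[None, :] >= thresholds[:, None]        # (n_grid, n_samples)
    error_rates = np.mean(preds != (labels[None, :] == 1), axis=1)
    best = error_rates.min()
    minimizers = thresholds[np.isclose(error_rates, best)]
    return (minimizers.min() + minimizers.max()) / 2.0, best

def auc_by_threshold_sweep(scores, labels, n_grid=101):
    """Area under the ROC curve traced by sweeping t from 0 to 1."""
    thresholds = np.linspace(0.0, 1.0, n_grid)
    pos = labels == 1
    neg = ~pos
    tpr = np.array([(scores[pos] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[neg] >= t).mean() for t in thresholds])
    # Reverse so that the false positive rate is non-decreasing.
    fpr, tpr = fpr[::-1], tpr[::-1]
    # Trapezoidal rule written out explicitly.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

When the two classes are perfectly separated by the MLP outputs, every threshold between the largest control score and the smallest patient score gives zero error, and the sweep yields an AUC of one.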

Results
Four additional MLPs, each using a different combination of three adducts, were generated following the same procedure as for the MLP that used four adducts. The AUC analytical results are listed in Table 1. For reference, the result of using all four adducts is included as Case 1. The results of Case 1, Case 2, and Case 5 revealed that Hb adducts of E2-2,3-Q and Alb adducts of E2-3,4-Q are potential biomarkers for breast cancer detection. We also generated scatter plots of the logarithm values of the adduct concentrations; these demonstrated that while the two sets of data were separable, their set boundaries were very close to each other. Thus, MLPs generated using these two adducts may be sensitive to data error.
To validate the models, cross-validation was applied. Samples of BCP and HC were both randomly partitioned into two sets. The training set consisted of 80 percent of the samples (121 BCP samples and 56 HC samples) and the testing set consisted of the remaining samples. The MLP was trained using only the training set.
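A per-group random 80/20 partition of this kind could be sketched as below. The function name `stratified_split` and the use of rounding down for the training-set size are assumptions; rounding down reproduces the stated counts of 121 BCP and 56 HC training samples from 152 and 71 subjects.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.8, seed=None):
    """Randomly split sample indices into training and testing sets,
    partitioning each class separately."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(train_fraction * len(idx))  # round down (assumption)
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

# Labels for 152 breast cancer patients (1) and 71 healthy controls (0).
labels = np.array([1] * 152 + [0] * 71)
train_idx, test_idx = stratified_split(labels, seed=0)
```

Calling the function with a different `seed` each time gives a fresh random training set for each repetition.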
The AUC was obtained by changing the value of t from 0 to 1. The error rate and AUC were analyzed for both the training and testing sets. The process was repeated 20 times in all five cases as depicted in Table 1. Each time, a new training set was randomly generated. Five MLPs were obtained for the corresponding cases. For each MLP, the threshold value t opt was obtained via the method described earlier. The average MSE, the average AUC, and their standard deviation values (shown inside parentheses) are shown in Table 2. These results showed that using the four adducts as biomarkers yielded superior accuracy in breast cancer detection compared with the results obtained using other biomarkers (Table 3).

Discussion
The high incidence rate of breast cancer in Taiwanese women emphasizes the need for better and more suitable screening and diagnostic technologies. In addition to the utility of mammography screening for early detection of breast cancer [1-3], recent studies have revealed the potential application of serum and plasma protein-based screening assays for diseases including prostate cancer, ovarian cancer, and breast cancer [10,35,36].
In this study, we aimed to develop a screening method with high sensitivity, specificity, and positive-predictive value for detecting breast cancer using blood protein adducts of estrogen quinones. Using MLPs, we were able to predict breast cancer risk based on the natural logarithm values of the estrogen quinone-protein adduct concentrations with an accuracy close to 100% and an AUC value close to one. The prediction results obtained using an MLP with estrogen quinone-protein adducts were more accurate than those of other models [5,6,9,10,33]. In addition to the superior accuracy of our model compared with previously reported results of breast cancer prediction, the AUC value of one we obtained revealed that the two data sets (cancer patients and healthy controls) were separable. Our findings strongly support the use of Alb and Hb adducts of estrogen quinone as biomarkers for early detection of breast cancer. These biomarkers can supplement the mammographic method in cases where cancer cells cannot yet be observed. However, the method presented in this investigation was developed using a retrospective study design. A prospective study using the above estrogen quinone-derived protein adducts would help validate these as biomarkers for early detection of breast cancer. Taken together, this evidence lends further support to the idea that the cumulative concentration of estrogen quinone-protein adducts is a significant predictor of the risk of developing breast cancer. Further, the methodology developed in this study may also be applicable to other epidemiological studies and clinical trials in the prevention and early detection of breast cancer.