Wavelength Optimization for Quantitative Spectral Imaging of Breast Tumor Margins

A wavelength selection method that combines an inverse Monte Carlo model of reflectance with a genetic algorithm for global optimization was developed for spectral imaging of breast tumor margins. The choice of wavelengths affects system cost, size, and the accuracy of tissue quantitation. The minimum number of wavelengths required for accurate quantitation of tissue optical properties is 8, with diminishing gains from additional wavelengths. The wavelength choices obtained for the specific probe geometry used in the breast tumor margin spectral imaging application were tested on an independent, pathology-confirmed ex vivo breast tissue data set and in tissue-mimicking phantoms. In breast tissue, the optical endpoints that provide contrast between normal and malignant specimens (hemoglobin, β-carotene, and scattering) are extracted with the optimized 8-wavelength set with <9% error relative to the full spectrum (450–600 nm). A multi-absorber liquid phantom study was also performed to compare extraction accuracy with and without wavelength optimization. This wavelength selection technique can be used to design spectral imaging systems for other clinical applications.


Introduction
A wavelength optimization strategy is developed to improve the design of a novel spectral imaging probe array [1] for quantitative assessment of breast tissue margins during partial mastectomy, a common treatment for early-stage breast cancer [2,3]. This generalized method is based on a search heuristic known as a genetic algorithm, which mimics the process of natural evolution, and identifies reduced wavelength sets that maintain the tissue optical contrast obtained with broadband data. It requires a technique for measuring or simulating spectral data with known optical contrast and a metric for the quality of the extracted data. Diffuse reflectance spectroscopy in the visible range can be used to measure tissue optical properties nondestructively. The propagation of multiply scattered photons is sensitive to absorption by biological molecules and can ultimately provide contrast in adipose content (β-carotene absorption), vascularity (hemoglobin absorption), and fibroglandular content (scattering). During partial mastectomy, the surgeon strives to excise the entire tumor with a surrounding rim (or "margin") of normal tissue while preserving as much normal breast tissue as possible. Complete removal of the tumor is vital to reducing the chance of recurrence [4]. A previously developed spectral imaging system for breast tumor margins consists of a broadband illumination source, an 8-channel fiber optic conduit to direct light to and from the tissue, and an imaging spectrograph with a cooled CCD for detection. Each placement of the imaging probe allowed diffuse reflectance spectra (450-600 nm) to be measured from up to 8 sites on the margin, and multiple placements of the probe allowed the entire margin surface to be mapped.
Optical properties of the specimens were extracted to create tissue composition maps of total hemoglobin concentration, β-carotene concentration, and tissue scattering using a fast, scalable Monte Carlo model of reflectance previously developed by our group [5,6]. Pathologically confirmed positive margins showed significantly lower β-carotene/scattering ratios than negative margins, reflecting the decrease in fat content and increase in fibroglandular content associated with margin positivity. The sensitivity and specificity of the system for determining margin status were 79% and 67%, respectively [7].
The clinical translation of this technology will depend on its size, its cost, and the time needed to map tumor margins for diagnosis. This motivated the design of a compact, cost-effective device that uses a few discrete illumination wavelengths in place of the broadband source and monochromator of the original system, and inexpensive photodiode arrays for detection in place of the spectrograph and CCD camera [8][9][10].
Optimized selection of wavelengths and bandpass filters is important to minimize complexity and acquisition time while maintaining comparable sensitivity to the relevant sources of optical contrast in the breast. Several groups have reported on optimizing wavelength combinations for specific clinical applications. Using a matrix decomposition of basis spectra and simulated tissue data, Mazhar et al. optimized wavelength pairs for measuring hemodynamic changes in the near-infrared range for breast imaging with diffuse optical tomography [11]. By solving a linear equation based on a modified Beer-Lambert law, Umeyama and Yamada accounted for cross-talk between measured NIR chromophores in wavelength combinations for studying the brain [12]. Ferreira et al. presented a strategy driven by device fabrication for spectroscopic imaging of esophageal tissue, featuring 16 discrete wavelengths in the 350-750 nm range [13]; the selection of wavelengths was constrained by the filter fabrication process (materials, number of layers, FWHM, etc.). Phelps et al. developed a ratiometric method that selects wavelength pairs independent of tissue scattering to rapidly estimate total hemoglobin concentration in the UV-visible range [14]. These studies show the importance of optimizing wavelength selection for each clinical application. Although diffuse reflectance and elastic scattering spectroscopy [15] have increasingly been applied to breast tumor margin assessment, few studies have optimized system design for clinical translation. In this manuscript, a strategy is presented for optimizing wavelength selection for the cost-effective design of spectral devices for tissue margin assessment.

Methods
Figure 1 provides a broad overview of the steps taken to determine and test the best wavelength sets and spectral bandpass.
First, existing ex vivo breast tissue reflectance data were used as a training set for the optimization algorithm. A fast, scalable Monte Carlo reflectance model [5] was used to extract tissue optical properties from the training set under various constraints, such as the total number of wavelengths, the wavelength range, and the increment between candidate wavelengths. Combined with a genetic algorithm for global optimization, the best wavelength sets are identified by minimizing the sum of tissue property extraction errors from the reflectance spectra. The selected wavelengths are then validated with an independent, pathology-confirmed ex vivo breast tissue data set. A tissue-mimicking phantom experiment was also performed as part of the wavelength selection validation. The following sections describe these methods in greater detail.

Wavelength Optimization
1.1 Diffuse reflectance spectra from ex vivo breast tissue specimens. The dominant absorbers in breast tissue in the visible spectrum are oxy- and deoxy-hemoglobin and β-carotene; their absorption spectra are shown in Figure 2. Previous studies have shown that β-carotene and tissue scattering are significant parameters for differentiating malignant from benign breast tissue [2,16,17]. To select the minimum set of visible wavelengths sensitive to these key tissue constituents, an existing data set of 4953 diffuse reflectance spectra was used as the training set for the wavelength optimization [7,17]. These spectra were measured in an ex vivo clinical study of partial mastectomy procedures on 100 patients at Duke University Medical Center, approved by the Duke University Institutional Review Board (protocol #00017428). The diagnoses for the 101 margins were as follows: 44 negative (>2 mm of normal tissue), 35 close (<2 mm of normal tissue), and 22 positive. In addition, routine histopathology was performed on a subset of these measurements, for which the study pathologist classified 6-10 randomly selected measurement locations ("sites") on the margins. A total of 320 normal adipose sites, 24 normal fibroglandular sites, and 38 malignant sites were used for testing the optimized wavelength sets.
Total hemoglobin concentration [THb], β-carotene concentration [bc], and reduced scattering coefficients μs′ were extracted for each of the 4953 diffuse reflectance spectra using a previously developed inverse Monte Carlo model of reflectance [5,6]. Reflectance spectra of samples at the 10th, 25th, 50th, 75th, and 90th percentiles of the empirical cumulative distribution functions (cdf) of [THb], [bc], and average μs′ (⟨μs′⟩) were chosen, giving a total of 15 reflectance spectra in the training set. This method ensured that the data were sampled evenly over the distributions rather than over the parameter value ranges; sampling evenly over the value ranges could oversample spectra at the periphery of the distributions. Table 1 lists the extracted breast tissue properties for each of the 15 reflectance spectra selected for the wavelength optimization training set. Samples 1-5, 6-10, and 11-15 represent the 10th through 90th percentiles of [THb], [bc], and ⟨μs′⟩, respectively. Selecting spectra from the cdfs of this large data set of previously measured ex vivo breast tumor margins was intended to cover a wide range of [THb] (10.7-97.9 μM), [bc] (7.0-37.6 μM), and ⟨μs′⟩ over 450-600 nm (3.7-11.9 cm⁻¹). Thus, with 3 extracted parameters for each of the 15 reflectance spectra, 45 individual parameters are compared between the reduced-wavelength extractions and the full-spectrum extractions.
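The percentile-based selection of the training set can be sketched as follows. This is a minimal illustration in Python with NumPy, assuming the extracted parameters are stored as a samples × parameters array; the function name `select_training_indices` and the rule of taking the sample closest to each percentile are our assumptions, not a description of the original MATLAB code, and the input data below are synthetic.

```python
import numpy as np

PERCENTILES = (10, 25, 50, 75, 90)

def select_training_indices(params, percentiles=PERCENTILES):
    """For each parameter (column) in turn, return the indices of the samples
    whose values lie closest to the requested percentiles of that parameter's
    empirical distribution. Duplicates across parameters are not removed."""
    params = np.asarray(params, dtype=float)
    chosen = []
    for j in range(params.shape[1]):
        column = params[:, j]
        for target in np.percentile(column, percentiles):
            chosen.append(int(np.argmin(np.abs(column - target))))
    return chosen

# Hypothetical stand-in for the 4953 extracted parameter triples:
# columns are [THb] (uM), [bc] (uM), and average mu_s' (cm^-1).
rng = np.random.default_rng(0)
extracted = np.column_stack([
    rng.uniform(10, 100, 4953),
    rng.uniform(7, 38, 4953),
    rng.uniform(3.7, 12, 4953),
])
training = select_training_indices(extracted)
print(len(training))  # 15 (3 parameters x 5 percentiles)
```

Matching on percentiles of each marginal distribution, rather than on evenly spaced parameter values, is what keeps the rare extreme spectra from being overrepresented.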

1.2 Combined Monte Carlo reflectance model and genetic algorithm to select center wavelengths. The 15 reflectance spectra described in the previous section were used in a wavelength optimization technique that combines our previously developed inverse Monte Carlo model of reflectance [5] with a genetic algorithm (GA) (Global Optimization Toolbox in MATLAB, The MathWorks, Natick, MA). Briefly, the GA uses the principles of natural selection and evolution to produce candidate solutions to a given problem. The GA is appropriate for our application because it can be applied to any optimization problem whose candidate solutions can be encoded as "chromosomes", and a set of wavelengths maps naturally onto such an encoding. It can also provide multiple solutions to a given problem, which is necessary from a practical system design perspective if not all wavelengths in a solution are commercially available. The algorithm has two major components: (1) a population of individuals (possible solutions), each with its own unique string of "chromosomes", and (2) a fitness function that evaluates the possible solutions. Typically, a population of solutions is randomly generated from a given range of possible solutions. The fitness function is used to evaluate each individual in that population, and all individuals are then ranked according to their fitness values. From this population, a user-specified proportion is selected to breed a new generation of solutions, with fitter solutions more likely to be selected. The parent solutions produce offspring through genetic operators such as crossover and mutation, which change the chromosomes of the offspring and maintain genetic diversity in subsequent populations. The GA ends when a solution that satisfies the stopping criteria is found, a designated computational time is reached, or a specified number of generations is reached.
Figure 3 is a general diagram of the steps taken for wavelength optimization, combining an inverse Monte Carlo reflectance model with the GA.
The algorithm begins with the initial constraints on the wavelengths to be used. To extract the 4 parameters (oxy-hemoglobin, deoxy-hemoglobin, β-carotene, and reduced scattering coefficient), at least 5 center wavelengths are needed. The initial population is created by randomly generating 50 different wavelength combinations, for 5, 8, and 12 wavelengths in 1, 5, and 10 nm increments from 450-600 nm (151, 31, and 16 possible center wavelengths, respectively). Selecting wavelengths in 1 nm increments represents an ideal situation in system design, in which the types of sources available are not limited. Selecting wavelengths in 5 and 10 nm increments represents a more realistic situation, in which the final optimized set of wavelengths is more likely to be commercially available as bandpass filters. These wavelength combinations serve as the initial solutions of the iterative GA. The inverse MC model serves as the GA's fitness function, which evaluates the suitability of each wavelength set as a possible solution for extracting μa (and thus [THb] and [bc]) and ⟨μs′⟩ from the training set.
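The candidate grids and the random initial population can be sketched as follows. This is a minimal illustration under our own assumptions: each individual is stored as a sorted list of distinct center wavelengths, and the helper names `wavelength_grid` and `initial_population` are hypothetical.

```python
import random

def wavelength_grid(start=450, stop=600, step=1):
    """Candidate center wavelengths (nm), inclusive of both end points."""
    return list(range(start, stop + 1, step))

def initial_population(grid, n_wavelengths, pop_size=50, seed=None):
    """Random initial population: each individual (chromosome) is a set of
    distinct center wavelengths drawn from the grid, stored in sorted order."""
    rng = random.Random(seed)
    return [sorted(rng.sample(grid, n_wavelengths)) for _ in range(pop_size)]

for step in (1, 5, 10):
    print(step, len(wavelength_grid(step=step)))
# 1 151
# 5 31
# 10 16

population = initial_population(wavelength_grid(step=10), n_wavelengths=8, seed=1)
print(len(population))  # 50
```

The grid sizes reproduce the 151, 31, and 16 candidate wavelengths quoted for the 1, 5, and 10 nm increments.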
In the first generation of a given GA process, with its constraints on the total number of wavelengths and the selection increment, the output of the fitness function is 50 sets of extracted breast tissue properties ([THb], [bc], and ⟨μs′⟩) for the 15 samples, one for each of the 50 reduced wavelength sets. The fitness value was the RMS error between the tissue parameters ([THb], [bc], and ⟨μs′⟩) extracted using the reduced wavelength set and those extracted using the full 450-600 nm spectrum. The 50 wavelength sets were ranked in order of increasing fitness value (sum of extraction errors) for the 15-sample training set. From these 50, the 15 wavelength sets with the lowest sum of extraction errors are duplicated into the new generation of solutions. The same 15 wavelength sets were also used to generate 35 new wavelength sets by single-point crossover or wavelength mutation. In a crossover operation, a random wavelength serves as the point at which two wavelength sets break and join. In a mutation operation, a new wavelength is randomly generated from a Gaussian distribution and replaces a wavelength of the parent set, creating a new wavelength set. Parent wavelength sets from the previous generation are selected for crossover or mutation by simulating a roulette wheel, in which the area of the wheel corresponding to a parent is inversely proportional to the parent's fitness value (sum of extraction errors). In other words, the lower a wavelength set's extraction errors, the higher the probability that it is selected, crossed over or mutated, and passed down to become part of the next 50 solutions to be evaluated.
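The fitness evaluation and roulette-wheel selection described above can be sketched as below. This is a hypothetical illustration, not the Global Optimization Toolbox implementation: `rms_percent_error` assumes the RMS is taken over percent errors so that [THb], [bc], and ⟨μs′⟩ (which have different units) can be pooled, and `roulette_select` uses Python's `random.choices` with weights equal to 1/fitness.

```python
import math
import random

def rms_percent_error(reduced, full):
    """RMS of the percent differences between parameters extracted with a
    reduced wavelength set and with the full spectrum (same ordering).
    Relative errors are used (an assumption) so mixed units can be pooled."""
    errs = [100.0 * (r - f) / f for r, f in zip(reduced, full)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def roulette_select(population, fitness, k, rng=random):
    """Draw k parents with probability inversely proportional to fitness:
    a lower extraction error gives a larger slice of the wheel."""
    weights = [1.0 / max(f, 1e-12) for f in fitness]
    return rng.choices(population, weights=weights, k=k)

full = [50.0, 20.0, 8.0]      # hypothetical full-spectrum [THb], [bc], <mu_s'>
reduced = [55.0, 18.0, 8.0]   # hypothetical reduced-set extraction
print(round(rms_percent_error(reduced, full), 3))  # 8.165

parents = roulette_select(["set A", "set B", "set C"], [10.0, 20.0, 40.0], k=5,
                          rng=random.Random(0))
print(parents)
```

With fitness values of 10, 20, and 40, the wheel slices are in the ratio 4:2:1, so "set A" is drawn most often.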
Higher crossover fractions result in less diversity in subsequent generations, and we found no significant differences in computational time or solutions for crossover fractions ranging from 0-40%. In this study, the crossover fraction was therefore set at 20%, meaning that 7 of the 35 new solutions result from crossover and the remaining 28 result from mutation. If an offspring of a crossover operation contains duplicate wavelengths, one of the duplicates is discarded and a new, randomly generated wavelength is inserted into the set. This process iterates until the minimum fitness value of the generation is unchanged for 10 generations or until 50 generations have been completed. All of the GA processes tested in this study converged to an optimum solution for their respective constraints before reaching 50 generations. A single optimization required approximately 21-28 hours, depending on the initial constraints tested, such as the number of wavelengths and the selection increment. The 3 highest-ranked solutions from the final generation of each GA process were further evaluated using the previously described clinical data, which are independent of the 15 spectra used in the selection process.
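The generation update and stopping rule described above can be sketched as follows. This is a minimal pure-Python illustration, not the MATLAB implementation used in the study. The Gaussian mutation width (`sigma`, 15 nm here), the snapping of mutated values to the nearest candidate wavelength, and the stand-in fitness function in the usage lines are all hypothetical; in the study, fitness came from the inverse MC model.

```python
import random

def repair(child, grid, rng):
    """Drop duplicate wavelengths and refill with random unused grid values."""
    kept = list(dict.fromkeys(child))              # unique, original order
    spare = [g for g in grid if g not in kept]
    while len(kept) < len(child):
        kept.append(spare.pop(rng.randrange(len(spare))))
    return sorted(kept)

def crossover(a, b, grid, rng):
    """Single-point crossover: the head of parent a joined to the tail of parent b."""
    cut = rng.randrange(1, len(a))
    return repair(a[:cut] + b[cut:], grid, rng)

def mutate(parent, grid, rng, sigma=15.0):
    """Replace one wavelength with a Gaussian draw centred on it (sigma in nm is
    an assumed value), snapped to the nearest candidate wavelength."""
    child = list(parent)
    i = rng.randrange(len(child))
    proposal = rng.gauss(child[i], sigma)
    child[i] = min(grid, key=lambda g: abs(g - proposal))
    return repair(child, grid, rng)

def next_generation(pop, fit, grid, rng, n_elite=15, crossover_fraction=0.2):
    """Keep the 15 fittest sets and breed the rest from them by roulette-selected
    crossover (20% of offspring) and mutation (the remaining 80%)."""
    order = sorted(range(len(pop)), key=lambda i: fit[i])
    elite = [pop[i] for i in order[:n_elite]]
    weights = [1.0 / max(fit[i], 1e-12) for i in order[:n_elite]]
    n_children = len(pop) - n_elite                    # 50 - 15 = 35
    n_cross = round(crossover_fraction * n_children)   # 7 crossovers, 28 mutations
    children = [crossover(*rng.choices(elite, weights=weights, k=2), grid, rng)
                for _ in range(n_cross)]
    children += [mutate(rng.choices(elite, weights=weights, k=1)[0], grid, rng)
                 for _ in range(n_children - n_cross)]
    return elite + children

def run_ga(fitness_fn, grid, n_wavelengths, pop_size=50, stall=10, max_gen=50, seed=0):
    """Iterate until the best fitness is unchanged for `stall` generations
    or `max_gen` generations have been evaluated."""
    rng = random.Random(seed)
    pop = [sorted(rng.sample(grid, n_wavelengths)) for _ in range(pop_size)]
    history = []
    for _ in range(max_gen):
        fit = [fitness_fn(s) for s in pop]
        history.append(min(fit))
        if len(history) > stall and history[-1] == history[-1 - stall]:
            break
        pop = next_generation(pop, fit, grid, rng)
    return min(pop, key=fitness_fn), history

# Usage with a hypothetical stand-in fitness (distance to an arbitrary target set);
# in the study the fitness was the extraction error from the inverse MC model.
TARGET = [460, 490, 520, 550, 580]
grid = list(range(450, 601, 10))
best, history = run_ga(lambda s: 1 + sum(abs(a - b) for a, b in zip(s, TARGET)), grid, 5)
print(best, len(history))
```

Because the 15 elite sets are copied unchanged, the best fitness can never get worse from one generation to the next, which is what makes "unchanged for 10 generations" a usable convergence test.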
1.3 Selection of optimal bandpass. In addition to selecting the appropriate total number of source wavelengths and their center wavelengths, it is also important to understand how increasing the full width at half maximum (FWHM) affects the accuracy of the extracted optical properties. While laser diodes can have very small FWHM, sources may not be available at every wavelength in the optimized solutions. Bandpass filters, on the other hand, are commercially available at every 10 nm center wavelength across the UV-NIR spectrum, but may come with a FWHM of 10 nm around the center wavelength. Light emitting diodes (LEDs) often have even larger FWHM, commonly ranging from 20-50 nm.
Forward Monte Carlo simulations were conducted to study the effect of increasing bandpass. Diffuse reflectance spectra were generated from the wavelength-dependent optical properties of the 15 clinically measured spectra chosen for the training set described in Section 1.1. The wavelength-dependent absorption coefficients, μa, were determined using the molar extinction coefficients of oxy- and deoxy-hemoglobin and β-carotene. The reduced scattering coefficients, μs′, at each wavelength were calculated using Prahl's Mie scattering program [18]. The simulations were scaled for the probe geometry used in the clinical measurements [5]. Each of the 15 simulated spectra was convolved with Gaussian line shapes with FWHM of 1, 5, 10, 20, 30, 40, and 50 nm, resulting in a total of 105 spectra of varying FWHM.
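The Gaussian bandpass degradation can be sketched as a weighted moving average, using the standard relation σ = FWHM / (2√(2 ln 2)) ≈ FWHM / 2.355. This is a minimal NumPy illustration; renormalizing the weights near the 450 and 600 nm edges is our assumption (the edge handling used in the study is not described), and the 15 input spectra below are synthetic stand-ins for the forward MC spectra.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # sigma = FWHM / 2.3548

def broaden(wavelengths, spectrum, fwhm):
    """Convolve a reflectance spectrum with a Gaussian of the given FWHM (nm).
    Each output point is a Gaussian-weighted average of the input, with the
    weights renormalized so that points near the band edges are not
    attenuated by the missing part of the kernel."""
    wl = np.asarray(wavelengths, dtype=float)
    spec = np.asarray(spectrum, dtype=float)
    sigma = fwhm * FWHM_TO_SIGMA
    weights = np.exp(-0.5 * ((wl[:, None] - wl[None, :]) / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ spec

wl = np.arange(450, 601)                       # 1 nm grid over 450-600 nm
fwhms = [1, 5, 10, 20, 30, 40, 50]
# 15 synthetic smooth spectra standing in for the simulated reflectance
spectra = [np.exp(-((wl - c) / 40.0) ** 2) for c in range(450, 600, 10)]
broadened = [broaden(wl, s, f) for s in spectra for f in fwhms]
print(len(broadened))  # 105 (15 spectra x 7 FWHM values)
```

Reduced-wavelength extractions can then be run on `broadened` spectra sampled only at the candidate center wavelengths.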
The inverse Monte Carlo model was used to extract [THb], [bc], and ⟨μs′⟩ from the 15 reflectance spectra at each FWHM. The extractions were repeated for the top 3 solutions from the wavelength optimizations for 5, 8, and 12 total center wavelengths. The parameters extracted from the reduced wavelength sets with added FWHM were compared to those extracted from the full simulated spectra without added FWHM. Because these simulations did not include the system and measurement artifacts that may exist in measured clinical data, a 10% error was set as the threshold for an acceptable FWHM in the analysis.

Wavelength Selection Validation
2.1 Independent pathology-confirmed tissue data. The results of the wavelength optimization were tested on an existing breast tissue data set independent of the 15 spectra used for the training set. The inverse Monte Carlo model was used to extract [THb], [bc], and ⟨μs′⟩ from each of the 382 pathologically confirmed sites (320 adipose, 24 fibroglandular, 38 malignant) obtained from breast tumor margins. Extractions were performed using the full 450-600 nm spectrum in 2.5 nm increments (61 wavelengths in total) and using the top 3 optimized solutions for each test case: 5, 8, and 12 total wavelengths in 1, 5, and 10 nm increments. To show the difference between optimized and non-optimized wavelength selection, extractions were also performed using the semi-evenly spaced wavelengths from 400-600 nm that were chosen empirically for a previously reported system [19]. Using the full-spectrum extractions of [THb], [bc], and ⟨μs′⟩ as the gold standard, the errors in the parameters extracted with the reduced wavelength sets were calculated. The Bland-Altman method was used to assess the agreement between the full-spectrum extractions and the extractions using the optimized and non-optimized wavelength sets.
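The Bland-Altman comparison can be sketched as follows. This is a generic NumPy illustration of the method (bias and 95% limits of agreement, bias ± 1.96 SD of the paired differences); the function name `bland_altman` and the [THb] values in the usage lines are hypothetical.

```python
import numpy as np

def bland_altman(full, reduced):
    """Bland-Altman agreement statistics for paired extractions.

    Returns the per-sample means (x-axis of the plot), the differences
    reduced - full (y-axis), the bias (mean difference), and the 95% limits
    of agreement, bias +/- 1.96 * SD of the differences."""
    full = np.asarray(full, dtype=float)
    reduced = np.asarray(reduced, dtype=float)
    means = (full + reduced) / 2.0
    diffs = reduced - full
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical [THb] values (uM) at five sites: full spectrum vs. 8 wavelengths
thb_full = [20.1, 35.4, 50.2, 12.8, 64.0]
thb_8wl = [21.0, 34.9, 52.3, 13.1, 66.2]
_, _, bias, limits = bland_altman(thb_full, thb_8wl)
print(f"bias = {bias:.2f} uM, 95% limits of agreement: {limits[0]:.2f} to {limits[1]:.2f} uM")
```

A bias near zero with narrow limits indicates that the reduced wavelength set reproduces the full-spectrum extraction; a systematic offset shows up as a nonzero bias.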
2.2 Multi-absorber liquid phantom study. A set of 20 phantoms was used to further assess the results of the wavelength optimization. The liquid tissue-simulating phantoms were prepared by mixing polystyrene microspheres (07310, Polysciences, Inc.) as the scatterer with water-soluble hemoglobin (H0267, Sigma Co.) and crocin (17304, Fluka) as the absorbers. Hemoglobin and crocin were used as the absorbers because they have previously been used to simulate blood and β-carotene in breast tissue [20]. Because crocin has spectral features similar to those of β-carotene, the crocin concentration was chosen so that its mean μa matched the mean μa of β-carotene found in previous studies [2]. The 2 scattering levels represent the mean μs′ over 450-600 nm of representative malignant (μs′ = 9 cm⁻¹) and normal (μs′ = 12 cm⁻¹) breast tissue. The optical properties of the phantoms are shown in Table 2.
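Matching crocin to β-carotene by mean absorption reduces to a single scaling. The sketch below assumes μa is linear in concentration (μa = c·ε′(λ), with ε′ the absorption coefficient per unit concentration, including any ln 10 factor); the function name `matched_concentration` and both spectra are hypothetical shapes, not the measured extinction data.

```python
import numpy as np

def matched_concentration(target_mua, absorber_eps):
    """Concentration that makes the band-averaged mu_a of an absorber equal the
    band average of a target mu_a spectrum, assuming mu_a = c * eps(lambda)."""
    return np.mean(target_mua) / np.mean(absorber_eps)

wl = np.arange(450, 601, 5)
# Hypothetical spectra: target mu_a in cm^-1, crocin eps in cm^-1 per uM
bc_mua = 0.8 * np.exp(-((wl - 480) / 30.0) ** 2)          # beta-carotene target
crocin_eps = 0.05 * np.exp(-((wl - 440) / 40.0) ** 2)     # crocin per unit concentration
c = matched_concentration(bc_mua, crocin_eps)
print(np.isclose(np.mean(c * crocin_eps), np.mean(bc_mua)))  # True
```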
The phantom optical measurements were obtained with a previously reported system with slight modifications [1,9]. The system consists of a 450 W xenon arc lamp and a scanning monochromator (Gemini 180, JY Horiba) coupled to a 600 μm optical fiber as the source. The spectral bandpass of the illumination was fixed at 7 nm. A custom annular silicon photodiode with a 2.5 mm outer diameter and 0.75 mm inner diameter was used for detection [21]. The optical fiber was fitted through and epoxied in the detector aperture to illuminate the phantoms, and the detector was connected to a photodiode amplifier (PDA-850, Terahertz Technologies, Inc.) for reflectance measurements. Diffuse reflectance was measured at the discrete wavelength solutions as well as at the evenly spaced wavelengths from 400-600 nm, in order to compare the optimized solution with one that samples the visible range at regular intervals, as previously described [8].
The inverse Monte Carlo model was used to extract optical properties from each phantom and root mean square (RMS) errors were compared for both sets of wavelengths.

Eight Wavelengths can be Used to Accurately Extract [THb], [bc], and ⟨μs′⟩
Table 3 enumerates the top 3 solutions for each set of optimization constraints (wavelength range, increment, and total number of wavelengths). For the optimized solutions chosen from 450-600 nm in 1 nm increments, the average errors of [THb], [bc], and μs′ extracted from the 15 tissue reflectance spectra increase from 11.7% to 12.2% to 18.1% as the total number of wavelengths decreases from 12 to 8 to 5. When selecting in 5 nm and 10 nm increments, the errors increase from 11.9% to 18.5% and from 12.0% to 19.0%, respectively, as the number of wavelengths decreases from 12 to 5. This trend is expected: as the total number of wavelengths and the available center wavelength choices decrease, so does the ability to capture the spectral features of the absorbers and scatterers in tissue, which increases the extraction errors. Figure 4 puts into perspective the optimal number of illumination wavelengths for the design of a breast spectral imaging system. At 5 wavelengths, the average percent error of [THb], [bc], and ⟨μs′⟩ extracted from the 15 representative breast tissue reflectance spectra was close to 20%. Increasing to 6, 7, and 8 wavelengths reduced the extraction errors to 14%, 13%, and 12%, respectively, with diminishing returns from adding wavelengths beyond 8. Figure 4 therefore shows that, for our breast tumor margin application, the appropriate number of wavelengths is 8.

Spectral Bandpass Affects Extraction Accuracy
Because the system used to obtain the existing breast tumor margin data had a spectral bandpass of 3.9 nm, it is challenging to use those data to evaluate the effect of changes in bandpass or to optimize both the wavelengths and the bandpass of a system. The forward MC model was therefore used to simulate the same 15 reflectance spectra used in the wavelength optimization, and the original spectra were degraded to simulate spectral bandpasses of 5, 10, 20, 30, 40, and 50 nm. Representative reflectance spectra (10 μM [THb], 5.5 μM [bc], and ⟨μs′⟩ = 3.11 cm⁻¹) with these spectral bandpasses are shown in Figure 5(a), and the extraction errors for each case are shown in Figure 5(b). The results in Figure 5(b) show that, to extract breast tissue properties with good accuracy, the wavelengths must have a FWHM <10 nm and at least 8 wavelengths must be implemented in the system design.
Parameter extraction accuracy is affected not only by the number of wavelengths and the center wavelengths used, but also by the spectral bandpass at each wavelength. The existing clinical data show that 5 wavelengths are likely inadequate for accurate extraction of breast tissue properties, and the simulations of widening spectral bandpass likewise show that the errors with 5 wavelengths are nearly double those with 8. Consistent with the measured data, increasing from 8 to 12 wavelengths did not substantially improve extraction accuracy. Because commercially available LEDs have a much larger bandpass than the 10 nm identified here, more work is required to realize a compact, energy-saving spectral imaging system. Additional filters can be used to narrow the bandpass at each wavelength, or the spectrum of each LED can be accounted for in the Monte Carlo reflectance model. Briefly, the shape of each LED spectrum can be incorporated into the forward MC model, which computes a lookup table of "LED-modified" reflectance spectra for a wide range of optical properties. The inverse MC model can then be used to extract optical properties from samples measured with the specified LEDs as sources.
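One way to express an entry of such an "LED-modified" lookup table is as a source-weighted average of the monochromatic forward MC reflectance. The expression below is a sketch under two assumptions of ours: the emission spectrum $S_k(\lambda)$ of LED $k$ has been measured, and the detector responsivity is approximately constant across each LED band (or is folded into $S_k$). Here $\mathbf{p}$ denotes the tissue parameters being fitted (for example, oxy- and deoxy-hemoglobin, [bc], and the scattering parameters), and $R$ is the forward MC reflectance at wavelength $\lambda$.

```latex
R_k(\mathbf{p}) = \frac{\int S_k(\lambda)\, R\big(\lambda;\, \mu_a(\lambda;\mathbf{p}),\, \mu_s'(\lambda;\mathbf{p})\big)\, d\lambda}{\int S_k(\lambda)\, d\lambda}
```

Choosing $S_k$ as a Gaussian of a given FWHM centered on the $k$-th wavelength recovers the bandpass-broadened spectra used in the simulations.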

Optical Contrast in Breast Tissue is Retained with Optimized Wavelength Choices
The top 3 optimized solutions with 5, 8, and 12 total wavelengths selected from 450-600 nm in 1, 5, and 10 nm increments were tested on an independent partial mastectomy tissue data set. Although the 15 reflectance spectra in the training set spanned the 10th to 90th percentiles of [THb], [bc], and ⟨μs′⟩, the histological diagnoses of the tissues corresponding to these spectra were not known. Therefore, a subset of measurements with histopathologically confirmed diagnoses was used to independently test the optimized wavelengths, and this subset was split into 3 tissue types: adipose, fibroglandular, and malignant. Table 4 and Table 5 summarize the extraction errors of the top 3 optimized solutions for each tissue type and for 5, 8, and 12 wavelengths in 1, 5, and 10 nm increments. A positive percent error indicates overestimation by the reduced-wavelength solution; a negative value indicates underestimation. When the number of wavelengths was increased from 8 to 12, the sum of the absolute extraction errors for the 3 parameters did not improve substantially for any set of solutions, as expected from the findings shown in Figure 4. When the total number of wavelengths was decreased from 8 to 5, however, the extraction errors increased. For solutions selected in 1 nm increments, the errors in [THb] extracted from normal adipose, normal fibroglandular, and malignant tissues using 8 wavelengths were 8.5%, −2.6%, and 4.1%, respectively, relative to the full 61-wavelength set. When the wavelengths were reduced to 5, the errors increased to 22.2%, 23.3%, and 21.0% for the 3 tissue types. Similarly, with 8 wavelengths, the [bc] extraction errors for adipose, fibroglandular, and malignant tissues were −3.0%, −7.0%, and 4.4%; with just 5 wavelengths, the errors increased to −20.8%, 10.9%, and 13.6%.
The reduced scattering coefficient extraction errors also more than doubled (from 6.6-8.5% to 16.8-22.8%) when the total number of wavelengths decreased from 8 to 5. Because the motivation for this work is to provide a method for optimizing wavelength choices for a compact, cost-effective, fast spectral imaging device with a simple design, the 12-wavelength solutions were eliminated, and only 8-wavelength solutions are used for the subsequent system design, which will be described in future work.
Although the top optimized solution for each wavelength selection constraint had the lowest average errors in [THb], [bc], and ⟨μs′⟩ extracted from the 15 representative reflectance spectra in the training set, the best wavelength choice in the independent breast data set validation is not necessarily the best solution for the training set. This is possibly due to the relatively small size of the training set. However, the differences in errors between the top 3 optimized solutions are also small, which suggests that the GA converged to near-optimal solutions. In the histopathology-confirmed data set, the wavelength set with the lowest extraction errors is solution #1 for wavelengths selected in 1 nm increments (474, 481, 498, 509, 555, 573, 593, 596 nm) and solution #1 for wavelengths selected in 5 nm increments (470, 480, 495, 500, 510, 550, 580, 600 nm). For wavelengths selected in 10 nm increments, however, the best set is solution #2 (470, 480, 490, 500, 510, 560, 580, 600 nm). The wavelengths selected in 1 nm increments from 450-600 nm are strikingly similar to those selected in both 5 nm and 10 nm increments. From a practical system design standpoint, this is a good finding because sources, such as LEDs, are abundantly available in 10 nm increments.
On the other hand, not all of the wavelengths selected in 1 nm increments (and some of those selected in 5 nm increments) in the 450-600 nm range are currently commercially available. Because the wavelength choices in 5 nm and 10 nm increments did not yield significantly different extraction errors, the 8 wavelengths selected in 10 nm increments (solution #2) were chosen for subsequent analyses in this study: 470, 480, 490, 500, 510, 560, 580, and 600 nm, all of which are available as commercial filters with 10 nm bandpasses. A practical low-cost implementation of these center wavelengths would be a white LED combined with the respective bandpass filters. The optimization identified wavelength sets that extract tissue parameters with errors <20%; however, the acceptable extraction errors for [THb], [bc], and ⟨μs′⟩ have to be determined from the contrast of these optical endpoints between tissue types. In Table 6, the percent difference in quantifiable optical contrast was calculated between the histologically confirmed median adipose and malignant tissue samples, and between the median fibroglandular and malignant samples. The percent change using the optimized wavelengths is smaller than the percent difference for all optical parameters, so optical contrast should be preserved with these reduced wavelengths. Conversely, the percent change using the evenly spaced wavelengths without optimization is sometimes greater than the percent difference, as for [bc] and [bc]/⟨μs′⟩. This means that the optical contrast that differentiates benign from malignant samples may be washed out if non-optimized wavelengths are used.
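The contrast-preservation criterion can be sketched as a simple comparison. This is a hypothetical illustration: we assume the percent difference is the median malignant value relative to the median benign value (the exact definition behind Table 6 is not stated here), and the [bc]/⟨μs′⟩ values in the usage lines are synthetic.

```python
from statistics import median

def percent_difference(benign, malignant):
    """Percent difference of the median malignant value relative to the
    median benign value (assumed definition)."""
    b, m = median(benign), median(malignant)
    return 100.0 * (m - b) / b

def contrast_preserved(extraction_error_pct, benign, malignant):
    """True if the extraction error introduced by a reduced wavelength set is
    smaller in magnitude than the benign-vs-malignant percent difference."""
    return abs(extraction_error_pct) < abs(percent_difference(benign, malignant))

# Hypothetical [bc]/<mu_s'> values for adipose and malignant sites
adipose = [3.1, 2.8, 3.5, 2.9, 3.3]
malignant = [1.4, 1.9, 1.6]
print(round(percent_difference(adipose, malignant), 1))  # -48.4
print(contrast_preserved(5.0, adipose, malignant), contrast_preserved(60.0, adipose, malignant))  # True False
```

A 5% extraction error leaves a 48% tissue-type difference intact, whereas a 60% error could mask it entirely.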
By comparing the percent differences in optical parameters between benign and malignant samples with the mean extraction errors in Table 4 and Table 5, it was further established that none of the top sets with only 5 wavelengths can be used for our clinical application, because most of their extraction errors exceed the percent differences in optical parameters between the breast tissue types tested. Figure 6 shows the Bland-Altman plots comparing the extractions between the optimum 8-wavelength set and the evenly [7]. If 98% of the pixels that make up the margins have a ratio <6, the margin is then classified as positive. The histogram shows that the contrast in breast margins is preserved with the optimized 8 wavelengths, whereas some contrast is lost without optimization. These margin maps can potentially help surgeons identify suspicious "hot spots" where cancer cells may be present at the surface of the excised specimen. Wilcoxon rank-sum tests were performed to compare the Monte Carlo extracted optical properties using the full 61 wavelengths with those using the reduced wavelength sets (both the optimized and the evenly spaced 8). The boxplots of the comparisons are shown in Figure 8 different for all tissue types. The findings of an observational study on the effects of tissue heterogeneity reported by Kennedy et al. were also reproduced [17].
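For reference, the Wilcoxon rank-sum comparison can be sketched in pure Python using the large-sample normal approximation (average ranks for ties, no tie or continuity correction). This is a generic illustration of the test, not the statistics code used in the study; a library routine such as SciPy's `scipy.stats.ranksums` should give the same result under this approximation. The helper names and the ⟨μs′⟩ values in the usage lines are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def ranksum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, large-sample normal approximation
    (no tie or continuity correction). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                            # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))         # = 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical <mu_s'> (cm^-1) for malignant sites: full spectrum vs. 8 wavelengths
full = [9.1, 10.4, 8.7, 11.2, 9.8, 10.1]
reduced = [8.9, 10.6, 8.5, 11.0, 9.9, 10.3]
print(ranksum_test(full, reduced))
```

A large p-value here would indicate no detectable shift between the full-spectrum and reduced-wavelength extractions.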
[THb] and ⟨μs′⟩ were both significantly increased in the malignant samples compared with the normal samples. With the evenly spaced 8 wavelengths selected empirically for a previous system, the extracted ⟨μs′⟩ is most notably underestimated for malignant samples, while [THb] and the ratio [bc]/⟨μs′⟩ are overestimated. These results are consistent with the Bland-Altman plots shown in Figure 6. Without wavelength optimization, the contrast between benign and malignant samples in [THb] and ⟨μs′⟩ is not retained when the wavelengths are reduced to the 8 evenly spaced wavelengths. Although these results show that a reduced wavelength set can be used in place of the full spectrum to obtain optical contrast in previously acquired breast tissue data, which contain a disproportionately large number of normal adipose sites, the main goal of this study is not to demonstrate predictive power for separating normal from tumor tissue, but to find a reduced number of wavelengths that extract tissue parameters reasonably similar to those extracted from the full spectrum. With the ability to extract similar tissue parameters from previous clinical studies reported by Wilke

Figure 9 compares the extraction accuracy in the multi-absorber liquid phantom study using the full 450-600 nm range, the optimized wavelengths, and the evenly spaced wavelengths chosen empirically for a previously reported system [22]. The RMS errors for the extraction of [Hb], [Cr], and ⟨μs′⟩ using the 61 wavelengths in the 450-600 nm range were 4.7±4.4%, 3.8±3.8%, and 3.7±2.4%, respectively. However, using the 8 evenly spaced wavelengths without any optimization, the RMS errors are 15 , and ⟨μs′⟩ were decreased to 6.6±5.6%, 4.1±3.7%, and 4.9±3.0%, respectively. These errors are not significantly different from the errors obtained with the full 450-600 nm spectrum.
Referring back to Table 6 for an approximation of acceptable errors, these phantom results show the benefit of wavelength optimization for extracting hemoglobin and a β-carotene substitute while maintaining optical contrast, which is of utmost importance for our application.

Conclusions
A method that combines a genetic algorithm with an inverse Monte Carlo reflectance model was applied and validated in an independent clinical data set to systematically select wavelengths and bandwidths for the design of a spectral imaging system for breast tumor margin assessment. The development of this method was motivated by the design of a compact, cost-effective spectral imaging system featuring a white LED with bandpass filters. We demonstrate that at least 5 wavelengths are required to extract oxy- and deoxy-Hb, [bc], and μs′ with this acquisition geometry, and that the minimum number of wavelengths needed to retain the optical contrast obtained from the full 450-600 nm spectrum is 8. Designing a system with additional wavelengths, up to 12, provides minimal improvement in extraction errors at the potential cost of a larger system footprint, longer data acquisition time, and greater design complexity. Additionally, a two-absorber turbid phantom study showed improved quantitative accuracy for optimized wavelength sets. This method may be adapted to optimize other quantitative spectroscopic imaging instruments for clinical applications beyond breast tumor margin assessment.