The purpose of this study was to investigate the relationship between visual score of emphysema and homology-based emphysema quantification (HEQ) and evaluate whether visual score was accurately predicted by machine learning and HEQ.
Materials and methods
A total of 115 anonymized computed tomography images from 39 patients were obtained from a public database. Emphysema quantification of these images was performed by measuring the percentage of low-attenuation lung area (LAA%). The following values related to HEQ were obtained: nb0 and nb1. LAA% and HEQ were calculated at various threshold levels ranging from −1000 HU to −700 HU. Spearman’s correlation coefficients between emphysema quantification and visual score were calculated at the various threshold levels. Visual score was predicted by machine learning and emphysema quantification (LAA% or HEQ). Random Forest was used as a machine learning algorithm, and accuracy of prediction was evaluated by leave-one-patient-out cross validation. The difference in the accuracy was assessed using McNemar’s test.
The correlation coefficients between emphysema quantification and visual score were as follows: LAA% (−950 HU), 0.567; LAA% (−910 HU), 0.654; LAA% (−875 HU), 0.704; nb0 (−950 HU), 0.552; nb0 (−910 HU), 0.629; nb0 (−875 HU), 0.473; nb1 (−950 HU), 0.149; nb1 (−910 HU), 0.519; and nb1 (−875 HU), 0.716. The accuracy of prediction was as follows: LAA%, 55.7% and HEQ, 66.1%. The difference in accuracy was statistically significant (p = 0.0290).
Citation: Nishio M, Nakane K, Kubo T, Yakami M, Emoto Y, Nishio M, et al. (2017) Automated prediction of emphysema visual score using homology-based quantification of low-attenuation lung region. PLoS ONE 12(5): e0178217. https://doi.org/10.1371/journal.pone.0178217
Editor: Bin Liu, Harbin Institute of Technology Shenzhen Graduate School, CHINA
Received: February 10, 2017; Accepted: May 9, 2017; Published: May 25, 2017
Copyright: © 2017 Nishio et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This study was supported by both JSPS KAKENHI Grant-in-Aid for Scientific Research (B) (Grant Number 26310209) and JSPS KAKENHI (Grant Number JP16K19883). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity and mortality worldwide. COPD imposes a considerable economic and social burden, which continues to increase. The Global Initiative for Chronic Obstructive Lung Disease guideline defines COPD as a preventable and treatable disease, which is characterized by persistent airflow limitation. The airflow limitation of COPD is usually progressive and associated with an enhanced chronic inflammatory response in the airways and the lung to noxious particles or gases. The airflow limitation is caused by a mixture of small airway disease and emphysema, which are often regarded as discrete phenotypes.
The percentage of low-attenuation lung area (LAA%) and visual scoring based on computed tomography (CT) images are frequently employed for the evaluation of emphysema [3–13]. Although both these parameters are useful for evaluating the severity of emphysema, LAA% has been more frequently used for research purposes owing to the wide availability of software for calculating LAA% and the superior reproducibility of LAA%. However, visual score incorporates information that is not captured by LAA%, such as the spatial distribution of low-attenuation lung regions and findings other than emphysema [8, 9]. For example, visual score was shown to be associated with lung cancer risk in patients with emphysema, although the quantitative measures of emphysema (including LAA%) did not show such an association [10–12]. This implies that visual score may capture more clinically relevant information than LAA%.
In recent years, image processing based on the homology method has been increasingly used [13–18]. For example, Nishio et al used the homology method to evaluate the spatial distribution of low-attenuation lung regions in patients with and without COPD, and they showed that homology-based emphysema quantification (HEQ) was useful for the assessment of emphysema severity. Because a previous study showed that visual score was affected not only by LAA% but also by the spatial distribution of low-attenuation lung regions, it is conceivable that HEQ could be a more accurate predictor of visual score than LAA%.
The purpose of the current study was to investigate the relationship between visual score and emphysema quantification (LAA% and HEQ) and evaluate whether visual score was accurately predicted by supervised machine learning and emphysema quantification. Previously, the LAA% threshold was optimized by assessing the relationship between LAA% and severity of COPD. To our knowledge, no study has investigated the effect of the LAA% threshold on the relationship between LAA% and visual score. For this purpose, LAA% and HEQ were calculated at various threshold levels in the present study. In addition, the combination of emphysema quantification at various threshold levels was used for predicting visual score with supervised machine learning. This method was inspired by persistent homology. Persistent homology is a method for computing topological features at different spatial resolutions [19, 20]. Unlike persistent homology, the feature vector of the current study was constructed simply by concatenating the Betti numbers obtained from binarized CT images at the various threshold levels. The method of the current study is similar to those used in bioinformatics, such as Pse-in-One, Pse-Analysis, repDNA, and iDHS-EL [21–24]. These studies and the current study focused on how to create a feature vector that can be easily and effectively combined with a machine learning algorithm.
Materials and methods
The current study used anonymized data from a public database. Therefore, neither approval by an institutional review board nor informed consent from patients was required in our country.
Database of CT images
The details of the CT database are available elsewhere [25, 26]. CT images of 39 subjects (9 never smokers, 10 smokers without COPD, and 20 smokers with COPD) were obtained from the database. The CT examinations were performed using a four-detector-row CT scanner (LightSpeed QX/i; General Electric Medical Systems, Milwaukee, WI, USA). The following parameters were used: in-plane resolution, 0.78 × 0.78 mm; slice thickness, 1.25 mm; tube voltage, 140 kV; and tube current-time product, 200 mAs. The CT images were reconstructed using a high-spatial-resolution algorithm. The database provided 115 high-resolution CT slices. The severity of emphysema for each of the 115 slices was assessed as visual score by an experienced chest radiologist and a CT-experienced pulmonologist. The score criteria were as follows: 0, no emphysema; 1, minimal; 2, mild; 3, moderate; 4, severe; and 5, very severe emphysema. A consensus was reached in case of any disagreement. Representative CT images of the database are shown in Fig 1. A summary of the visual scores in the 115 CT slices is shown in Fig 2.
(A) visual score = 0 (no emphysema); (B) visual score = 3 (moderate); (C) visual score = 5 (very severe). The CT images were displayed with a lung window setting of 1600 HU window width and −550 HU window level.
The methodology for calculation of LAA% and HEQ is described in the previously published papers [4, 13]. First, the lungs were automatically segmented from the CT images based on a region-growing method and a threshold of −500 HU. After lung segmentation, LAA% was calculated as follows:
$$\mathrm{LAA\%} = \frac{\text{number of low-attenuation lung pixels}}{\text{total number of lung pixels}} \times 100,$$
where low-attenuation lung pixels were defined as lung pixels with CT values lower than the predefined threshold. When calculating LAA%, the CT images were binarized using the predefined threshold and results of lung segmentation. In the binarized CT images, 1 indicated a normal lung pixel and 0 indicated a non-lung pixel or low-attenuation lung pixel. Representative images of the binarized CT images are shown in Fig 3. The binarized images were used for HEQ.
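The LAA% definition and the binarization rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `laa_percent` and `binarize`, the nested-list image layout, and the toy HU values are all assumptions made for illustration, and the lung mask is taken as given rather than produced by region growing.

```python
def laa_percent(ct_hu, lung_mask, threshold_hu):
    """Return LAA%: share of lung pixels whose CT value is below threshold_hu."""
    lung_values = [
        hu
        for hu_row, mask_row in zip(ct_hu, lung_mask)
        for hu, inside in zip(hu_row, mask_row)
        if inside
    ]
    if not lung_values:
        raise ValueError("lung mask is empty")
    low = sum(1 for hu in lung_values if hu < threshold_hu)
    return 100.0 * low / len(lung_values)


def binarize(ct_hu, lung_mask, threshold_hu):
    """1 = normal lung pixel; 0 = non-lung pixel or low-attenuation lung pixel."""
    return [
        [1 if inside and hu >= threshold_hu else 0
         for hu, inside in zip(hu_row, mask_row)]
        for hu_row, mask_row in zip(ct_hu, lung_mask)
    ]


# Toy 3 x 3 slice with made-up HU values; the right-hand column is outside the lung.
ct_hu = [
    [-980, -900, 40],
    [-960, -870, 35],
    [-940, -820, 30],
]
lung_mask = [
    [True, True, False],
    [True, True, False],
    [True, True, False],
]

print(laa_percent(ct_hu, lung_mask, -950))  # 2 of 6 lung pixels are below -950 HU
print(binarize(ct_hu, lung_mask, -950))
```

In this toy slice, raising the threshold from −950 HU to −875 HU increases the number of low-attenuation pixels from two to four, which is the kind of threshold dependence examined in the study.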
(A) CT image; (B)–(E) binarized images at threshold levels of −975, −950, −925, and −900 HU. Note: Fig 3(A) is identical to Fig 1(C). The CT images were displayed with a lung window setting of 1600 HU window width and −550 HU window level.
Next, HEQ was performed. Betti numbers are important indices in homology and were used as HEQ in a previous study. Betti numbers comprise b0 and b1 in case of two-dimensional images. In the current study, b0 corresponds to the number of low-attenuation lung regions, and b1 corresponds to the number of normal lung regions surrounded by the low-attenuation lung regions. Intuitively, b0 and b1 are related to “holes” formed because of emphysema. Betti numbers could be calculated from the binarized CT images prepared when calculating LAA%. The detailed process of calculating b0 and b1 has been described elsewhere. The examples of calculating b0 and b1 are available in S1 Fig (Supporting information). Because b0 and b1 were affected by the size of the lung area, b0 and b1 were normalized by the total number of lung pixels. These normalized values were referred to as nb0 and nb1, and were used as the results of HEQ.
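As a rough illustration of how b0 and b1 can be counted on a two-dimensional binary image, the sketch below uses breadth-first flood fill. It is an assumption-laden sketch rather than the procedure of the cited papers: the input is a hypothetical mask in which 1 marks a low-attenuation pixel, b0 is taken as the number of 8-connected components of that mask, and b1 as the number of enclosed background components under 4-connectivity. The connectivity choice and all function names are illustrative.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, target, neighbors):
    """Count connected components of pixels equal to target (flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == target and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(laa_mask):
    """Return (b0, b1) for a mask in which 1 marks a low-attenuation pixel."""
    b0 = count_components(laa_mask, 1, N8)
    # Pad with background so that everything outside the regions forms a
    # single component; every other background component is an enclosed hole.
    cols = len(laa_mask[0])
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in laa_mask]
              + [[0] * (cols + 2)])
    b1 = count_components(padded, 0, N4) - 1
    return b0, b1


def normalized_betti(laa_mask, n_lung_pixels):
    """Return (nb0, nb1): Betti numbers divided by the number of lung pixels."""
    b0, b1 = betti_numbers(laa_mask)
    return b0 / n_lung_pixels, b1 / n_lung_pixels


ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(betti_numbers(ring))  # one region enclosing one hole
```

Using 8-connectivity for the foreground and 4-connectivity for the background is a common pairing on square grids, because it avoids counting a diagonal gap as both a connection and a separation.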
LAA% and HEQ were calculated in each of the 115 slices at various threshold levels ranging from −1000 HU to −700 HU. The threshold level was increased in increments of 5 HU. Therefore, LAA% and HEQ were calculated at 60 different threshold levels. Fig 4 shows representative results of HEQ at the 60 different threshold levels, which were obtained from the CT images shown in Fig 1.
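The threshold sweep can be sketched as follows. A 5 HU step over the stated range yields 60 levels only if one endpoint is excluded; the text does not say which, so the sketch assumes, purely for illustration, that the upper endpoint (−700 HU) is left out. The function names and the toy HU values are hypothetical.

```python
def threshold_levels(lower_hu=-1000, upper_hu=-700, step_hu=5):
    """Threshold grid; the upper endpoint is excluded (an assumption)."""
    return list(range(lower_hu, upper_hu, step_hu))


def laa_curve(lung_hu_values, thresholds):
    """LAA% of one slice evaluated at every threshold level."""
    n = len(lung_hu_values)
    return [
        100.0 * sum(1 for hu in lung_hu_values if hu < t) / n
        for t in thresholds
    ]


levels = threshold_levels()
print(len(levels), levels[0], levels[-1])  # 60 -1000 -705

# Made-up CT values of the lung pixels in one slice.
lung_hu_values = [-990, -960, -930, -880, -860, -820, -780, -720]
curve = laa_curve(lung_hu_values, levels)
print(curve[:3], curve[-1])
```

Because the low-attenuation set can only grow as the threshold rises, the LAA% curve is non-decreasing; the Betti numbers have no such monotonic behaviour, which is one reason a multi-threshold profile may carry additional information.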
Note: Results of Fig 4(A)–4(C) were obtained from CT images of Fig 1(A)–1(C), respectively. Abbreviation: HEQ, homology-based emphysema quantification; nb0, the zero-dimensional Betti number normalized by the total number of lung pixel; nb1, the one-dimensional Betti number normalized by the total number of lung pixel.
Prediction of visual score using machine learning
Visual score was predicted using supervised machine learning and the results of emphysema quantification (LAA% or HEQ). The Random Forest algorithm was adopted for supervised machine learning. As hyperparameters of Random Forest, the following values were used: number of trees in the forest, 10, 100, or 1000; and number of features to consider when searching for the best split, (length of feature vector) × 0.1, 0.3, 0.5, 0.7, or 0.9. The values of LAA% at the threshold levels ranging from −1000 HU to −700 HU were used as the feature vector of Random Forest, and the classifier for predicting visual score was built. In this classifier (CLAA%), the length of the feature vector was 60. The other type of classifier was built using Random Forest and the values of nb0 and nb1 at the threshold levels ranging from −1000 HU to −700 HU. In this classifier (CHEQ), the length of the feature vector was 120. For example, for the CT images shown in Fig 1(A)–1(C), the feature vector of CHEQ was constructed by concatenating the 1st and 2nd columns of Fig 4.
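The feature construction and the hyperparameter grid described above can be written down compactly. The sketch below only enumerates the settings; it does not train a Random Forest. The names `heq_feature_vector` and `hyperparameter_grid` are hypothetical, and rounding the fractional feature count to the nearest integer is an assumption, since the text does not state how non-integer products were handled.

```python
from itertools import product

N_TREES = (10, 100, 1000)
FEATURE_FRACTIONS = (0.1, 0.3, 0.5, 0.7, 0.9)


def heq_feature_vector(nb0_curve, nb1_curve):
    """Concatenate the nb0 and nb1 profiles of one slice into one vector."""
    return list(nb0_curve) + list(nb1_curve)


def hyperparameter_grid(feature_length):
    """All combinations of tree count and features considered per split."""
    return [
        {
            "n_trees": n_trees,
            # Rounding to the nearest integer is an assumption.
            "max_features": max(1, round(feature_length * fraction)),
        }
        for n_trees, fraction in product(N_TREES, FEATURE_FRACTIONS)
    ]


# Placeholder profiles over 60 threshold levels (values are made up).
nb0_curve = [0.001] * 60
nb1_curve = [0.002] * 60
features = heq_feature_vector(nb0_curve, nb1_curve)
print(len(features))  # 120

grid = hyperparameter_grid(len(features))
print(len(grid), grid[0])
```

With three tree counts and five fractions, each classifier type is evaluated under 15 hyperparameter settings per threshold range.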
Furthermore, we evaluated the effect of the threshold level on classifiers’ prediction. The lower limit of the threshold was changed from −1000 HU to the following values: −950, −900, −850, −800, and −750 HU. Similarly, the upper limit of the threshold was changed from −700 HU to the following values: −950, −900, −850, −800, and −750 HU. Each combination of the upper and lower limits of the thresholds was evaluated for both CLAA% and CHEQ. The length of the feature vector was changed based on the lower and upper limits of the threshold. For example, when −1000 and −850 HU were used as the lower and upper limits of the threshold, the length of the feature vector of CLAA% was 30.
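The dependence of the feature-vector length on the threshold range can be checked with a few lines of arithmetic. The sketch assumes a 5 HU step with the upper limit excluded, which reproduces both the length of 60 for the full range and the length of 30 quoted for CLAA%; the constant and function names are hypothetical, and pairing every lower limit with every higher upper limit is an interpretation of "each combination".

```python
LOWER_LIMITS = (-1000, -950, -900, -850, -800, -750)
UPPER_LIMITS = (-950, -900, -850, -800, -750, -700)


def feature_length(lower_hu, upper_hu, values_per_threshold=1, step_hu=5):
    """Feature-vector length for a half-open threshold range (an assumption)."""
    return values_per_threshold * len(range(lower_hu, upper_hu, step_hu))


def threshold_ranges():
    """Every (lower, upper) pair in which the lower limit is below the upper."""
    return [(lo, hi) for lo in LOWER_LIMITS for hi in UPPER_LIMITS if lo < hi]


print(feature_length(-1000, -700))       # LAA% over the full range
print(feature_length(-1000, -700, 2))    # nb0 and nb1 over the full range
print(feature_length(-1000, -850))       # LAA% over a 150 HU range
print(len(threshold_ranges()))
```

Under these assumptions a 150 HU range contributes 30 LAA% values, or 60 values when nb0 and nb1 are both used.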
First, the relationship between emphysema quantification and visual score was evaluated by calculating the Spearman’s correlation coefficient at the various threshold levels. Next, for both CLAA% and CHEQ, results of prediction were obtained using leave-one-patient-out cross validation. The best hyperparameters of Random Forest were selected based on the results of the cross validation. To evaluate the performance of CLAA% and CHEQ, contingency tables were prepared for the prediction of classifiers and actual visual score based on the results of the cross validation. Then, accuracy of prediction was calculated using the following equation:
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives, respectively. Using the contingency tables of the current study, accuracy was obtained by dividing the sum of the main diagonal by the sum of all elements. The difference in the accuracy between CLAA% and CHEQ was investigated using the McNemar’s test. In addition to accuracy, weighted Kappa was calculated between prediction of classifiers and actual visual score. All statistical analyses were performed using R-3.2.2 (available at http://www.r-project.org/). To perform the exact McNemar’s test and calculate the weighted Kappa, exact2x2 package (version-1.4.1) and irr package (version-0.84), respectively, were used. For calculating the weighted Kappa, kappa2 function of irr package was used. “squared” was passed to the kappa2 function as its weight argument.
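The evaluation steps above can be sketched in plain Python. This is an illustration, not a port of the R packages used in the study: the exact McNemar's test is written in its standard two-sided binomial form on the discordant pairs, the weighted Kappa uses squared disagreement weights, and neither is guaranteed to match the output of exact2x2 or irr digit for digit. All function names and the toy labels are hypothetical.

```python
from math import comb


def leave_one_patient_out(patient_ids):
    """Yield (train, test) index lists; all slices of one patient are held out."""
    for held_out in sorted(set(patient_ids)):
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        yield train, test


def contingency_table(actual, predicted, n_classes=6):
    """Rows: actual visual score; columns: predicted visual score."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


def accuracy(table):
    """Sum of the main diagonal divided by the sum of all elements."""
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total


def exact_mcnemar_p(actual, pred_a, pred_b):
    """Two-sided exact (binomial) McNemar p-value on discordant pairs."""
    b = sum(1 for t, x, y in zip(actual, pred_a, pred_b) if x == t and y != t)
    c = sum(1 for t, x, y in zip(actual, pred_a, pred_b) if x != t and y == t)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def weighted_kappa_squared(table):
    """Cohen's Kappa with squared weights; undefined if only one class occurs."""
    k = len(table)
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum((i - j) ** 2 * table[i][j]
                   for i in range(k) for j in range(k))
    expected = sum((i - j) ** 2 * row_tot[i] * col_tot[j] / total
                   for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Toy example: six slices from three patients (labels are made up).
patient_ids = ["a", "a", "b", "c", "c", "c"]
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
folds = list(leave_one_patient_out(patient_ids))
table = contingency_table(actual, predicted, n_classes=3)
print(len(folds), accuracy(table), weighted_kappa_squared(table))
```

Holding out whole patients rather than single slices keeps slices from the same lungs out of both the training and the test partition, which would otherwise inflate the apparent accuracy.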
Feature selection and others
Because the feature vector obtained in the current study might be redundant, feature selection was performed. The selection was performed based on the importance of the feature calculated by Random Forest. Originally, this method was used in support vector machines, wherein weights of classifier calculated by support vector machines were used as the criteria for the feature selection [28, 29]. The feature selection was performed on the training partitions of leave-one-patient-out cross validation. For each type of the feature vector, the length was reduced by 10%, 30%, and 50% of the original, by using the feature selection. Other types of feature selection and classifier were also evaluated (for details, see Supporting information).
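Importance-based feature selection of the kind described above amounts to ranking the features and discarding the least important share. The sketch below assumes that the importance scores have already been obtained from a Random Forest fitted on the training partition; the scores shown are made up, the function name is hypothetical, and reading "reduced by 10%" as dropping 10% of the features is an interpretation.

```python
def select_top_features(importances, reduction):
    """Return sorted indices of the features kept after dropping a fraction.

    importances: one score per feature (higher means more important).
    reduction: fraction of features to discard, e.g. 0.1, 0.3, or 0.5.
    """
    n_features = len(importances)
    n_keep = n_features - round(n_features * reduction)
    ranked = sorted(range(n_features),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:n_keep])


# Made-up importance scores for a ten-element feature vector.
importances = [0.05, 0.30, 0.10, 0.25, 0.20, 0.02, 0.04, 0.01, 0.03, 0.00]
print(select_top_features(importances, 0.3))
print(select_top_features(importances, 0.5))
```

The returned indices would then be applied to both the training and the held-out feature vectors of the same fold, so that the held-out patient never influences which features are kept.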
The Spearman’s correlation coefficients for emphysema quantification and visual score at the 60 threshold levels are listed in S1 Table (Supporting information). Table 1 summarizes the results of Spearman’s correlation coefficients. The correlation coefficients were as follows: LAA% at −950 HU, 0.567; LAA% at −910 HU, 0.654; LAA% at −875 HU, 0.704; nb0 at −950 HU, 0.552; nb0 at −910 HU, 0.629; nb0 at −875 HU, 0.473; nb1 at −950 HU, 0.149; nb1 at −910 HU, 0.519; and nb1 at −875 HU, 0.716. For both LAA% and nb1, the best correlation was obtained at the threshold = −875 HU.
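For readers who want to reproduce coefficients of this kind, Spearman's correlation can be computed as the Pearson correlation of tie-averaged ranks. Ties matter here because the visual score takes only six values. The sketch below is a generic implementation with hypothetical function names and made-up data; the study itself used R.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up LAA% values and visual scores for five slices.
laa_values = [2.1, 5.4, 5.4, 12.0, 30.5]
visual_scores = [0, 1, 1, 3, 5]
print(average_ranks(laa_values))
print(spearman(laa_values, visual_scores))
```

Repeating this calculation once per threshold level yields a correlation profile from which the best-correlating threshold can be read off.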
Tables 2 and 3 show the accuracy of CLAA% and CHEQ at each combination of the threshold levels, respectively. The best accuracy was as follows: CLAA%, 55.7% and CHEQ, 66.1%. The best accuracy of CLAA% was obtained when using LAA% at the threshold levels ranging from −1000 HU to −850 HU or from −950 HU to −850 HU. The best accuracy of CHEQ was obtained using nb0 and nb1 at the threshold levels ranging from −1000 HU to −700 HU. The difference between the best accuracy of CLAA% and CHEQ was statistically significant (p = 0.0290). Tables 4 and 5 show the contingency tables for the most accurate CLAA% and CHEQ, respectively. Using the contingency tables provided as Tables 4 and 5, the weighted Kappa was as follows: LAA%, 0.688 and HEQ, 0.697.
S2 Table (Supporting information) shows the results of feature selection for CLAA% and CHEQ. In both CLAA% and CHEQ, there were minimal differences between best accuracy with and without feature selection. This implies either that there was little redundancy in LAA% or HEQ at different thresholds, or that Random Forest could build robust classifiers using LAA% or HEQ even if LAA% or HEQ at the different threshold levels provided redundant information. S3 Table and S1 Doc show the results of other types of feature selection and classifier.
The current study evaluated the relationship between emphysema quantification and visual score. Both LAA% and HEQ showed a strong correlation with visual score; the best correlation coefficients of LAA% and nb1 were 0.704 and 0.716, respectively. For the correlation between visual score and emphysema quantification, the optimal threshold level for both LAA% and HEQ was −875 HU. When using emphysema quantification and supervised machine learning to predict visual score, HEQ was more useful for predicting visual score than LAA%. The accuracy of CHEQ was statistically better than that of CLAA% (p = 0.0290).
The best correlation between LAA% and visual score in our study was observed at the threshold of −875 HU, which was higher than the optimal threshold reported in previous studies. For example, a single LAA% threshold of −950 HU was earlier reported to be an acceptable threshold for emphysema quantification. In previous studies, the LAA% threshold was optimized by assessing the relationship between LAA% and severity of COPD using modalities such as the pulmonary function test. However, we optimized the threshold of LAA% by assessing its relationship with visual score. As a result, the optimal threshold determined in the present study is different from that reported earlier. A previous study suggested that visual score of emphysema was not only determined by LAA% but also by other factors such as lesion size, predominant type, distribution of emphysema, and small-airway disease. These factors may affect the optimal threshold of LAA% determined on the basis of its correlation with visual score.
One clinical application of the current study is to change the threshold of LAA% when lung cancer risk is predicted using CT images. Previous studies have investigated the relationship between emphysema severity (e.g., LAA%) and lung cancer risk using the conventional threshold level (e.g., −950 or −910 HU) [10–12]. These studies showed a significant correlation of visual score of emphysema, but not of LAA%, with the risk of lung cancer. In the present study, the correlation between emphysema quantification and visual score was stronger at the relatively higher threshold level (−875 HU) than the conventional threshold level; therefore, it is speculated that at the relatively high threshold level, LAA% may be significantly associated with the risk of lung cancer. This speculation should be investigated in a larger cohort in the future.
Another application of the current study is to utilize the results of CHEQ to predict the risk of lung cancer. Although visual score was significantly associated with the risk of lung cancer, visual scoring of emphysema can be a severe burden for radiologists or pulmonologists if a lung cancer screening program utilizes CT as a tool for risk stratification. Use of the results of CHEQ in place of visual score may reduce the burden on radiologists or pulmonologists. Because the weighted Kappa between CHEQ and visual score was better than 0.6, CHEQ may potentially be used as a substitute for visual score.
According to Tables 2–5 and the results of the McNemar’s test, the predictive accuracy of CHEQ was statistically better than that of CLAA%. In a previous study, HEQ was found useful for evaluating the spatial distribution of low-attenuation lung region. We speculate that because HEQ provides a measure of the spatial distribution of low-attenuation lung region, it may be superior to LAA% for predicting visual score. In our study, use of a wider threshold range improved the predictive accuracy of HEQ (Table 3). This implies that visual score was affected by the spatial distribution of low-attenuation lung region at the relatively high threshold level. This speculation is, at least partially, consistent with the results of a previous study.
We used the changes in Betti numbers of the binarized CT images to construct the feature vector for machine learning. Adcock et al used intensity filtration and matching metric to utilize support vector machine for classification of liver tumor on CT images. Although their intensity filtration was partly similar to our method, their construction of feature vector was based on the metric of barcode. Qaiser et al showed that automated tumor segmentation on histology images could be performed rapidly using topological changes in Betti numbers. Although their method (persistent homology profiles) was compatible with ours, their task was different from ours.
There are several limitations to this study. First, the number of patients was relatively small. In particular, the number of patients with severe or very severe emphysema was very small. According to Tables 4 and 5, the predictive accuracy in severe or very severe emphysema cases was worse than that in the other cases. This deterioration in the predictive accuracy may be attributable to the limited number of cases with severe or very severe emphysema. To improve the predictive accuracy and validate the results of the current study, a larger cohort of patients is required for future research. Second, two-dimensional image analyses were performed. Recently, quantification based on thin-slice volumetric CT images has been frequently used. In the future, we will extend our method to three-dimensional image analyses. Third, although lung cancer risk was discussed in the current paper, we did not investigate the association between HEQ and the risk. Fourth, although support vector machine with metric or kernel trick specialized in persistent homology was suggested [18, 32], we did not evaluate these methods in the present study. Fifth, the clinical application of HEQ was not investigated in the present study. Because a previous study examined the relationship between HEQ and COPD severity, we focused on the relationship between HEQ and visual score of emphysema in the present study.
In conclusion, LAA% and HEQ at −875 HU showed a stronger correlation with visual score as compared to that at the conventional threshold level (−950 or −910 HU). By providing a measure of the spatial distribution of low-attenuation lung region, HEQ was more useful for predicting visual score as compared to LAA%.
S1 Table. Spearman’s correlation coefficients for emphysema quantification and visual score at the 60 threshold levels.
S2 Table. Results of feature selection in CLAA% and CHEQ.
S3 Table. Results of other types of feature selection.
S1 Doc. Results of other types of classifier.
- Conceptualization: Mizuho Nishio.
- Data curation: Mizuho Nishio.
- Formal analysis: Mizuho Nishio KN.
- Funding acquisition: KN.
- Investigation: Mizuho Nishio.
- Methodology: Mizuho Nishio.
- Project administration: MY KT.
- Resources: Mizuho Nishio.
- Software: Mizuho Nishio KN.
- Supervision: KT.
- Validation: TK YE.
- Visualization: Mizuho Nishio.
- Writing – original draft: Mizuho Nishio KN.
- Writing – review & editing: Mizuho Nishio KN TK Mari Nishio.
- 1. Mathers CD, Loncar D. Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med. 2006;3:e442. pmid:17132052
- 2. Vestbo J, Hurd SS, Agustí AG, Jones PW, Vogelmeier C, Anzueto A, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. Am J Respir Crit Care Med. 2013;187:347–365. pmid:22878278
- 3. Galbán CJ, Han MK, Boes JL, Chughtai KA, Meyer CR, Johnson TD, et al. Computed tomography-based biomarker provides unique signature for diagnosis of COPD phenotypes and disease progression. Nat Med. 2012;18:1711–1715. pmid:23042237
- 4. Müller NL, Staples CA, Miller RR, Abboud RT. “Density mask”. An objective method to quantitate emphysema using computed tomography. Chest. 1988;94:782–787. pmid:3168574
- 5. Bankier AA, De Maertelaer V, Keyzer C, Gevenois PA. Pulmonary emphysema: subjective visual grading versus objective quantification with macroscopic morphometry and thin-section CT densitometry. Radiology. 1999; 211: 851–858. pmid:10352615
- 6. Mishima M, Hirai T, Itoh H, Nakano Y, Sakai H, Muro S, et al. Complexity of terminal airspace geometry assessed by lung computed tomography in normal subjects and patients with chronic obstructive pulmonary disease. Proc Natl Acad Sci U S A. 1999;96:8829–8834. pmid:10430855
- 7. Nakano Y, Muro S, Sakai H, Hirai T, Chin K, Tsukino M, et al. Computed tomographic measurements of airway dimensions and emphysema in smokers: Correlation with lung function. Am J Respir Crit Care Med. 2000;162:1102–1108. pmid:10988137
- 8. COPDGene CT Workshop Group, Barr RG, Berkowitz EA, Bigazzi F, Bode F, Bon J, et al. A combined pulmonary-radiology workshop for visual evaluation of COPD: study design, chest CT findings and concordance with quantitative evaluation. COPD. 2012;9: 151–159. pmid:22429093
- 9. Gietema HA, Müller NL, Fauerbach PV, Sharma S, Edwards LD, Camp PG, et al. Quantifying the extent of emphysema: factors associated with radiologists' estimations and quantitative indices of emphysema severity using the ECLIPSE cohort. Acad Radiol. 2011 Jun;18(6):661–71. pmid:21393027
- 10. Wille MM, Thomsen LH, Petersen J, de Bruijne M, Dirksen A, Pedersen JH, et al. Visual assessment of early emphysema and interstitial abnormalities on CT is useful in lung cancer risk analysis. Eur Radiol. 2016 Feb;26(2):487–94. pmid:25956938
- 11. Schwartz AG, Lusk CM, Wenzlaff AS, Watza D, Pandolfi S, Mantha L, et al. Risk of lung cancer associated with COPD phenotype based on quantitative image analysis. Cancer Epidemiol Biomarkers Prev. 2016 Jul 6. pii: cebp.0176.2016.
- 12. Smith BM, Pinto L, Ezer N, Sverzellati N, Muro S, Schwartzman K. Emphysema detected on computed tomography and risk of lung cancer: a systematic review and meta-analysis. Lung Cancer. 2012 Jul;77(1):58–63. pmid:22437042
- 13. Nishio M, Nakane K, Tanaka Y. Application of the homology method for quantification of low-attenuation lung region in patients with and without COPD. Int J Chron Obstruct Pulmon Dis. 2016;11:2125–2137. pmid:27660430
- 14. Ishida M, Kida K, Mizobe K, Nakane K. The Betti number of fatigue fracture surfaces of low carbon steel (JIS, S45C). Adv Mat Res. 2015;1102:59–63.
- 15. Nakane K, Santos EC, Honda T, Mizobe K, Kida K. Homology Analysis of Structure of high carbon bearing steel: Effect of Repeated quenching on Prior Austenite Grain Size. Mater Res Innov. 2014;18:33–37.
- 16. Nakane K, Takiyama A, Mori S, Matsuura N. Homology-based method for detecting regions of interest in colonic digital images. Diagn Pathol. 2015;10:36. pmid:25907563
- 17. Nakane K, Tsuchihashi Y, Matsuura N. A simple mathematical model utilizing topological invariants for automatic detection of tumor areas in digital tissue images. Diagn Pathol. 2013;8(Suppl 1):S27.
- 18. Adcock A, Rubin D, Carlsson G. Classification of Hepatic Lesions using the Matching Metric. Comput Vis Image Underst. 2014;121:36–42.
- 19. Edelsbrunner H, Harer J. Persistent homology-a survey. Contemp Math. 2008;453:257–282.
- 20. Carlsson G. Topology and data. Bull Am Math Soc. 2009;46(2):255–308.
- 21. Liu B, Liu F, Wang X, Chen J, Fang L, Chou K-C. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research. 2015;43(Web Server issue):W65–W71.
- 22. Liu B, Wu H, Zhang D, Wang X, Chou KC. Pse-Analysis: a python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods. Oncotarget. 2017 Feb 21;8(8):13338–13343. pmid:28076851
- 23. Liu B, Liu F, Fang L, Wang X, Chou KC. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics. 2015 Apr 15;31(8):1307–9. pmid:25504848
- 24. Liu B, Long R, Chou KC. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics. 2016 Aug 15;32(16):2411–8. pmid:27153623
- 25. Computed Tomography Emphysema Database. http://image.diku.dk/emphysema_database/. Accessed July 17, 2016.
- 26. Sørensen L, Shaker SB, de Bruijne M. Quantitative analysis of pulmonary emphysema using local binary patterns. IEEE Trans Med Imaging. 2010; 29(2):559–69. pmid:20129855
- 27. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.
- 28. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
- 29. Nishio M, Nagashima C. Computer-aided diagnosis for lung cancer: usefulness of nodule heterogeneity. Academic Radiology in press.
- 30. Wang Z, Gu S, Leader JK, Kundu S, Tedrow JR, Sciurba FC, et al. Optimal threshold in CT quantification of emphysema. Eur Radiol. 2013;23:975–84. pmid:23111815
- 31. Qaiser T, Sirinukunwattana K, Nakane K, Tsang YW, Epstein D, Rajpoot N. Persistent Homology for Fast Tumor Segmentation in Whole Slide Histology Images. Procedia Comput Sci. 2016;90:119–124.
- 32. Kusano G, Fukumizu K, Hiraoka Y. Persistence weighted Gaussian kernel for topological data analysis. Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.