Computer-Aided Tomographic Analysis of Interstitial Lung Disease (ILD) in Patients with Systemic Sclerosis (SSc). Correlation with Pulmonary Physiologic Tests and Patient-Centred Measures of Perceived Dyspnea and Functional Disability

Objectives This study was designed (a) to evaluate an improved quantitative lung fibrosis score based on a computer-aided method (CaM) in patients with systemic sclerosis (SSc)-related interstitial lung disease (SSc-ILD), (b) to investigate the relationship of physiologic parameters (forced vital capacity [FVC] and single-breath diffusing capacity for carbon monoxide [DLco]) and patient-centred measures of dyspnea and functional disability with CaM and conventional visual reader-based (CoVR) scores, and (c) to identify potential surrogate measures from quantitative and visual high-resolution computed tomography (HRCT) measurement. Methods A total of 126 patients with SSc underwent chest radiography, HRCT and pulmonary function tests (PFTs). The following patient-centred measures were obtained: the modified Borg Dyspnea Index (Borg score), a visual analogue scale (VAS) for breathing, and the Health Assessment Questionnaire-Disability Index (HAQ-DI). HRCT abnormalities were scored with the CoVR method and with the CaM. The relationships among the HRCT scores, physiologic parameters (FVC and DLco, % predicted) and patient-centred measures were assessed using linear regression analysis and Pearson's correlation. Multivariate regression models were used to identify predictors of the severity of pulmonary fibrosis. Results Subjects with limited cutaneous SSc had lower HAQ-DI scores than subjects with diffuse cutaneous SSc (p <0.001). CaM and CoVR scores were similar in the 2 groups. In univariate analysis, a strong correlation between CaM and CoVR scores was observed (p <0.0001). In multivariate analysis, the CaM and CoVR scores were predicted by DLco, FVC, Borg score and HAQ-DI. Age, sex, disease duration, anti-topoisomerase antibodies and modified Rodnan skin score (mRSS) were not significantly associated with the severity of pulmonary fibrosis on either the CaM or the CoVR method.
Conclusions Although CaM scores correlated closely with CoVR total scores, CaM analysis showed stronger correlations with DLco (more so than with FVC) and with patient-centred measures of perceived dyspnea and functional disability. Computer-aided tomographic analysis is computationally efficient and, in combination with physiologic and patient-centred measures, could provide a means of accurately assessing and monitoring disease progression or response to therapy.


Introduction
Systemic sclerosis (SSc) is a heterogeneous complex of diseases characterized by multiorgan involvement, endothelial dysfunction, excessive collagen production and immune system abnormalities [1]. Clinically, patients can have diverse systemic manifestations with any combination of skin, pulmonary, cardiac, renal, musculoskeletal and gastrointestinal involvement. Interstitial lung disease (ILD) is a devastating complication and a major cause of death in patients with SSc. In early autopsy studies, up to 100% of patients were found to have parenchymal involvement [1]. Parenchymal lung involvement often appears soon after the diagnosis of SSc, with 25% of patients developing clinically significant lung disease, as defined by physiological, radiographic or bronchoalveolar lavage abnormalities, within 3 years [2]. In patients with symptomatic lung disease, progressive decrements in lung function are accompanied by a decline in emotional well-being and in the ability to perform day-to-day activities, that is, in health-related quality of life (HRQoL) [3]. Baseline diffusing capacity for carbon monoxide (DLco) and forced vital capacity (FVC) have traditionally been used as measures of disease severity, and reductions in both parameters have been associated with increased mortality in SSc-ILD.
High-resolution computed tomography (HRCT) has become an important part of the routine detection and evaluation of SSc-ILD [4,5]. It is more accurate than chest radiography in detecting and characterizing diffuse lung diseases, and HRCT abnormalities correlate more closely than radiographic abnormalities with physiologic parameters (FVC and DLco) [6,7]. HRCT features of fibrosis are present in 55% to 65% of all patients with SSc and in up to 96% of those with abnormal pulmonary function test (PFT) results [8,9]. Although formal CT scoring is not realistic in routine practice, rapid semi-quantitative estimation of disease extent on CT, combined with an FVC threshold, has been used to stage disease as limited or extensive [10][11][12][13]. Several computer tools that automatically segment the lung on HRCT images have been developed. They provide image display (e.g., multiplanar reformations and surface shading for three-dimensional and volume rendering), anatomic image quantitation (e.g., area and volume of airways and lungs) and regional characterization of lung tissue (analysing attenuation, changes in attenuation, and texture patterns in the imaged lung) [13,14]. Compared with traditional visual interpretation of HRCT lung findings, automatic computer-based assessment may improve the objectivity, sensitivity, and repeatability of the quantification of changes in lung features [14][15][16]. We previously showed high agreement between experienced radiologists in semi-quantitative HRCT analysis, and a significant association between the descriptive parameters of quantitative OsiriX assessment and the semi-quantitative HRCT analysis [17].
More recently, we investigated the performance of a computer-aided method (CaM) for the quantification of ILD in 79 patients with SSc, in terms of its correlation with both the conventional visual reader-based score (CoVR) and the physiologic parameters, its feasibility, and its inter-reader reliability [18]. The results indicated that CaM analysis with OsiriX provides good concurrent validity, reliability and feasibility for the assessment of SSc-ILD.
The three purposes of our study were: (a) to evaluate an improved quantitative lung fibrosis score based on the CaM system in patients with SSc-ILD, (b) to investigate the relationship of physiologic parameters (FVC and DLco) and patient-centred measures of dyspnea and functional disability with CaM and CoVR scores, and (c) to identify potential surrogate measures from quantitative and visual HRCT measurement.

Study population
This cross-sectional study was approved by the institutional review board of Zona Territoriale 5, ASUR Marche. Participants provided verbal consent to participate. Written consent was not obtained because HRCT, PFTs and the other tests applied to participants are part of routine clinical practice in our Department for the evaluation of patients with SSc; consent was recorded in each patient's clinical record, and the ethics committee approved this procedure. Patients with SSc, as defined by the American College of Rheumatology (formerly the American Rheumatism Association) classification criteria [19], were included. Patients were classified as having limited or diffuse cutaneous SSc (lcSSc and dcSSc, respectively). LcSSc was characterized by skin thickening distal to the elbows and knees and proximal to the clavicles (including the face), whereas dcSSc was characterized by skin thickening proximal as well as distal to the elbows and knees, including the trunk and the face [20]. The modified Rodnan skin score (mRSS) was used to assess skin involvement [21]: each of 17 body areas is graded 0 (no thickening), 1 (mild thickening), 2 (moderate thickening) or 3 (severe thickening), and the grades are summed to a total score from 0 (best) to 51 (worst). The presence of autoantibodies, including anti-topoisomerase I and anti-centromere antibodies, was also investigated. Exclusion criteria were recent or current respiratory infection, severe pulmonary hypertension requiring specific treatment with either bosentan or epoprostenol, uncontrolled congestive heart failure, and clinically significant abnormalities other than ILD on chest radiography or HRCT. Echocardiography and right heart catheterization were not a routine part of the visit.
Only 24.6% of subjects had undergone echocardiography at enrollment, so these results were not included in our analysis.
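As a minimal sketch of the mRSS arithmetic described above, the following Python function sums per-area grades over the 17 standard mRSS body areas and rejects grades outside 0-3. The area keys (MRSS_AREAS) and the function name mrss_total are hypothetical labels chosen for illustration; they are not taken from the study's data collection forms.

```python
from typing import Mapping

# The 17 body areas of the modified Rodnan skin score (mRSS):
# three unpaired sites plus seven sites assessed on both sides.
# Key names are hypothetical labels for illustration.
UNPAIRED_SITES = ["face", "anterior_chest", "abdomen"]
PAIRED_SITES = ["fingers", "hand", "forearm", "upper_arm", "thigh", "lower_leg", "foot"]
MRSS_AREAS = UNPAIRED_SITES + [
    f"{side}_{site}" for site in PAIRED_SITES for side in ("right", "left")
]


def mrss_total(grades: Mapping[str, int]) -> int:
    """Sum per-area skin thickness grades (0-3) into a total from 0 to 51.

    0 = no thickening, 1 = mild, 2 = moderate, 3 = severe.
    """
    missing = set(MRSS_AREAS) - set(grades)
    unknown = set(grades) - set(MRSS_AREAS)
    if missing or unknown:
        raise ValueError(f"missing areas: {sorted(missing)}, unknown areas: {sorted(unknown)}")
    total = 0
    for area in MRSS_AREAS:
        grade = grades[area]
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{area}: grade must be 0-3, got {grade!r}")
        total += grade
    return total


# Example: mild thickening of both hands, moderate thickening of both fingers.
example = {area: 0 for area in MRSS_AREAS}
example.update({"right_hand": 1, "left_hand": 1, "right_fingers": 2, "left_fingers": 2})
print(mrss_total(example))  # 6
```

Requiring every area to be present makes a skipped site an explicit error rather than a silent zero, which would otherwise lower the total.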
The modified Borg Dyspnea Index (Borg score) rates perceived dyspnea (breathing discomfort) on a scale where 0 = no breathlessness at all, 0.5 = very, very slight (just noticeable), 1 = very slight, 2 = slight breathlessness, 3 = moderate, 4 = somewhat severe, 5 = severe breathlessness, 7 = very severe breathlessness, 9 = very, very severe (almost maximum) and 10 = maximum [22]. The VAS for breathing allows patients to self-assess their degree of difficulty in performing daily activities because of shortness of breath, on a continuous 100-mm scale ranging from no limitation to severe limitation of activity [25]. The HAQ-DI is a condition-specific measure of functional status (assessing activities of daily living) originally intended for use in arthritis [23]. The standard HAQ-DI is scored as an ordinal variable ranging from 0 = no disability to 3 = severe disability. In SSc, it has been found to correlate with cutaneous and visceral involvement at baseline and with changes in physiologic parameters over time [26,27].
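The text gives only the 0-3 range of the HAQ-DI. The sketch below assumes the standard scoring rules as commonly described for the Stanford HAQ: eight categories, each scored by its worst (highest) item, raised to 2 when aids, devices or help from another person are used, and averaged over the answered categories, with no score when fewer than six are answered. The study does not state which scoring variant was applied, and the category keys and function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# The eight HAQ-DI categories (hypothetical key names).
HAQ_CATEGORIES = (
    "dressing_grooming", "arising", "eating", "walking",
    "hygiene", "reach", "grip", "activities",
)


def category_score(items: Sequence[Optional[int]], aid_or_help: bool) -> Optional[int]:
    """Worst (highest) item score in a category, 0-3; None if no item answered.

    If aids, devices or help from another person are used for the category,
    a score below 2 is raised to 2 (assumed standard adjustment).
    """
    answered = [s for s in items if s is not None]
    if not answered:
        return None
    if any(s not in (0, 1, 2, 3) for s in answered):
        raise ValueError("HAQ item scores must be 0-3")
    score = max(answered)
    if aid_or_help:
        score = max(score, 2)
    return score


def haq_di(items: Mapping[str, Sequence[Optional[int]]],
           aids: Mapping[str, bool]) -> Optional[float]:
    """Mean of category scores (0-3); None if fewer than 6 categories answered."""
    scores = [category_score(items.get(c, ()), aids.get(c, False)) for c in HAQ_CATEGORIES]
    answered = [s for s in scores if s is not None]
    if len(answered) < 6:
        return None
    return sum(answered) / len(answered)


# Example: no difficulty except grip (worst item = 3).
items = {c: [0, 0] for c in HAQ_CATEGORIES}
items["grip"] = [1, 3, 0]
print(haq_di(items, {}))                 # 0.375
print(haq_di(items, {"walking": True}))  # 0.625
```

Because the index is a mean of eight integer category scores, a complete questionnaire yields values in steps of 0.125 between 0 and 3.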

Pulmonary function tests
PFTs were performed within 2 weeks of the HRCT scan using a flow-sensing spirometer and a body plethysmograph connected to a computer for data analysis, with the patient at rest in a seated position. The tests consisted of spirometry using a computerised lung analyser (MasterScreen Diffusion, Jaeger GmbH, Höchberg, Germany).
FVC (% predicted value) and DLCO (% predicted value, corrected for haemoglobin) were obtained. The PFTs performed were based on published guidelines [28]. At least three measurements were taken for each variable to guarantee repeatability.
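The haemoglobin correction applied to DLco is not specified in the text. As an illustration only, the sketch below assumes the Cotes equation as adopted in ATS/ERS guidance (constant 10.22 for males aged 15 years or older, 9.38 for women and children under 15; haemoglobin in g/dL), together with the usual percent-predicted calculation. The function names are hypothetical, and predicted values would come from whatever reference equations the laboratory uses.

```python
def dlco_hb_corrected(dlco_measured: float, hb_g_dl: float,
                      adult_or_adolescent_male: bool = True) -> float:
    """Adjust a measured DLco to a standard haemoglobin (assumed Cotes equation).

    Hb in g/dL. Constant 10.22 for males >= 15 years, 9.38 for women and
    children < 15 years.
    """
    if hb_g_dl <= 0:
        raise ValueError("haemoglobin must be positive")
    k = 10.22 if adult_or_adolescent_male else 9.38
    return dlco_measured * (k + hb_g_dl) / (1.7 * hb_g_dl)


def percent_predicted(measured: float, predicted: float) -> float:
    """Express a measured value as a percentage of its predicted value."""
    return 100.0 * measured / predicted


# Example: an anaemic woman (Hb 10 g/dL) with a measured DLco of 15 units;
# the correction raises the value to about 17.1 units before comparison
# with a (hypothetical) predicted value of 22 units.
corrected = dlco_hb_corrected(15.0, 10.0, adult_or_adolescent_male=False)
print(round(corrected, 2), round(percent_predicted(corrected, 22.0), 1))
```

At the reference haemoglobin of each equation (14.6 g/dL for men, 13.4 g/dL for women) the correction factor equals 1, so only patients with abnormal haemoglobin are affected.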

HRCT assessment and visual reader-based disease quantification
All patients underwent volumetric thin-section CT examinations using a 64-slice GE LightSpeed VCT Power scanner. Scans were obtained at full inspiration from the apex to the lung base with the patient supine. Scanning parameters were 120 kV, 300 mAs, acquisition time 0.8 s, slice thickness 1 mm with 0.6-mm reconstructions, and the smallest field of view (FOV) covering both lungs. Scans were viewed with a window level of -600 Hounsfield units (HU) and a window width of 1600 HU. No contrast medium was administered. Parenchymal abnormalities on HRCT were coded and scored according to Warrick et al. [11] by two independent readers blinded to the results. A point value was assigned to each abnormality as follows: ground-glass appearance = 1, irregular pleural margins = 2, septal/subpleural lines = 3, honeycombing = 4, subpleural cysts = 5. In each patient, the "severity of disease" score was obtained by summing the point values of the abnormalities present (range 0-15). An "extent of disease" score was obtained by counting the number of bronchopulmonary segments involved by each abnormality: one to three segments scored 1, four to nine segments scored 2, and more than nine segments scored 3 (range 0-15). The severity and extent scores were then added to give the total HRCT score (range 0-30). The mean of the two readers' scores was used as the final score. Each radiologist reviewed the scans independently, and disagreements were resolved by consensus. At the time of CT interpretation, the two radiologists (L.C. and M.C.) were unaware of the patients' history and physiologic results. The intraclass correlation coefficient for agreement between the radiologists on the total HRCT score was 0.80 [18].
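The Warrick scoring rules above can be written as a small function. This sketch assumes each reader records, for each of the five abnormalities, the number of bronchopulmonary segments in which it is seen (0 or omitted when absent); the abnormality keys and function names are hypothetical.

```python
from typing import Mapping, Tuple

# Point values assigned to each HRCT abnormality (Warrick et al.).
# Key names are hypothetical labels for illustration.
WARRICK_POINTS = {
    "ground_glass": 1,
    "irregular_pleural_margins": 2,
    "septal_subpleural_lines": 3,
    "honeycombing": 4,
    "subpleural_cysts": 5,
}


def extent_points(n_segments: int) -> int:
    """Extent grade for one abnormality from the number of segments involved."""
    if n_segments < 0:
        raise ValueError("segment count cannot be negative")
    if n_segments == 0:
        return 0
    if n_segments <= 3:
        return 1
    if n_segments <= 9:
        return 2
    return 3


def warrick_score(segments: Mapping[str, int]) -> Tuple[int, int, int]:
    """Return (severity 0-15, extent 0-15, total 0-30).

    `segments` maps each abnormality to the number of bronchopulmonary
    segments in which it is seen; absent abnormalities may be omitted or 0.
    """
    unknown = set(segments) - set(WARRICK_POINTS)
    if unknown:
        raise ValueError(f"unknown abnormalities: {sorted(unknown)}")
    severity = sum(WARRICK_POINTS[a] for a, n in segments.items() if n > 0)
    extent = sum(extent_points(n) for n in segments.values())
    return severity, extent, severity + extent


# Example: ground glass in 5 segments, honeycombing in 2.
print(warrick_score({"ground_glass": 5, "honeycombing": 2}))  # (5, 3, 8)
```

Recording segment counts rather than pre-binned extent grades keeps the raw reader data available if a different extent binning is needed later.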

Computer-aided scoring quantification process
HRCT images were reconstructed and analysed with OsiriX MD version 7 (64-bit), a DICOM viewer, on a Mac mini (2.8 GHz Intel Core 2 Duo, 16 GB random-access memory; Apple Computer, Cupertino, CA, USA) running macOS 10.12.2. The DICOM data were imported into the OsiriX 7 database using the "Copy linked files to Database folder" command in the File menu. For each section, semiautomatic lung parenchymal segmentation was performed so that all images could be analysed; descriptive parameters of the computer analysis were then calculated. The program uses a semiautomated thresholding technique to isolate the lungs from other tissues and structures, selecting all pixels between -1024 and -200 HU [14,16]. Minimal user intervention was required to exclude blood vessels and large bronchi near the hilum. CT attenuation of normal lung parenchyma is reported to range from -800 to -900 HU, depending on the phase of respiration and the level of inspiration achieved for the scan, and on anatomical location (ventral or dorsal) [29][30][31]. Pixels with attenuation between -700 and -500 HU were classified as ILD, which includes both ground-glass opacity and reticular opacity [16]. The radiodensity of -500 HU was selected as the threshold between consolidation and ground-glass opacity [14], and, in agreement with Shin et al. [16], -700 HU was selected as the predefined threshold separating normal lung from ILD. Fig 1 shows representative sequences of the OsiriX segmentation process in two study participants with mild (A) and severe (B) pulmonary fibrosis. As previously reported [18], there was total concordance between the first and second CaM measurements (95% limits of agreement = 0 to 0, ICC = 1).
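The threshold logic above can be illustrated with a short NumPy sketch: pixels in the lung range (-1024 to -200 HU) form the denominator, and those in the ILD range (-700 to -500 HU) form the numerator. It assumes an array of HU values and an optional boolean mask standing in for the manual exclusion of vessels and hilar bronchi. Treating both ends of each range as inclusive is an assumption, since the text does not specify boundary handling; the constant and function names are hypothetical, and this is not the OsiriX implementation.

```python
from typing import Optional

import numpy as np

# Thresholds (HU) quoted in the text; inclusive bounds are an assumption.
LUNG_RANGE = (-1024, -200)   # pixels retained as lung by thresholding
ILD_RANGE = (-700, -500)     # pixels counted as ILD (ground glass + reticulation)


def fibrosis_percentage(hu: np.ndarray, keep_mask: Optional[np.ndarray] = None) -> float:
    """Percentage of lung pixels whose attenuation falls in the ILD range.

    `hu` holds CT attenuation values (any shape). `keep_mask`, if given, is a
    boolean array of the same shape that is False for manually excluded
    pixels (e.g. vessels and large hilar bronchi).
    """
    hu = np.asarray(hu)
    lung = (hu >= LUNG_RANGE[0]) & (hu <= LUNG_RANGE[1])
    if keep_mask is not None:
        lung &= np.asarray(keep_mask, dtype=bool)
    ild = lung & (hu >= ILD_RANGE[0]) & (hu <= ILD_RANGE[1])
    n_lung = int(lung.sum())
    if n_lung == 0:
        raise ValueError("no pixels fall within the lung attenuation range")
    return 100.0 * int(ild.sum()) / n_lung


# Example: six pixels; -100 HU lies outside the lung range, two pixels are ILD.
pixels = np.array([-900, -800, -650, -550, -300, -100])
print(fibrosis_percentage(pixels))  # 40.0
```

Excluding pixels through the mask shrinks the denominator, so removing dense vessels that fall inside the lung range raises the reported percentage slightly.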

Statistical analysis
All data were entered into a dedicated Microsoft Access database and analysed using MedCalc version 16.0 (MedCalc Software, Mariakerke, Belgium). Values are expressed both as mean ± standard deviation (SD) and as median (interquartile range, IQR). The two-sample t test was used to compare continuous variables and the χ2 test to compare categorical variables between the lcSSc and dcSSc subgroups. The relationships among the lung segmentation analysis, the readers' scores and the PFT results were assessed using univariate regression analysis and Pearson's product-moment correlation (Pearson r). Multivariate regression analyses were then performed to identify factors associated with a higher percentage of pulmonary fibrosis on CaM and with higher CoVR scores. Covariates entered into the models included age, sex, disease duration, anti-topoisomerase I antibodies, mRSS, Borg score, HAQ-DI, and FVC and DLco (% predicted). The VAS for breathing was excluded because of its collinearity with the Borg score. Results were expressed as the multiple correlation coefficient (R) and the coefficient of determination adjusted for the number of variables entered (adjusted R²), which allows the predictive performance of models containing different numbers of variables to be compared. Significance was set at p <0.05.
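The quantities reported by the regression models can be illustrated with a short NumPy sketch: Pearson's r, and an ordinary least-squares fit with an intercept that returns the multiple correlation coefficient R, R² and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n subjects and k covariates. This is a generic illustration of the formulas, not MedCalc's implementation, and the function names are hypothetical.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def ols_fit_summary(X, y):
    """Fit y = b0 + X b by least squares; return (R, R2, adjusted R2).

    X has shape (n, k): n subjects, k covariates (no intercept column).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    if n <= k + 1:
        raise ValueError("need more subjects than covariates + 1")
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return float(np.sqrt(max(r2, 0.0))), r2, adj_r2


# Example with one covariate: r = 0.8, so R2 = 0.64 and adjusted R2 = 0.46.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(pearson_r(x, y), 3))                      # 0.8
print([round(v, 3) for v in ols_fit_summary(x, y)])  # [0.8, 0.64, 0.46]
```

The adjustment penalizes each added covariate, so with small samples a model can show a high R² but a noticeably lower adjusted R²; with a single covariate, R equals the absolute Pearson r.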

Results
The baseline characteristics of the 126 SSc patients are summarized in Table 1 and Table 2. A close correlation between CaM scores and CoVR total scores was observed (Pearson r = 0.718; p <0.0001) (Fig 2A). The CaM scores showed highly significant negative correlations with FVC (Pearson r = -0.556; p <0.0001) (Fig 2B) and DLco (Pearson r = -0.670; p <0.0001) (Fig 2C). The Borg score and VAS for breathing were highly correlated with each other (Pearson r = 0.627), and both were significantly correlated (p <0.0001) with the HAQ-DI (Pearson r = 0.546 and 0.627, respectively). The HAQ-DI was also significantly correlated with the CaM score (Pearson r = 0.597; p <0.0001) (Fig 2D), the CoVR total score (Pearson r = 0.388; p <0.0001) and the mRSS (Pearson r = 0.468; p <0.0001). FVC and DLco were only weakly correlated with the HAQ-DI and showed no correlation with the mRSS. In multivariate regression analysis, the combination of DLco, FVC, Borg score and HAQ-DI explained 83.4% of the variance in the percentage of pulmonary fibrosis on CaM, whereas the CoVR total score was independently predicted by the same variables with a lower coefficient of determination (R² = 0.779, i.e. 77.9% of variance) (Table 3). Age, sex, disease duration, anti-topoisomerase I antibodies and mRSS were not significantly associated with the percentage of pulmonary fibrosis on CaM or with CoVR scores.

Discussion
Pulmonary involvement is a serious complication of SSc [32,33]. CT features of fibrosis are present in 55% to 65% of all patients with SSc and in up to 96% of those with abnormal PFT results [34][35][36]. As a result, HRCT has become an important part of the routine evaluation of ILD and, in conjunction with PFTs, plays a critical role in treatment decisions and in the prediction of outcomes [12,37-40]. Semi-quantitative scoring methods, which grade each abnormality together with the number of bronchopulmonary segments involved [10][11][12], have been compared with quantitative computer-based scores. Several computer tools that automatically segment the lung on HRCT images have been developed [15,40,41]. They provide image display (e.g., multiplanar reformations and surface shading for three-dimensional and volume rendering), anatomic image quantification (e.g., area and volume of airways and lungs) and regional characterization of lung tissue (analysing attenuation, changes in attenuation, and texture patterns in the imaged lung) [15,42-44]. Computer-based models correlate well with visual scoring techniques for the detection of fibrosis and the assessment of disease extent, without the intrareader variation encountered with visual scoring [15][16][17]. Recently, we showed that the CaM may assist rheumatologists in analysing lung HRCT data and provides an objective method for supplementing subjective visual grading of ILD extent, achieving precise, reader-independent quantification [18].
On HRCT, 116 patients (94.3%) displayed findings of ILD. In agreement with other authors [45], the percentage of fibrosis measured by CaM was not significantly higher in patients with dcSSc. Patiwetwitoon et al. compared HRCT findings between patients with dcSSc and lcSSc and found that HRCT scores were comparable in both subtypes [45]. In the Scleroderma Lung Study, lcSSc and dcSSc patients had indistinguishable baseline pulmonary function, but lcSSc patients had more extensive pulmonary fibrosis; furthermore, the rate of progression of ILD was similar in lcSSc and dcSSc patients [46]. In univariate analysis, we observed a strong correlation between the percentage of pulmonary fibrosis on CaM and CoVR scores (p <0.0001). Both CaM and CoVR scores were statistically related to the severity of functional lung impairment. The percentage of pulmonary fibrosis on CaM showed significant negative correlations with both FVC and DLco, and the correlation was stronger for DLco. In multivariate analysis, the CaM and CoVR scores were predicted by DLco, FVC, Borg score and HAQ-DI, whereas age, sex, disease duration, anti-topoisomerase I antibodies and mRSS were not significantly associated with either score. These observations suggest that parenchymal lung disease in SSc-ILD may have a substantial impact on gas transfer, consistent with prior studies [7,36,47]. In this respect, Tashkin et al. [37], using baseline data from two large randomised interventional studies (Scleroderma Lung Study I and II, or SLS I and II), found that DLco was the single variable that best correlated with the extent of both lung fibrosis and total ILD (more so than FVC), whether assessed in the zone of maximal involvement or in the whole lung. Similarly, DLco correlated better than FVC% with the radiographic extent of lung fibrosis and total ILD in bivariate analyses.
DLco and its changes are influenced by the integrity of the alveolar-capillary interface as well as by ventilation (including alveolar volume) and perfusion (including hemoglobin). Thus, in SSc, where reduced lung volumes generally indicate the presence of ILD, impaired DLco is not specific and can reflect, to varying degrees, ILD, pulmonary hypertension and other disease manifestations, including anemia. In addition, DLco shows marked measurement variability both within and between testing sessions.
Functional disability is considerable in SSc [1,3,26,27] and may be influenced by respiratory impairment and other factors, such as the extent of skin involvement, tendon and joint contractures, and cardiac and peripheral vascular damage. The median HAQ-DI in our cohort was 0.84, in keeping with the range of 0.83-1.2 reported for this index in SSc patients [48,49]. Patients with lcSSc had significantly less disability than those with dcSSc. Moreover, in apparent contrast to recent findings [47], we found that the HAQ-DI correlated with PFT results, subjective patient-oriented measures of dyspnea and HRCT scores, suggesting that the HAQ-DI should be included as a surrogate outcome measure of HRCT-defined severity of fibrosis in this population. In this respect, Volkmann et al., using data from SLS-I [50], found that a composite measure comprising FVC, a computer-based quantitative lung fibrosis score in the zone of maximal fibrosis on HRCT, the Transition Dyspnea Index and the HAQ-DI may serve as a more comprehensive measure of cyclophosphamide treatment effect in SSc-ILD than a single-outcome approach (e.g., FVC % predicted).
Our study has several limitations. First, and probably most important, this is a cross-sectional study; therefore, it is unknown whether the observed relationships persist over time. Second, we did not include a respiratory disease-specific questionnaire, such as the St. George's Respiratory Questionnaire, which would have further strengthened our study [51]. Third, because echocardiography and right heart catheterization were not routinely performed, pulmonary hypertension could not be strictly excluded in patients with SSc-ILD, since its assessment relied on clinical findings and history alone. Finally, selecting the regression models on a data set from a single centre may overestimate their performance when they are applied to other populations.
In conclusion, our results indicate that, although CaM scores correlated closely with CoVR total scores, CaM results showed stronger correlations than CoVR scores with DLco (more so than with FVC) and with patient-centred measures of perceived dyspnea and functional disability. A computer-based quantification system is computationally efficient and provides the best overall estimates of HRCT-measured lung disease. Visual scoring techniques, however, offer undisputed advantages in identifying the type and extent of the lung anatomical-functional damage caused by SSc and in distinguishing other causes of increased lung density, such as infection or neoplasm. Visual and computer-based quantitative scoring systems are complementary rather than competitive [35]. In combination with physiologic parameters and patient-centred measures of perceived dyspnea and functional disability, a computer-based quantification system could provide a means of accurately assessing and monitoring disease progression or response to therapy [50]. In a future study, we will assess the sensitivity to change of the whole-lung fibrosis score over time, with and without therapeutic intervention, as a necessary validation step.

Disclosures
This work was not supported by any research grant. All authors contributed substantially to this study, and there are no conflicts of interest to declare.