Development of a Simple Reliable Radiographic Scoring System to Aid the Diagnosis of Pulmonary Tuberculosis

Rationale: Chest radiography is sometimes the only method available for investigating patients with possible pulmonary tuberculosis (PTB) with negative sputum smears. However, interpretation of chest radiographs in this context lacks specificity for PTB, is subjective and is neither standardized nor reproducible. Efforts to improve the interpretation of chest radiography are warranted.
Objectives: To develop a scoring system to aid the diagnosis of PTB, using features recorded with the Chest Radiograph Reading and Recording System (CRRS).
Methods: Chest radiographs of outpatients with possible PTB, recruited over 3 years at clinics in South Africa, were read by two independent readers using the CRRS method. Multivariate analysis was used to identify features significantly associated with culture-positive PTB. These were weighted and used to generate a score.
Results: 473 patients were included in the analysis. Large upper lobe opacities, cavities, unilateral pleural effusion and adenopathy were significantly associated with PTB, had high inter-reader reliability, and received 2, 2, 1 and 2 points, respectively, in the final score. Using a cut-off of 2, scores below this threshold had a high negative predictive value (91.5%, 95% CI 87.1, 94.7), but low positive predictive value (49.4%, 95% CI 42.9, 55.9). Among the 382 TB suspects with negative sputum smears, 229 patients had scores <2; the score correctly ruled out active PTB in 214 of these patients (NPV 93.4%; 95% CI 89.4, 96.3). The score had a suboptimal negative predictive value in HIV-infected patients (NPV 86.4%, 95% CI 75, 94).
Conclusions: The proposed scoring system is simple, and reliably ruled out active PTB in smear-negative HIV-uninfected patients, thus potentially reducing the need for further tests in high burden settings. Validation studies are now required.


Introduction
Although tuberculosis (TB) is curable, it remains a major problem globally [1]. Central to tuberculosis control programmes is the identification of sputum-positive patients. However, smear microscopy has a sensitivity of less than 50% among patients with active PTB who are co-infected with HIV [2], and in the HIV era, especially in some countries where more than 70% of patients are HIV-positive, additional methods are required for identifying patients requiring treatment [3]. While much work has been done to optimize sputum microscopy using strategies such as light-emitting diodes [2] and same-day diagnosis [4], in most clinical settings the chest radiograph remains a central component of the diagnostic work-up, and has not been displaced by recently developed point of care tests, which are limited by both cost and availability [5,6].
Chest radiography also has significant limitations, particularly when used in the field. In order to be helpful, chest radiographs require both observation and interpretation. Both are subjective, and subject to wide intra- and inter-observer variation [7]. Observation may be improved by a system such as the Chest Radiograph Reading and Recording System (CRRS) proposed by White et al [8][9][10], which ensures systematic recording of features. Interpretation requires knowledge of the appearances and natural history of pulmonary tuberculosis in both HIV-infected and HIV-negative persons, and agreement between readers often remains poor [7]. Interpretation and utility can potentially be improved by a scoring system that makes the diagnostic decision less arbitrary by prescribing what constitutes a finding, and that assigns weights to the features observed on the chest radiograph according to how strongly each feature is associated with PTB.
Most studies assessing the diagnostic utility of chest radiography have compared the probability of PTB between readers, usually qualified radiologists. In our review of the literature, it was evident that certain features noted on a chest radiograph, such as apical infiltrates and cavities, are known to be highly suggestive of active PTB [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. However, we did not find published reports of a simple scoring system that combines the systematic assessment of radiographs for features shown to be relevant to the diagnosis of active pulmonary tuberculosis and that may be suitable for use in the clinic by trained health personnel.
The CRRS was developed to standardize the reading of chest radiographs in epidemiological studies of PTB and lung disease [10], with improved inter- and intra-reader reliability [9,10]. Although validated for use in epidemiologic studies, its clinical application for the diagnosis of active pulmonary tuberculosis has not been studied. Our study set out to develop a weighted radiographic scoring system to assist interpretation of chest radiographic changes and aid the diagnosis of active pulmonary tuberculosis in the clinical setting.

Study Subjects and Data Collection
The study cohort comprised patients forming part of a large prospective study (TB-NEAT) being conducted at the University of Cape Town (Cape Town, South Africa) to evaluate the performance of new diagnostic tests for tuberculosis [6,25,26]. The prevalence of TB in South Africa is estimated to be 795 per 100,000 population and the incidence, 981 per 100,000 population, while the prevalence of HIV infection is 178 per 1000 among adults between the ages of 15 and 49 [27,28].
Subjects qualified for inclusion in the study if they were ≥18 years and considered by the clinic staff to be patients with possible PTB. To qualify as a patient with possible PTB, an individual had to present to the hospital with at least two of the following symptoms if HIV negative, and one if HIV infected: cough for ≥2 weeks, haemoptysis, fatigue, night sweats, fever for ≥2 weeks, weight loss, loss of appetite, or being bedridden. After giving written informed consent, all patients underwent diagnostic testing, which included two sputum samples evaluated by concentrated smear microscopy, two sputum cultures using the MGIT 960 liquid culture system (BD Diagnostic Systems, Sparks, MD, USA), chest radiography, standardized interferon-gamma release assays, HIV testing, and CD4 T cell count for those who were HIV-infected. Epidemiological data were captured in a questionnaire, which was administered by trained interviewers to all patients.

CRRS Training
The CRRS training course is held bi-annually at the University of Cape Town Lung Institute (see http://www.lunginstitute.co.za/content/talks.html). The course involves a two and a half-day programme of interactive training using standard chest radiographs. On the first day of the course attendees are instructed on chest anatomy and disease presentation, and a standardized approach to identifying radiological abnormalities is introduced. On the second day attendees read archived radiographs using the structured CRRS form (see online supplement for sample form) and consolidate their understanding of the detail required for standardized reporting. On the third day an examination using 24 standardized radiographs is undertaken and trainees are awarded either "A" or "B" grade accreditation based on their interpretation of an examination set of radiographs.

Reading of the Chest Radiographs
Chest radiographs were read by two independent readers (BA and GC, specialist physicians undergoing training in pulmonology in the Division of Pulmonology) who had received standard training in the CRRS method, and were blinded to clinical information. Their findings were recorded on a computerized form. The CRRS involves the use of a systematic checklist (Figure S1, online appendix) that details abnormal features visualized on a chest radiograph. These abnormalities are broadly classified into parenchymal, pleural, central and other abnormalities, each of which is further sub-categorized (for example, parenchymal abnormalities are sub-categorized into large opacities, small opacities and cavities). At the conclusion of the examination, the reader is required to provide a subjective assessment of whether the abnormalities recorded are consistent with active TB. In this study, disagreements between readers were resolved through a consensus read by a third senior reader (RVZS, a faculty pulmonologist trained in the use of CRRS). The scoring system was developed using the final single consensus read. Only radiographs performed within 3 months after each subject was enrolled were evaluated.
The outcome of interest was the presence of active PTB, defined as the growth of M. tuberculosis on at least one sputum culture. Patients with two negative cultures were classified as having a final culture-negative result. Similarly, a patient with two negative sputum smears was classified as having a smear-negative status.
A univariate analysis was performed to identify significant associations (with a liberal threshold of P<0.2 for statistical significance) between the pre-defined radiographic and clinical features and the outcome of interest. Chi-square tests were used for categorical radiographic variables and t-tests were used for continuous variables.
Variables found to be significant in the univariate analysis, or which were identified a priori by the literature review, were entered into a multivariate logistic regression model. Factors found to be independent predictors of the outcome (P<0.05) were selected for the final model, and a stepwise backward elimination process was employed, using the likelihood ratio test [29], to eliminate variables that did not significantly contribute to the model. We adjusted for HIV status in the final model, as HIV is known to alter the radiographic presentation of active PTB.
We assigned scores to each radiographic feature found to be an independent predictor of outcome in the final model, weighted according to the beta-coefficients from the final multivariate logistic model. Weights were rounded up to the nearest integer.

Data Analysis
The various major criteria in the CRRS were analyzed for inter-reader reliability between the two initial readers and a kappa statistic for inter-reader agreement was calculated and graded [30,31]. Based on the weights assigned to the four radiographic features found to be significantly associated with active PTB, we calculated a total score for each patient's radiograph and analyzed the performance characteristics of the score at various cut-points for the diagnosis of culture-confirmed active PTB. We also analyzed the performance of the score in the subset of patients with smear-negative PTB, and among patients who were HIV-infected. Data were analyzed using STATA version 11.0 (Stata Corp, College Station, Texas, USA).
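As an illustration of the inter-reader agreement calculation described above, the following minimal sketch computes Cohen's kappa for one binary radiographic feature recorded by two readers. The function names and the example reads are hypothetical, and the grading bands are the commonly used Landis and Koch categories, which we assume (but the text does not state) correspond to the grading cited in [30,31].

```python
def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers scoring the same radiographs (0/1 per film)."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set of films")
    n = len(reader_a)
    # Observed agreement: proportion of films with identical calls.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal rate of positive calls.
    pa = sum(reader_a) / n
    pb = sum(reader_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # both readers gave one identical constant call
    return (observed - expected) / (1 - expected)


def grade_kappa(kappa):
    """Assumed Landis and Koch style bands for describing agreement."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical reads of ten films for one feature (1 = feature present).
reads_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reads_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
k = cohen_kappa(reads_a, reads_b)
print(round(k, 3), grade_kappa(k))
```

Under these assumed bands, a kappa of 0.56 would be graded moderate and 0.77 substantial.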

Demographic Characteristics of Subjects
Of 645 patients recruited into the parent study, 473 patients were included in the final analysis. As outlined in Figure 1, the major reasons for exclusion were inability to produce sputum, contaminated sputum culture, missing chest radiographs, and a chest radiograph read by only one reader. There were no significant differences in the demographic features of the patients who were included and those who were not (

Reliability of the CRRS
The inter-reader reliability of the CRRS for various a priori major radiographic features of active PTB is summarized in Table 2. The kappa statistic ranged from moderate (0.56) for small opacities to substantial (0.77) for pleural effusions. The kappa statistic for the overall judgment on whether the reader considered the features of the chest radiograph to be consistent with active PTB was 0.52 (95% CI 0.42, 0.62).

Development of the Score
Results of the univariate analysis of the various chosen radiographic and clinical criteria are summarized in Table 3. All the variables initially selected for inclusion in the multivariable model were found significant, and were therefore retained in the final analysis. The final model was adjusted for age and HIV status, but these were not included in the score, as the aim was to develop a score based only on radiographic features. Based on the beta-coefficients of the variables in the multivariable logistic regression, scores were assigned to the individual radiographic features. The results of the multivariate analysis and scores assigned to the variables in the final model are summarized in Table 3.
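The way the final score is applied to a single radiograph can be sketched as follows. The weights (2, 2, 1 and 2 points) and the cut-off of 2 are those reported for the final score; the dictionary keys and function names are hypothetical labels introduced only for illustration.

```python
# Points per feature as reported for the final score; key names are illustrative.
WEIGHTS = {
    "large_upper_lobe_opacity": 2,
    "cavity": 2,
    "unilateral_pleural_effusion": 1,
    "adenopathy": 2,
}
CUT_OFF = 2  # scores of 2 or more are treated as a positive test


def radiographic_score(findings):
    """Sum the points for every feature recorded as present on the radiograph."""
    unknown = set(findings) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized features: {sorted(unknown)}")
    return sum(WEIGHTS[name] for name, present in findings.items() if present)


def below_cut_off(findings):
    """True when the score falls below the cut-off (the rule-out side of the test)."""
    return radiographic_score(findings) < CUT_OFF


# Example: a film showing only a unilateral pleural effusion scores 1 point.
film = {"unilateral_pleural_effusion": True, "cavity": False}
print(radiographic_score(film), below_cut_off(film))
```

With these weights, an isolated unilateral pleural effusion is the only abnormal pattern that scores below the cut-off; any of the other three features alone reaches it.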

Performance of the Score
The score thus developed was tested at different cut-offs, the results of which are shown in Table 4. At a cut-off of ≥2, the score had a high negative predictive value (91.5%, 95% CI 87.1, 94.7), and misclassified 20 of 138 patients with active PTB. The score improved the specificity of the test at this cut-off (63.9%, 95% CI 58.5, 69) as compared to the specificity of the subjective assessment of the probability of PTB by the readers (27.5%, 95% CI 22.6, 32.8), with a loss in sensitivity that was not statistically significant (85.5%, 95% CI 78.5, 90.9 vs 93.4%, 95% CI 87.9, 97) (Table 4). The positive likelihood ratio (LR+) for the test at this cut-off was 2.37 and the negative likelihood ratio (LR-) was 0.23. The gain in specificity at higher cut-offs for the score was accompanied by appreciable losses in sensitivity.
In sputum smear-negative patients, at the same cut-off, the test had a good rule-out value (NPV 93.4%, 95% CI 89.4, 96.3). Of the 382 smear-negative patients, 214 were correctly classified by the score as not having active disease, and 15 smear-negative patients with the disease were incorrectly classified by the score. The performance of the score in smear-negative PTB patients is shown in Table 5. The score had a better negative predictive value in HIV-uninfected individuals (92.1%, 95% CI 86.3, 96) than in HIV-infected individuals (86.4%, 95% CI 75, 94) (Table 6), although the difference was not statistically significant (p = 0.21).
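The performance measures quoted above follow from a single two-by-two table, as the sketch below shows. The counts of 138 patients with active PTB, 335 without, and 20 false negatives are taken from the text; the split of the 335 patients without PTB into 214 true negatives and 121 false positives is our reconstruction from the reported specificity of 63.9% and is an assumption, not a figure stated for the full cohort. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard accuracy measures for a binary test against a reference standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Cut-off of 2 or more: 138 culture-positive patients, of whom 20 scored below 2.
# tn=214 and fp=121 are inferred from the reported 63.9% specificity (assumption).
m = diagnostic_metrics(tp=118, fn=20, tn=214, fp=121)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under these reconstructed counts the sketch reproduces the reported sensitivity (85.5%), specificity (63.9%), NPV (91.5%), PPV (49.4%), LR+ (2.37) and LR- (0.23).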
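The negative predictive value in smear-negative patients and an accompanying interval can be sketched from the counts given (214 true negatives among 229 patients scoring below 2). The text does not state which interval method was used; the exact binomial (Clopper-Pearson) interval below is an assumption, so the bounds it produces may differ slightly from those reported. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for a proportion k/n, found by bisection."""
    if k == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            # P(X >= k) increases with p; find where it equals alpha/2.
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    if k == n:
        upper = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            # P(X <= k) decreases with p; find where it equals alpha/2.
            if binom_cdf(k, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


# Smear-negative patients scoring below 2: 214 without PTB, 15 with PTB.
true_neg, false_neg = 214, 15
npv = true_neg / (true_neg + false_neg)
ci_low, ci_high = exact_binomial_ci(true_neg, true_neg + false_neg)
print(f"NPV {npv:.1%} (95% CI {ci_low:.1%}, {ci_high:.1%})")
```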

Discussion
We have developed a scoring system for chest radiographs for use in patients being investigated for active PTB, which performs satisfactorily as a rule-out test in both smear-positive and smear-negative patients. Although the score was developed using radiographic features reported by trained clinicians employing a validated method for reporting on chest radiographs (the CRRS system), the score is based on 4 easily recognized features: the presence of upper lobe opacities, cavities, a unilateral pleural effusion, and hilar or mediastinal lymphadenopathy. These features have been consistently reported in the literature to be associated with active PTB [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. While not useful for confirming the presence of active PTB, the scoring system may at least be suitable for ruling out active disease and may reduce the use of more expensive diagnostic tests. Since it is intended to supplement rather than replace sputum smear examination, it is reassuring that it performs satisfactorily in smear-negative patients and in patients infected with HIV, in whom active PTB is often associated with negative smears. This potential application needs to be examined in a prospective study performed in the field setting.
The proposed score has limitations. Firstly, a rule-out test is less useful to clinicians than one that confirms active PTB. Its primary purpose is to reduce the need for further confirmatory tests and/or referral for further examination, which can be useful in resource poor settings, particularly where access to clinics and investigations is limited. In such circumstances, saving patients the need to visit distant facilities may be valuable. The current score differs from other diagnostic scoring systems developed to assist in the diagnosis of active PTB, in that it does not include clinical data. Most existing methods include the presence or absence of symptoms and clinical features of PTB, with a radiographic score [11][12][13][14][15][16][17][18][19][20][21][22][23][24]. A recent systematic review of such scoring systems showed that eleven of the thirteen studies found were used for making decisions concerning hospital respiratory isolation, and only one study assessed a scoring system for out-patients. All of the scores were dependent on clinical features in patients, and were found to be very sensitive, but poorly specific [32]. Although their performance in diagnosing active disease is useful, a limitation is that the clinicians collecting the clinical data might not feel confident in evaluating the chest radiograph. With our method, a health worker trained in the CRRS methodology may be better able to provide a report to the attending clinician that will influence the clinical decision in sputum-negative patients, on whether to further evaluate a patient with more diagnostic tests, or to follow up the patient clinically. It is recognized however that in our study patients were selected on the basis of one or more symptoms or features consistent with the diagnosis of active PTB. Thus the pretest probability of a positive diagnosis was increased. However, this reflects the setting in which the test might be used in clinical practice. A further limitation of the method is that its negative predictive value in HIV-infected patients (86.4%, 95% CI 75, 94) may not be sufficiently high in this highly vulnerable group, in whom delayed diagnosis is associated with rapid progression of disease and death [33]. This imprecision of the negative predictive value may be a reflection of the small sample size. It is also possible that the insufficiently high NPV is a reflection of the reported fact that a significant proportion (over 10%) of HIV-infected individuals with clinical symptoms consistent with PTB can have normal chest radiographs [34]. Further studies will have to be conducted to assess the utility of the scoring system in this category of patients.
Table 3. Analysis of radiographic and clinical features in the univariate and multivariable logistic regression model, and weights assigned in the final radiographic score in 473 patients (n = 138 with PTB and 335 without active PTB).
The finding of high inter-reader reliability for the major features among readers trained to report radiographs using the CRRS is consistent with earlier reports [8][9][10]. The high reliability is useful, given that the lack of standardization and reproducibility in the reading of chest radiographs for PTB has always been an impediment to the accuracy of the test. Our study supports the role of the CRRS system, which was designed as a tool for epidemiological surveys, as an aid to clinical decision-making. The work reported here was based on reports generated by trained readers with considerable clinical experience in chest radiology. They had received the two and a half-day intensive training required for CRRS accreditation. Whether this simplified clinical score will perform as well when radiographs are read by observers who have not completed the course needs to be tested in prospective studies. Even a simplified score will require some degree of training and standardization before it can be widely implemented.
Our study has several limitations. Due to logistical difficulties, 8 patients (1.7%) had chest radiographs that had discordant readings that were not resolved by a third reader, and we had to exclude 5 of these patients who did not have radiographs reported by the third reader for analysis. Secondly, several patients had to be excluded from the analysis, primarily due to the absence of chest radiographs, and this might have introduced bias. However, the comparisons of the demographic features of the patients included and excluded suggest that this was unlikely. Thirdly, we tested the score in the same population from which the score was derived, and this could lead to an overestimation of its performance. However, the consistency of the features used in the score with what is described in the literature suggests that these features are likely to be reproducible. Despite the high negative predictive value of the score, misclassification (false negatives) occurred in 20 patients with active disease. This highlights the principle that all diagnostic tests including the chest radiograph must be interpreted within the clinical context of the case, and appropriate advice given and follow-up arranged for patients who have progressive or ongoing symptoms. Finally, we acknowledge the fact that performance characteristics of diagnostic tests are influenced by prevalence of disease, and the external validity of the scoring system can only be established after using it in settings with different burdens of TB and HIV, and the consequent varying radiographic presentations of the disease.
In conclusion, we have developed a scoring system that attempts to optimize the observation and interpretation of chest radiographs for the diagnosis of PTB. The system uses the CRRS, a tool with high inter-reader reproducibility, for observing and documenting the abnormalities visualized on the chest radiograph. For interpretation of these abnormalities, we have developed a simplified score based on assigned weights to four easily recognized features on the chest radiograph. The system thus developed has a high negative predictive value, making it a useful tool to rule out active PTB in persons with negative sputum smears, especially in patients who are not infected with HIV. Further validation studies are now necessary to confirm our findings.

Supporting Information