Interobserver Agreement of Thyroid Imaging Reporting and Data System (TIRADS) and Strain Elastography for the Assessment of Thyroid Nodules

Background: Thyroid Imaging Reporting and Data System (TIRADS) was developed to improve patient management and cost-effectiveness by avoiding unnecessary fine needle aspiration biopsy (FNAB) in patients with thyroid nodules. However, its clinical use is still very limited. Strain elastography (SE) enables the determination of tissue elasticity and has shown promising results for the differentiation of thyroid nodules.
Methods: The aim of the present study was to evaluate the interobserver agreement (IA) of TIRADS developed by Horvath et al. and of SE. Three blinded observers independently scored stored TIRADS and SE images of 114 thyroid nodules (114 patients). Cytology and/or histology was available for all benign nodules (n = 99) and histology for all malignant nodules (n = 15).
Results: The IA between the 3 observers was only fair for TIRADS categories 2–5 (Cohen's kappa = 0.27, p = 0.000001) and for TIRADS categories 2/3 versus 4/5 (ck = 0.25, p = 0.0020). The IA was substantial for SE scores 1–4 (ck = 0.66, p<0.000001) and very good for SE scores 1/2 versus 3/4 (ck = 0.81, p<0.000001). 92–100% of patients with TIRADS-2 had benign lesions, while 28–42% with TIRADS-5 had malignant cytology/histology. The negative predictive value (NPV) for the diagnosis of malignancy was 92–100% for TIRADS using TIRADS categories 4&5 and 96–98% for SE using scores ES-3&4. However, only 11–42% of nodules were in TIRADS categories 2&3, as compared to 58–60% with ES-1&2.
Conclusions: IA of TIRADS developed by Horvath et al. is only fair. TIRADS and SE have high NPV for excluding malignancy in the diagnostic work-up of thyroid nodules.


Introduction
In regions with inadequate iodine supply, thyroid nodules are a common finding and are reported in one third of unselected adults [1]. Ultrasound is an accurate method for the detection of thyroid nodules, but it has a low accuracy for the differentiation between benign and malignant thyroid nodules [2]. Therefore, fine-needle aspiration biopsy (FNAB) is presently recommended as an additional diagnostic method in the evaluation of thyroid nodules with a size of ≥10 mm in patients with normal thyroid-stimulating hormone. In addition, FNAB is advised in nodules smaller than 10 mm with suspicious history or suspicious ultrasound findings [3][4][5][6]. Nevertheless, FNAB is known to have a high specificity (60-98%) but varying sensitivity (54-90%) for the diagnosis of malignant thyroid nodules [7][8][9][10]. Therefore, a relevant number of patients with the final diagnosis of benign thyroid nodules undergo thyroid surgery more for diagnostic than for therapeutic purposes.
Thyroid Imaging Reporting and Data System (TIRADS) has been developed based on the concepts of Breast Imaging Reporting and Data System (BIRADS), which established different categories according to the percentage of malignancy. BIRADS has become a worldwide accepted method that guides clinical management of breast lesions. The aim of TIRADS was to improve patient management and cost-effectiveness by avoiding unnecessary FNAB in patients with thyroid nodules. In the original study, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of TIRADS were 88%, 49%, 49%, and 88%, respectively. However, since its publication by Horvath et al. in JCEM [11], its clinical use has remained very limited and its practicability in clinical practice is questioned.
A classical criterion of malignancy is a hard or firm consistency upon palpation or ultrasound-probe pressure [4,12]. Previously, this attribute was subjective and dependent on the experience of the examiner. However, with the introduction of strain elastography (SE), a reproducible qualitative and semi-quantitative assessment of tissue consistency became available. A meta-analysis of SE reported a mean sensitivity and specificity for the diagnosis of malignant thyroid nodules of 92% and 90%, respectively [13].
Nevertheless, SE has recently been challenged and criticized for its operator dependency [14].
The aim of the present study was to evaluate the interobserver agreement (IA) of TIRADS and SE for the assessment of thyroid nodules.

Materials and Methods
Ethics statement: Informed written consent was obtained from all patients and the study was performed in accordance with the ethical guidelines of the Helsinki Declaration and approved by the ethics committee of the medical faculty of the university of Frankfurt.
During a previous prospective comparative study of strain elastography and Acoustic Radiation Force Impulse imaging, thyroid ultrasound and elastography images were stored [15]. Three blinded observers with at least 5 years' experience in thyroid ultrasound independently scored the images of 114 nodules from 114 patients with regard to TIRADS classification and strain elastography. Inclusion criteria were the presence of a thyroid nodule ≥5 mm, normal values of thyroid-stimulating hormone, and FNAB of this nodule performed within the last 6 months or FNAB and/or surgery planned at the time of ultrasound examination and finally performed within the study period. Exclusion criteria were cystic lesions of completely liquid nature, no cytology by FNAB or histology by surgery of the thyroid nodule within the study period, indeterminate cytology by FNAB without repeated FNAB, and suspicious or malignant cytology by FNAB without thyroid operation within the study period. After excluding patients without an adequate reference method (nondiagnostic aspirate on FNAB without repeated FNAB/surgery during the study period; suspicious or malignant aspirate on FNAB without surgery during the study period), stored B-mode and Doppler ultrasound images as well as SE images were available from 114 of the overall 138 patients. The Bethesda system was used to report thyroid cytopathology.

Ultrasound Methods
Thyroid Imaging Reporting and Data System (TIRADS). Thyroid ultrasound images were generated using a 9-MHz transducer (Hitachi EUB-900, Hitachi, Tokyo, Japan). The three observers independently scored the images according to the TIRADS classification [11]. The observers were blinded to the results of cytology and histology. The observers had at least 7 years' experience in thyroid ultrasound and had used the TIRADS classification in clinical practice for at least 12 months prior to study inclusion.
Strain Tissue Elastography (SE). SE images were generated using a 9-MHz transducer (Hitachi Strain Tissue Elastography, Hitachi EUB-900, Hitachi Medical Corporation, Tokyo, Japan). The probe was placed on the neck and a light pressure of 3-4 on a scale of 0-6 arbitrary units was applied for measurement. The region of interest (ROI) for the elastography examination was selected by the operator to include the nodule and surrounding normal thyroid tissue. In cases of cystic lesions, the solid component of the nodule was examined to exclude artifacts known to be caused by the cyst.
The three observers independently scored the images according to the following elasticity classification [16,17]:
- elasticity score (ES)-1: the nodule is displayed homogeneously in green (soft)
- ES-2: the nodule is displayed predominantly in green with few blue areas/spots
- ES-3: the nodule is displayed predominantly in blue with few green areas/spots
- ES-4: the nodule is displayed completely in blue (hard).
The observers were blinded to the results of cytology and histology. Examples are shown in Figure 1 and 2.

Statistical analysis
Statistical analysis was performed using BiAS for Windows (version 9.10, epsilon 2011, Frankfurt, Germany) and SigmaPlot and SigmaStat for Windows (version 11.0, Systat Software, Inc., Germany). Clinical and laboratory characteristics of patients were expressed as mean ± SD, median and range. Correlations were assessed by Spearman's correlation coefficient. The interobserver agreement between the 3 observers was calculated using Cohen's kappa coefficient. A kappa value of 0 corresponds to no agreement and a kappa value of 1.0 to complete agreement. Kappa values of 0-0.20 indicate slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement and 0.81-1.00 excellent agreement. Sensitivity, specificity, positive and negative predictive values, and positive likelihood ratio (LR) were calculated using ES scores 1 & 2 and TIRADS 2 & 3 for benign classification, and ES scores 3 & 4 and TIRADS 4 & 5 for malignant classification of thyroid nodules. All tests were two-sided and used a significance level of α = 5%. The diagnostic performance of TIRADS and SE was also assessed by receiver operating characteristic (ROC) curves. The ROC curve represents sensitivity versus 1-specificity for all possible cut-off values for the prediction of malignancy.
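The agreement statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the BiAS implementation used in the study: the function names, the example ratings, and the choice to summarize three observers by the mean of the three pairwise Cohen's kappa values are all assumptions (the text does not state how kappa was extended to three observers; Fleiss' kappa would be another common choice). The labels follow the agreement bands used in this paper, with 0.21-0.40 taken as fair.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(raters):
    """Assumed summary for more than two raters: mean of all pairwise kappas."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def agreement_label(kappa):
    """Verbal label for a kappa value, using the bands given in the text."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


# Hypothetical elasticity scores (ES 1-4) for eight nodules, for illustration only.
observer_1 = [1, 1, 2, 2, 3, 3, 4, 4]
observer_2 = [1, 1, 2, 3, 3, 3, 4, 4]
observer_3 = [1, 1, 2, 2, 3, 3, 4, 4]

kappa_12 = cohen_kappa(observer_1, observer_2)
kappa_all = mean_pairwise_kappa([observer_1, observer_2, observer_3])
print(f"kappa(1,2) = {kappa_12:.2f} ({agreement_label(kappa_12)})")
print(f"mean pairwise kappa = {kappa_all:.2f} ({agreement_label(kappa_all)})")
```

Applied to the values reported in this study, `agreement_label(0.27)` gives "fair" for TIRADS and `agreement_label(0.66)` gives "substantial" for SE.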

Results
Stored B-mode and Doppler ultrasound images as well as SE images were available for analysis from 114 nodules of 114 patients seen between Aug. 2010 and Mar. 2012. The final diagnosis was thyroid cancer in 15 nodules/patients and benign thyroid nodules in 99 nodules/patients. Patient characteristics are shown in Table 1. All patients presented with normal thyroid hormone values.

Comparison of TIRADS and SE
Correlations between TIRADS and SE for the three observers were 0.338 (p = 0.00025), 0.301 (p = 0.0012), and 0.257 (p = 0.0059), respectively. The interobserver agreement (Cohen's kappa coefficient) between the 3 observers was higher for SE with 0.66 than for TIRADS with 0.27.

Discussion
The TIRADS classification used in the present study was primarily defined and evaluated in a study with 1097 nodules [11]. It defined 6 categories, of which TIRADS-1 is a normal thyroid gland without nodules, and TIRADS-6 is diagnosed by malignancy on FNAB. TIRADS 2-5 are based on different B-mode and Duplex ultrasound criteria. In this study with 1097 nodules [11], 100% of patients with thyroid nodules scored as TIRADS-2 had benign nodules, while 87% of nodules scored TIRADS-5 were malignant. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for the diagnosis of malignant thyroid nodules were 88%, 49%, 49%, and 88%, respectively. The authors concluded that TIRADS can improve patient management and cost-effectiveness by avoiding unnecessary fine needle aspiration biopsy in patients with thyroid nodules. However, since its publication in JCEM [11], its clinical use has remained very limited and its practicability in clinical practice is questioned. The definition of TIRADS categories is complex and clinical categorization is challenging. Therefore, in the present study we evaluated the interobserver agreement of TIRADS. Three blinded observers experienced in thyroid ultrasound scored the stored images of 114 nodules. The IA between the 3 observers was only fair for TIRADS categories 2-5 (Cohen's kappa = 0.27, p = 0.000001) and for TIRADS categories 2/3 versus 4/5 (ck = 0.25, p = 0.0020). This demonstrates the difficulties of categorizing according to TIRADS. The present study confirmed the high percentage of benign nodules among those classified as TIRADS-2, with 92-100%. However, the percentage of malignant nodules among those classified as TIRADS-5 was lower than in the primary study of Horvath et al. [11], with only 28-42% as compared to 87-90%. Nevertheless, the negative predictive value of 92-100% in the present study was comparable to that in the study of Horvath et al. [11], which reported an NPV of 88%.
11-42% of nodules were in TIRADS-categories 2&3 in the present study. Therefore, the aim of TIRADS to avoid FNABs or surgery could apply with a high NPV in 11-42% of nodules in the present study and in 35% of nodules in the study published by Horvath et al. [11].
Other TIRADS classifications have been published recently. Kwak et al. [18] published a TIRADS classification based on the number of suspicious ultrasound features. They reported an increase in the risk of malignancy with an increasing number of suspicious ultrasound features such as solid component, hypoechogenicity, irregular margins, microcalcifications, and taller-than-wide shape. Park et al. [19] proposed an equation for predicting the probability of malignancy in thyroid nodules based on 12 ultrasound features. Russ et al. [20] prospectively evaluated 4550 nodules using a six-point scale and reported an NPV of 99.7% for the TI-RADS gray-scale score and a good interobserver agreement.
In the present study we only evaluated the TIRADS classification published by Horvath et al. Therefore, no conclusion can be drawn concerning the other TIRADS classifications. Future studies are necessary to compare the different TIRADS classification systems in terms of interobserver agreement, practicability and validity to find the optimal system for the clinical work-up of thyroid nodules.
SE has become a well-evaluated clinical tool enabling the determination of tissue elasticity using ultrasound devices. SE is a qualitative elastography method evaluating changes in ultrasound pattern during strain and stress of direct or indirect tissue compression. In a meta-analysis on SE including 8 studies with overall 639 nodules, a sensitivity of 92% and a specificity of 90% were reported for the diagnosis of malignant thyroid nodules [13]. Different methods of scoring SE have been evaluated: qualitative assessment using a 4-scale scoring system (as used in the present study), a 5-scale scoring system, and semi-quantitative scoring using strain value, strain ratio and histograms of colour pixels [21][22][23]. Nevertheless, despite many promising study results, two recent studies have challenged the usefulness of SE in clinical practice by reporting no additional value as compared to qualified B-mode ultrasound [14,24,25]. In the present study the 4-scale qualitative elastography was evaluated. The interobserver agreement between the three observers was substantial for SE scores 1-4 (ck = 0.66, p<0.000001) and very good for SE scores 1&2 versus 3&4 (ck = 0.81, p<0.000001). Correlations for SE between the three observers were 0.824 (p<0.000001), 0.793 (p<0.000001), and 0.815 (p<0.000001). These results are in accordance with recently published studies reporting an interobserver concordance of 0.64 and correlation coefficients between observers of 0.73-0.79 [26,27].
However, in a recently published meta-analysis the best results were obtained by combining SE with B-mode ultrasound criteria of malignancy [32].
A limitation of the present study was that images acquired and stored in a prospective study were analyzed retrospectively. However, the stored images enabled the independent blinded evaluation of images by three experienced sonographers. We would expect even lower interobserver agreement in real-time examination, during which the images might vary. Another limitation is the high percentage (13%) of carcinoma in the present study. However, this is a general limitation of most studies performed at endocrinology centers, with an average of even 30% of malignant thyroid nodules [13].
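Why the share of carcinoma matters for predictive values can be illustrated with Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, and combining the sensitivity (88%) and specificity (49%) reported for the original TIRADS study [11] with the prevalences mentioned here (15 of 114 nodules, and 30%) is an assumption made for demonstration, not a result of the present study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a dichotomous test via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    # Expected fractions of the whole population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined: no positive or no negative results")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed inputs: sensitivity and specificity as reported for the original TIRADS study [11].
SENSITIVITY = 0.88
SPECIFICITY = 0.49

for label, prevalence in (("present study (15/114)", 15 / 114),
                          ("endocrinology centers (30%)", 0.30)):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions the NPV falls from about 0.96 at a prevalence of 13% to about 0.90 at 30%, while the PPV roughly doubles, so predictive values from cohorts with different proportions of carcinoma are not directly comparable.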
In summary, the TIRADS developed by Horvath et al. and SE have shown excellent NPV for the diagnosis of malignant thyroid nodules in the present study. However, while good interobserver agreement was found using SE, this was only fair using the