Are clinically unimportant findings qualified as benign in lumbar spine imaging reports? A content analysis of plain X-ray, CT and MRI reports

Background Lumbar spine diagnostic imaging reports may cause patient and clinician concern when clinically unimportant findings are not explicitly described as benign. Our primary aim was to determine how frequently common, benign findings are reported in lumbar spine plain X-ray, computed tomography (CT) and magnetic resonance imaging (MRI) reports as either normal for age or likely clinically unimportant.
Methods We obtained 600 random de-identified adult lumbar spine imaging reports (200 X-ray, 200 CT and 200 MRI) from a large radiology provider. Only reports requested for low back pain were included. From the report text, one author extracted each finding (e.g., ‘broad-based posterior disc bulge’) and whether it was present or absent (e.g., ‘no disc bulge’) until data saturation was reached, pre-defined as a minimum of 50 reports and no new/similar findings in the last ten reports within each imaging modality. Two authors independently judged whether each finding was likely clinically unimportant or important. For each likely clinically unimportant finding they also determined if it had been explicitly reported to be benign (expressed as normal, normal for age, benign, clinically unimportant or non-significant).
Results Data saturation was reached after coding 262 reports (80 X-ray, 82 CT, 100 MRI). Across all reports we extracted 3,598 findings. Nearly all reports included at least one clinically unimportant finding (76/80 (95%) X-ray, 80/82 (98%) CT, 99/100 (99%) MRI). Over half of the findings (n = 2,062, 57%; 272 X-ray, 667 CT, 1,123 MRI) were judged likely clinically unimportant. Most likely clinically unimportant findings (90%, n = 1,854) were reported to be present on imaging (rather than absent) and of those only 18% (n = 331) (89 (35%) X-ray, 93 (16%) CT and 149 (15%) MRI) were explicitly reported as benign.
Conclusion Lumbar spine imaging reports frequently include findings unlikely to be clinically important without explicitly qualifying that they are benign.


Introduction
Degenerative changes, including disc bulges and facet joint degeneration, are common findings described in lumbar spine imaging reports [1][2][3]. These changes are increasingly prevalent with age and are equally common in people with and without low back pain [1]. Longitudinal population-based studies have confirmed that such degenerative changes do not have clinically important associations with either current or future back pain, even when multiple changes are present [4,5]. Reporting the presence of degenerative changes in imaging reports without clarifying that they are likely clinically unimportant has the potential to lead to overdiagnosis and overtreatment, and to cause patient anxiety about the seriousness and persistence of symptoms [6][7][8][9].
There is a paucity of guidance for radiologists on how to communicate findings of limited clinical relevance in a manner that does not alarm the reader. A scoping review that included six radiology reporting guidelines found that three of the guidelines recommended reporting the presence of normal findings, although there was limited advice about how this should be actioned [10]. For example, the Royal Australian and New Zealand College of Radiologists (RANZCR) guideline suggests that normal findings be noted when it would make a difference to the referrer or when absence of such a statement would create ambiguity [11].
In addition, only three guidelines provide guidance on communicating confidence or certainty in reports. The UK College of Radiologists guideline recommends that the level of certainty or doubt surrounding an imaging diagnosis be clearly documented in the report [12], the Canadian College of Radiologists guideline suggests the focus should be on findings that offer potential for resolution of the clinical question [13], while the RANZCR guideline recommends avoiding use of vague modifiers, such as 'possibly represents' [11]. These recommendations are in keeping with other literature that recommends avoiding ambiguous statements or hedging vocabulary, such as 'there appears to be...', to minimise confusion for the reader. Using hedging vocabulary such as 'seen' or 'identified' is discouraged as it suggests that something may have been missed; for example, 'no fracture seen' is a less certain statement than 'no fracture' [14,15].
The choice of phrasing in imaging reports can influence management decisions. For example, a US study providing hypothetical scenarios to clinicians found they were more likely to request further imaging if an incidental 5mm liver lesion was described as 'most likely a cyst' compared to being a 'benign cyst' (46% and 2% respectively) [16]. Providing reassuring statements, such as 'findings are normal for age', or avoiding alarming descriptors, such as 'degeneration', 'tear' or 'rupture', have been suggested as possible ways to reduce misinterpretation of clinically unimportant findings in lumbar spine imaging reports [6,17].
One previous study has investigated the extent to which degenerative changes are reported in lumbar spine imaging reports and how they are described [2]. Based upon examination of 120 consecutive plain X-ray reports requested in primary care, they found that almost three quarters noted the presence of degenerative changes. Only 2% of reports explicitly stated these were normal for age, while 14% indicated the changes were either 'mild' or 'slight', which may be a less explicit way of indicating to the reader that a particular finding is of limited clinical relevance. Another study performed a content analysis of plain X-ray and MRI imaging reports of patients with persistent low back pain and explored, through interviews, which terms negatively impacted the patients' perceived prognosis [18]. The terms 'wear and tear' and 'disc space loss' were associated with a significantly worse perceived outcome based upon patients' interpretation that these terms signified the spine was 'deteriorating', 'crumbling', 'collapsing' and/or the discs were 'wearing out'.
The primary aims of this study were to determine (a) how frequently likely clinically unimportant findings are reported in lumbar spine plain X-ray, CT and MRI reports, and (b) how frequently they are explicitly reported to be benign (i.e., normal, normal for age, benign, clinically unimportant or non-significant). Second, we investigated the frequency of adjectives (e.g., mild, severe) used to describe these findings and how frequently terms of uncertainty (i.e., vague modifiers, hedging vocabulary) were used.

Study design
We performed a content analysis of a random sample of fully de-identified lumbar spine plain X-ray, CT and MRI imaging reports from iMed, a large radiology service provider in Victoria, Australia.
A random sample of 600 (200 X-ray, 200 CT and 200 MRI) reports written between 1 January 2019 and 30 June 2021 was collected in July 2021 and this study was conducted over the following year. To obtain the random sample for each modality, we used the 'Rand' function in Excel to identify random dates within this time period for each imaging modality. If more than one report was identified for a selected day, we again used the 'Rand' function in Excel to select one report at random. For days without a report, we used the next randomised date.
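The date-based sampling described above can be sketched in Python. This is an illustrative sketch only and is not the authors' procedure, which used Excel's 'Rand' function. The function name `sample_reports`, the `reports_by_date` mapping and the choice to draw dates without replacement are assumptions made for the example.

```python
import random
from datetime import date, timedelta


def sample_reports(reports_by_date, start, end, n, seed=0):
    """Pick up to n reports, one per randomly ordered date (hypothetical helper).

    reports_by_date maps a date to the list of report IDs written that day.
    """
    rng = random.Random(seed)
    # Every calendar day in the sampling window, in random order.
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    rng.shuffle(days)

    chosen = []
    for day in days:
        candidates = reports_by_date.get(day, [])
        if not candidates:
            continue  # no report that day: move on to the next randomised date
        # More than one report on the day: pick one of them at random.
        chosen.append(rng.choice(candidates))
        if len(chosen) == n:
            break
    return chosen
```

Calling `sample_reports(reports_by_date, date(2019, 1, 1), date(2021, 6, 30), n=200)` once per modality would mirror the sampling window stated in the text, assuming one mapping per modality.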
A research assistant, not otherwise involved in the study, extracted the complete text of the identified X-ray, CT and MRI reports, including the patient sex, date of birth, date of imaging examination, requesting clinician specialty (e.g., GP, orthopaedic surgeon, rheumatologist), where available, and reporting radiologist using a standardised MS Excel data collection form. To ensure anonymity of the reporting radiologist a unique numerical code was assigned for each radiologist. The de-identified extracted reports were then provided to the research team. No data that could identify patients, referrers or radiologists were provided.

Eligibility criteria
We included lumbar spine imaging reports for people of any age that indicated that the imaging had been requested for low back or radicular lower limb pain. Reports that covered multiple body regions (e.g., thoracolumbar spine) were included only if the report of the lumbar spine could be clearly separated from reporting on other body regions. We excluded reports of imaging performed following major trauma and those requested to explicitly rule in/out serious causes (i.e., infection, malignancy, fracture), imaging post surgery or imaging performed for monitoring purposes. We also excluded any report that included a serious finding (e.g., vertebral fracture or metastatic disease) regardless of whether the clinical notes queried the presence of such a finding.

Data extraction
From the text of the reports relevant to the lumbar spine, one author (CF), a physiotherapist with expertise in low back pain, extracted each individual finding including any adjectives describing the finding (e.g., 'mild posterior disc bulge'). This was continued until data saturation was reached, pre-defined as no new/similar findings in the last ten reports within each imaging modality.
The same author also extracted each term of uncertainty, including vague modifiers and hedging vocabulary based, a priori, on published lists of these terms [14][15][16][19], and consensus among the authors for additional terms. We grouped terms of uncertainty that had similar meaning together (e.g., 'not shown', 'identified' and 'seen'). A second author (JW or RH, both also physiotherapists with expertise in low back pain) checked all extracted data and differences were resolved by consensus.
For each individual finding, two authors (CF, JW or RH) independently determined whether the finding was likely clinically unimportant or important, based, a priori, on published evidence about the relevance of imaging findings, the report context, and/or consensus of the author team (which also included a rheumatologist with low back pain expertise (RB) and an occupational therapist (DOC)) for equivocal findings. S1 Table provides the list of likely clinically unimportant findings based upon the published evidence. We grouped findings based on anatomical structure (e.g., disc, facet joint) and pathology present (e.g., bulge, arthropathy). Findings that described the same or similar abnormality (e.g., 'disc height loss' and 'disc space narrowing') were grouped together.
For findings that are usually considered clinically unimportant (e.g., 'disc protrusion'), if there was evidence within the context of the report of its potential importance (e.g., 'compressing a nerve root'), or if the clinical importance of a finding was ambiguous (e.g., 'mild to moderate canal stenosis'), we erred on the side of caution and categorised the finding as likely clinically important. When a clinically unimportant finding was reported to be absent (e.g., 'no disc bulge') we recorded that separately.
For each likely clinically unimportant finding the same two independent authors (CF and RH or JW) recorded whether there was an explicit qualification that the finding was benign. This could have been stated as 'normal', 'normal for age', 'benign', 'clinically unimportant', 'non-significant' and/or other related synonyms (e.g., 'normal alignment' or 'alignment satisfactory'). Any disagreements were resolved by discussion with all authors.

Sample size and data saturation
The sample size was informed by previous content analyses of lumbar spine imaging reports [2,18,20], and the data saturation stopping rule described by Francis et al. [21]. A pilot study investigating the content of lumbar spine imaging performed in patients presenting with back pain to an emergency department of one metropolitan hospital in Victoria, Australia, indicated that a minimum of 50 reports, and likely fewer than 100 reports, of each modality would be needed [20].
We identified 200 reports for each modality to allow for the various terms known to describe similar radiological findings [22], as well as potential exclusions. We extracted individual findings until data saturation had been reached, defined as a minimum of 50 reports and the point at which no new findings were identified among the last ten reports, coded separately for each imaging modality. Cumulative frequency tables listed each new term until this stopping rule was met.
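The stopping rule above (at least 50 reports coded, and no new findings among the last ten) can be expressed as a short function. The following is a minimal sketch under stated assumptions: the name `saturation_point` is hypothetical, each report is represented as a list of finding labels, and similar findings are assumed to have already been grouped under one label before the check is applied.

```python
def saturation_point(reports, min_reports=50, window=10):
    """Return how many reports were coded when the stopping rule is met.

    reports: iterable of lists of (already grouped) finding labels, in coding order.
    Returns None if saturation is never reached.
    """
    seen = set()
    last_new = 0  # 1-based position of the last report that added a new finding
    for i, findings in enumerate(reports, start=1):
        new = set(findings) - seen
        if new:
            seen |= new
            last_new = i
        # Stop once the minimum is coded and the last `window` reports added nothing new.
        if i >= min_reports and i - last_new >= window:
            return i
    return None
```

The rule would be applied separately to the X-ray, CT and MRI reports, since saturation was assessed within each imaging modality.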

Data analysis
We used descriptive statistics to summarise the demographic characteristics of patients who were imaged, reporting radiologists and imaging referrers. We also measured and reported the report word count, median number of findings per report and median number categorised as likely clinically unimportant or important, the proportion of imaging reports with at least one likely clinically unimportant finding, the proportion that qualified clinically unimportant findings as benign, adjectives used to describe the findings, and terms of uncertainty. The most common likely clinically unimportant findings and most common adjectives and terms of uncertainty were determined for each imaging modality.
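Some of the summaries listed above can be illustrated with a few lines of Python. This is a sketch only: the `Finding` record, the `summarise` function and the three example reports are invented for illustration and are not the study data or the authors' analysis code.

```python
from statistics import median
from typing import NamedTuple


class Finding(NamedTuple):
    label: str
    unimportant: bool  # judged likely clinically unimportant
    present: bool      # reported as present (True) or absent (False)
    benign: bool       # explicitly qualified as benign in the report


def summarise(reports):
    """Summarise a list of reports, each a list of Finding records."""
    with_unimportant = sum(any(f.unimportant for f in r) for r in reports)
    present_unimportant = [f for r in reports for f in r if f.unimportant and f.present]
    n_benign = sum(f.benign for f in present_unimportant)
    return {
        "median_findings_per_report": median(len(r) for r in reports),
        "prop_reports_with_unimportant": with_unimportant / len(reports),
        "prop_present_unimportant_benign": (
            n_benign / len(present_unimportant) if present_unimportant else None
        ),
    }


# Invented example reports (not study data).
example_reports = [
    [Finding("disc bulge", True, True, False), Finding("alignment", True, True, True)],
    [Finding("canal stenosis", False, True, False)],
    [
        Finding("disc bulge", True, False, False),
        Finding("facet joint arthropathy", True, True, False),
        Finding("nerve root compression", False, True, False),
    ],
]
summary = summarise(example_reports)
```

Keeping presence and benign qualification as separate fields reflects the text's distinction between findings reported as absent and findings reported as present but qualified as benign.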

Ethics
This study was approved by the Monash University Human Research Ethics Committee (Approval ID 27959). Individual participant consent was not required due to the de-identified nature of the data.

Results
Fig 1 presents a flow chart of the report coding process and a summary of the main findings. Data saturation was reached after coding 262 reports (80 X-ray, 82 CT and 100 MRI). Ninety-five reports (17 X-ray, 66 CT and 12 MRI) were excluded for reasons listed in Fig 1, most commonly because the imaging was performed due to trauma. A total of 3598 separate findings were extracted (454, 1139 and 2005 in the X-ray, CT and MRI reports respectively). Of these, 2062 (57%) (272 in X-ray (60%), 667 in CT (59%) and 1123 in MRI (56%)) were judged to be likely clinically unimportant. Most (n = 1854, 90%) were reported to be present rather than reported to be absent, and of those present only a minority (n = 331, 18%) were explicitly reported to be benign.
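The percentages in the paragraph above follow directly from the reported counts. The short check below recomputes them; the counts are taken from the text, while the variable names and the `pct` helper are introduced here for illustration.

```python
# Counts as reported in the text, by imaging modality.
findings = {"X-ray": 454, "CT": 1139, "MRI": 2005}
unimportant = {"X-ray": 272, "CT": 667, "MRI": 1123}
present_unimportant = 1854  # likely unimportant findings reported as present
benign = 331                # of those, explicitly reported as benign


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_findings = sum(findings.values())
total_unimportant = sum(unimportant.values())

overall_unimportant_pct = pct(total_unimportant, total_findings)
by_modality_pct = {m: pct(unimportant[m], findings[m]) for m in findings}
present_pct = pct(present_unimportant, total_unimportant)
benign_pct = pct(benign, present_unimportant)
```

Note that the 18% figure uses the 1854 findings reported as present as its denominator, not all 2062 likely clinically unimportant findings.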
Patient demographics, requesting clinician and radiologist details and imaging report characteristics by imaging modality are shown in Table 1. There were more women across all three imaging modalities and the median (range) age varied from 50 (15 to 91) years for MRI to 65 (17 to 98) years for CT reports. Most imaging was requested by GPs (50/80 (63%) X-rays, 49/82 (60%) CTs and 57/100 (57%) MRIs). One hundred and five different radiologists reported on the imaging with over 50 different radiologists for each imaging type (52 X-rays, 56 CT scans and 53 MRI scans). Only 10 radiologists reported imaging for all three imaging modalities and 39 contributed only a single report across all modalities. The median number of reports written by each radiologist was one (range for X-ray: 1 to 5, CT: 1 to 3 and MRI: 1 to 6).

Likely clinically unimportant findings, their frequency, and proportion reported to be benign
S1 Table indicates which clinically unimportant findings appeared in at least one report by imaging modality. The most common likely clinically unimportant findings that were reported to be present are shown by modality and in order of frequency in Table 2, together with the proportion that were explicitly reported as benign. Allowing for different sensitivities between imaging modalities, changes to the discs (e.g., disc height loss, disc bulge), facet joint arthropathy and degenerative changes were reported most commonly. Across all modalities these findings were reported to be benign in less than half of the reports that noted their presence. For example, disc height loss was reported to be present in 54%, 54% and 31% of X-ray, CT and MRI reports respectively, and this change was reported to be benign in only 42%, 26% and 33% of those reports respectively. Alignment of segments of the spine, when described (39% X-ray, 40% CT and 46% MRI reports), was most frequently reported to be benign (100% X-ray, 82% CT and 87% MRI reports that reported the finding).
Likely clinically unimportant findings that were reported to be absent (e.g., 'no disc bulge') were in keeping with those that were reported to be present (S2 Table). S3 Table shows the findings that were considered likely clinically important by modality and in order of frequency. The most common finding for X-ray was fracture, for CT it was canal stenosis and for MRI it was foraminal stenosis.

Descriptors of likely clinically unimportant findings and their frequency
Overall, there were 50 different descriptors or groupings of descriptors used to describe likely clinically unimportant findings. The most frequent descriptors, used at least 10 times, overall and by imaging modality are shown in Fig 2. The most common descriptors were used to indicate the severity of the findings (e.g., mild, moderate, severe), and many, shown on the left of the vertical line, indicated a finding was minimal, minor, small or mild. Other commonly used descriptors were 'degenerative' and 'broad-based'.

Terms of uncertainty and their frequency
Table 3 shows the 22 most frequent groupings of terms of uncertainty. Despite the inclusion of more MRI reports, five groups of uncertain terms contained more terms in the CT modality than either MRI or X-ray, including the most common, 'not shown' (221 occurrences overall; X-ray n = 35, CT n = 104, MRI n = 82). Other groups with more occurrences in CT than X-ray or MRI were 'appear' (53 occurrences overall; X-ray n = 12, CT n = 24 and MRI n = 17), 'may explicitly reported to be benign. Descriptors such as minimal, minor, small, mild and degenerative were common and likely intended to convey a lack of clinical relevance. Vague modifiers and hedging vocabulary were also common across reports. There was variation in the proportion of each finding reported as benign. For example, 'lumbar spine alignment' was commonly reported as normal, while some findings such as spondylolisthesis were not reported as benign in any report where they were present. Terms such as spondylolisthesis are often considered 'pathological' [2], which could explain why they were less likely to be explicitly identified as benign. Our findings are in keeping with the single previous study that has investigated the content of lumbar spine X-ray reports [2]. We found that these issues also apply to complex imaging (CT and MRI) reports.
Previous studies have indicated that many commonly reported, likely clinically unimportant findings, such as disc bulge and disc or facet joint degeneration, are likely to be misconstrued by patients [9,18,23]. An online survey designed to elicit consumer understanding of terms commonly used in lumbar spine imaging reports found these terms were deemed to be serious and likely to make the majority of respondents concerned about persistence of pain [9]. Participants in another online survey also reported lower expectations of recovery and higher perceived seriousness and need for surgery if they were deemed to have a 'disc bulge' or 'degeneration' as the cause of their pain versus other diagnostic labels such as an 'episode of back pain' or 'lumbar sprain' [23].
General practitioners also want clear explanations for terms found in lumbar spine imaging reports, including their clinical relevance [24]. In another study, reports in which terms considered to increase patient concern were replaced with less concerning alternatives were interpreted as showing less severe disease by general orthopaedic surgeons, orthopaedic residents and physiotherapists, but not by spinal surgeons [6]. Similarly, for all groups except spinal surgeons there was also a trend away from recommendations for invasive treatments such as injections and surgery, and a lower perceived likelihood of the patient requiring surgical intervention.

Strengths and limitations
We performed a content analysis of a random sample of lumbar spine plain X-ray, CT and MRI imaging reports obtained from a large radiology service provider in Victoria, Australia, and extracted all findings until data saturation was reached for each imaging modality. It is therefore likely that our results are generalisable to other providers and settings in Australia. Decisions regarding whether findings were likely clinically important or unimportant were made independently by two authors, based a priori on published evidence and report context. Equivocal findings and differences of opinion between the author team were resolved by discussion. These data are reported so that readers can make their own judgments about the validity of our decisions. Similarly, two authors also independently determined whether or not each finding was reported to be benign and disagreements were resolved by discussion with all authors.
We were intentionally conservative in how we categorised the likely clinical importance of imaging findings. For example, we categorised moderate to severe canal stenosis as likely clinically important as this is widely accepted in clinical practice [25]. However, a recent study found that, even in combination with other degenerative findings, canal stenosis may not have a clinically important association with low back pain [4]. It is also possible that some findings we coded as likely clinically unimportant were miscoded.
The reports were extracted in full from an electronic database but we did not have access to the imaging requests other than what was included in the reports. While the clinical notes from requests are usually included in the reports in our setting, we cannot exclude the possibility that the referring clinician provided further information by other means. We were also unable to verify the report findings, or their adherence to standard lumbar spine reporting nomenclature [26], as we did not have access to the images themselves.

Implications for practice
Although imaging is only one part of a comprehensive clinical assessment, primary care clinicians can find it difficult to understand the terminology in reports and assess the clinical relevance of findings [24]. This may lead to misinterpretation of the findings by both referring clinicians and patients and result in unwarranted anxiety, more complex imaging, overdiagnosis and unnecessary treatment. As patients are increasingly able to directly access their imaging findings, it becomes even more imperative for radiologists to consider how to report imaging findings in a way that minimises misinterpretation and uncertainty about the relevance of the reported findings.

Implications for research
Further research is needed to determine the most effective and comprehensible methods for reporting lumbar spine imaging findings in people with low back pain. Co-design with relevant stakeholders, including radiologists, clinicians and patients, of a standard reporting template that appropriately considers the importance of findings, how they are described and how certainty is qualified, is one possible approach. The template could also provide guidance about where it might be appropriate to use vague modifiers or hedging terms [16]. Evaluation of such a template could consider whether the findings are understood by clinicians and patients as intended and whether it improves the quality of care compared with usual reporting. Similar to pathology reports that provide normal ranges that can vary by age, imaging reports could also include a 'reference range' of findings that are normal for age.
However, studies that have included explicit information about the age-related prevalence of common findings in asymptomatic populations have reported conflicting results. While early studies identified promising reductions in referrals and repeat imaging [7] and reduced opioid prescription [8], a large multi-centre randomised trial involving 250,401 participants found only a small shift in prescribing [27][28][29]. Changing the language to be less threatening also shows promise [6,17], as does inclusion of explicit evidence-based management advice [30].

Conclusion
Lumbar spine imaging reports frequently include findings that are unlikely to be clinically important without explicitly indicating they are benign. A wide variety of descriptors and terms of uncertainty are used to put the findings in context, and these may be indirectly intended to convey the lack of clinical relevance of the findings. Clearer, more explicit language may reduce misconceptions about the relevance of lumbar spine imaging findings in people with low back pain and improve quality of care and health-related outcomes.

Table 1. Demographics (sex and age) of patients that had imaging requested, number of reporting radiologists and imaging requestors, and word count and number of imaging findings reported, by imaging modality.
*One osteopath, one neurosurgeon and one oncologist requested one X-ray, and one urologist and one rehabilitation physician requested a single CT scan.
^Terms of uncertainty could include vague modifiers and hedging vocabulary.
https://doi.org/10.1371/journal.pone.0297911.t001