Recommendations for a Core Outcome Set for Measuring Standing Balance in Adult Populations: A Consensus-Based Approach

Background Standing balance is imperative for mobility and avoiding falls. Use of an excessive number of standing balance measures has limited the synthesis of balance intervention data and hampered consistent clinical practice. Objective To develop recommendations for a core outcome set (COS) of standing balance measures for research and practice among adults. Methodology A combination of scoping reviews, literature appraisal, anonymous voting and face-to-face meetings with fourteen invited experts from a range of disciplines with international recognition in balance measurement and falls prevention. Consensus was sought over three rounds using pre-established criteria. Data sources The scoping review identified 56 existing standing balance measures validated in adult populations with evidence of use in the past five years, and these were considered for inclusion in the COS. Results Fifteen measures were excluded after the first round of scoring and a further 36 after round two. Five measures were considered in round three. Two measures reached consensus for recommendation, and the expert panel recommended that at a minimum, either the Berg Balance Scale or Mini Balance Evaluation Systems Test be used when measuring standing balance in adult populations. Limitations Inclusion of two measures in the COS may increase the feasibility of potential uptake, but poses challenges for data synthesis. Adoption of the standing balance COS does not constitute a comprehensive balance assessment for any population, and users should include additional validated measures as appropriate. Conclusions The absence of a gold standard for measuring standing balance has contributed to the proliferation of outcome measures. These recommendations represent an important first step towards greater standardization in the assessment and measurement of this critical skill and will inform clinical research and practice internationally.


Introduction
Standing balance, defined as the ability to keep the center of mass within the base of support [1], is a prerequisite for many functional activities such as mobility and fall avoidance [2,3]. Balance impairment is common across multiple populations and leads to the greatest losses in years of healthy life and quality of life in people living with stroke [4], brain injury [5], arthritis [6], and up to 75% of people of advancing age (70 years and older) [7]. Exercise is postulated to improve balance and is associated with increased mobility and reduced falls in many of these populations [8][9][10][11]. However, synthesizing evidence on the effects of interventions for improving balance has been hampered by the extensive variation in the use of balance outcome measures among studies [2,12]. For example, a systematic review on the effectiveness of exercise interventions to improve balance in older adults identified 95 eligible studies [2] but was able to pool fewer than 50% of included studies because over 25 different measures were used to assess balance. Varied use of balance measures is also seen in clinical practice, as illustrated in one survey of balance assessment practices among Canadian physiotherapists that reported use of over 20 different measures [13].
Such inconsistency in use of balance measures reflects the absence of a gold standard method for evaluating standing balance [14] and subsequent prolific development of measures [15]. This plethora highlights the complex multifactorial nature of balance; measures vary in purpose, specific components of balance evaluated, measurement techniques, target population and extent of psychometric evaluation. However, given the importance of standing balance in fall prevention and mobility enhancement, there is a need for greater consistency in standing balance measurement across studies and for individual assessments [16]. One approach to achieve a more standardized practice is to identify and recommend a core outcome set for measuring standing balance. A core outcome set (COS) is defined as a recommended minimum set of outcomes or outcome measures for a particular health construct, condition, or population, the results of which should be reported for all trials pertaining to that issue [17]. In all cases, COS recommendations do not imply that measurement of the construct should be restricted to the COS; rather, the purpose is to advocate that the COS forms a consistent component of measurement and it is expected that additional measures may also be used.
The objective of this project was to propose recommendations for a COS of standing balance measures for research and practice settings in adult populations. Although core outcome sets were originally developed for clinical trials, including health care practice in the scope of a COS offers the opportunity to expand the utility of recommendations and potential for broad uptake. Recommendation of a few representative and feasible measures that can be widely used across a range of populations and settings can facilitate evaluating the efficacy of interventions to improve standing balance, and thus a recommended COS for standing balance will directly and substantially inform clinical research and practice internationally. In turn, this will optimize the development and implementation of evidence-based exercise programs for mobility enhancement and fall prevention worldwide.

Design
We used a consensus-based approach incorporating a modified Nominal Group Technique based on the RAND/UCLA Appropriateness Method [18], involving a combination of anonymous rating and face-to-face group discussion [19]. These techniques have been used to develop COSs for other health outcome measures [20,21], and published guidelines for reporting the development of COSs [17] were followed. The project was funded by a Canadian Institutes of Health Research (CIHR) planning grant (# MAG133935), and was registered on the COMET (Core Outcome Measures in Effectiveness Trials) Initiative database (available at http://www.comet-initiative.org/studies/details/244?result=true). Given the secondary nature of the data extraction, analysis, and recommendations, and as is common practice in COS development work, research ethics approval was not sought.

Expert panel sampling and recruitment
A purposive and iterative approach was used to identify individuals to sit on an international panel of experts for the consensus process. "Experts" were operationally defined as individuals who have national or international recognition in the fields of balance, mobility, exercise or fall prevention, and who regularly evaluate balance in their work. Within this context, individuals were strategically identified to represent a range of 1) related expertise (postural control, fall prevention, geriatrics, neurology, orthopedics, health service delivery, knowledge translation); 2) professional backgrounds (bioengineering, epidemiology, kinesiology, medicine, nursing, physiotherapy); and 3) practice settings (primary care, rehabilitation, nursing homes, homecare, community). The four members of the research team who initiated the project (KMS, SBJ, BEM, SES) have established track records in postural control, fall prevention, geriatrics, and hip fracture. They worked together to identify potential panel members who collectively represented all of the target expertise, professional backgrounds and practice settings identified as relevant to balance measurement. An initial cohort of individuals identified by the research team was contacted through email by the principal investigator (KMS), informed about the project, and invited to participate. Those who declined were asked to recommend other appropriate individuals, and any suggestions were discussed by the research team prior to invitation. Individuals were not excluded if they had developed one of the measures under consideration, but all panel members declared at the meeting any conflicts of interest (including authorship of measures under consideration) related to participating in the balance COS recommendations.
A panel size between twelve and eighteen individuals was sought, which falls within recommended ranges for consensus panels to provide good validity without excessively affecting group processes [22]. Consent was implied when individuals agreed to join the expert panel.

Identification of measures for consideration
A scoping review identifying published standing balance measures for adult populations [23] formed the pool of measures to be considered for the COS recommendations. Full details of the review are available elsewhere [23]. In brief, electronic searches of Medline, Embase, and CINAHL databases up to March 2014 were conducted using key word combinations of postural balance/equilibrium, psychometrics/reproducibility of results/predictive value of tests/validation studies, instrument construction/instrument validation, geriatric assessment/disability evaluation, as well as grey literature [24] and hand searches. Inclusion criteria were measures with a stated objective to assess balance, adult populations (aged 18 years and over), at least one psychometric evaluation, one standing balance task, a standardized protocol and evaluation criteria, and published in English. Two research assistants independently identified studies for inclusion and extracted characteristics (levels of measurement, scoring properties etc.), and psychometric properties for each measure. Two reviewers independently coded components of balance evaluated in each measure using the Systems Framework for Postural Control [25], a widely recognized model of balance. To avoid considering obsolete measures, electronic searches of PubMed and Google Scholar were conducted on all identified measures published prior to 2009, and those that had neither been referenced in peer-reviewed publications since 2009 nor been reported in a 2011 Canadian survey of balance assessment practices [13] were excluded.

Consensus process
The consensus process is summarized in Fig. 1.
Round One. Round one scoring took place online. To inform their scores, members of the expert panel were provided with background information, including: (i) the original publication and test items for each measure; (ii) a description of the established psychometric properties for each measure, downloaded from the Rehabilitation Measures Database, a searchable website containing evidence-based summaries of more than 250 rehabilitation measures (www.rehabmeasures.org), or a psychometric summary prepared by KMS if one was not available; (iii) results of the scoping review findings including measure characteristics and components of balance evaluated in each measure [23]; and (iv) a publication of balance assessment practices among Canadian physiotherapists [13].
Each measure was scored on a 5-point Likert scale (1-lowest, 5-highest) on three dimensions: (i) psychometric properties (validity, reliability etc.); (ii) feasibility of use on a large scale (practicality of administration, time, cost, equipment needs); and (iii) overall impression as a potential balance COS measure for adult populations. To manage workload, each measure was scored by half of the panel members, and to reduce bias, each participant had a different, randomly assigned set of measures to score. Panel members were invited to propose additional measures they felt warranted consideration.
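The assignment step described above can be illustrated with a short sketch. The source does not report how the random assignment was implemented, so everything here is an assumption made for illustration: the function name `assign_measures`, the placeholder measure and panelist labels, the fixed seed, and the choice to draw a fresh random half of the panel for each measure.

```python
import random


def assign_measures(measures, panelists, seed=0):
    """Assign each measure to a randomly drawn half of the panel.

    Hypothetical helper: the paper states only that each measure was
    scored by half of the panel and that sets were randomly assigned.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    half = len(panelists) // 2
    assignment = {p: [] for p in panelists}
    for measure in measures:
        # Draw, without replacement, the panelists who score this measure.
        for panelist in rng.sample(panelists, half):
            assignment[panelist].append(measure)
    return assignment


# Placeholder labels; 56 measures and 14 panelists as reported in the paper.
measures = [f"measure_{i:02d}" for i in range(1, 57)]
panelists = [f"panelist_{i:02d}" for i in range(1, 15)]

assignment = assign_measures(measures, panelists)
scorers_per_measure = {
    m: sum(m in scored for scored in assignment.values()) for m in measures
}
```

Drawing scorers per measure guarantees that every measure is seen by exactly half of the panel, but it does not equalize workload across panelists; a balanced design would need an additional constraint.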
Measures that received scores of at least 4/5 on both the psychometric and feasibility dimensions from at least 70% of scorers in round one were retained in the pool of potential COS measures and forwarded for discussion in round three. Measures that received scores of 2/5 or lower on the psychometric properties dimension from at least 70% of scorers were excluded. The remaining measures, which received a range of scores across both dimensions, were held for discussion in round two.
Round Two. Subsequent rounds took place at face-to-face meetings held in Toronto, Canada on May 29th and 30th, 2014. One week prior to the meeting, panel members received a report of the round one results, including detailed reports of the scoring distribution and comments for each measure (S1 File). The proceedings were led by a professional facilitator with a background in physiotherapy, and were audio recorded and transcribed verbatim along with detailed notes taken by a recorder. One panel member (TH) published meeting status updates throughout the proceedings via Twitter, which are archived and available online (https://storify.com/MSK_Elf/recommending-a-core-outcome-set-for-standing-balan). In round two, measures that received a range of scores across both dimensions were discussed by the expert panel, and then each member scored each of those measures on a single 5-point Likert scale rating the overall suitability for inclusion in the balance COS. A discussion of the constructs important for overall suitability of a balance COS was undertaken using the OMERACT (Outcome Measures in Rheumatology) filter framework to guide the discussion (Table 1). The OMERACT filter is a framework of constructs developed for rheumatology core outcome sets that emphasizes the concepts of "truth", "discrimination", and "feasibility" [26]. Following the discussion, panel members scored each measure electronically using a web-based tool, and were blinded to each other's scores. At this phase, measures receiving scores of at least 4/5 on overall suitability from at least 70% of panel members were retained in the pool of potential balance COS measures and discussed in round three.
Round Three. In round three, panel members discussed the measures forwarded from rounds one and two. They also discussed and agreed that any panel members who developed measures under consideration in round three would abstain from the discussion and final vote. In round three, panel members responded to the following yes/no statement for each measure: "This measure should be included in a COS of balance measures for adult populations". Measures required support by a minimum of 70% of the panel members to be included in the final balance COS recommendations.
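The decision rules for the three rounds can be written as a minimal sketch. It rests on several assumptions that are not confirmed by the source: the printed cut-offs "4/5" and "2/5" are read as "4 or higher" and "2 or lower", "70%" is read as "at least 70%", a round one scorer counts toward forwarding only if both of their scores are 4 or higher, and all function names are hypothetical.

```python
def meets_threshold(count, total, percent=70):
    """True if count/total is at least `percent`, using integer arithmetic."""
    return count * 100 >= percent * total


def round_one_decision(scores):
    """scores: list of (psychometric, feasibility) pairs, one per scorer."""
    n = len(scores)
    high_both = sum(1 for psy, fea in scores if psy >= 4 and fea >= 4)
    low_psychometric = sum(1 for psy, _ in scores if psy <= 2)
    if meets_threshold(high_both, n):
        return "forward to round three"
    if meets_threshold(low_psychometric, n):
        return "exclude"
    return "hold for round two"


def round_two_decision(suitability_scores):
    """suitability_scores: one overall-suitability score (1-5) per member."""
    high = sum(1 for s in suitability_scores if s >= 4)
    if meets_threshold(high, len(suitability_scores)):
        return "forward to round three"
    return "exclude"


def round_three_decision(votes):
    """votes: list of booleans, True for a 'yes' vote."""
    if meets_threshold(sum(votes), len(votes)):
        return "recommend"
    return "do not recommend"


# Invented example scores from seven scorers (half of a 14-member panel).
strong = [(5, 4), (4, 4), (4, 5), (5, 5), (4, 4), (3, 4), (2, 3)]
weak = [(2, 4), (1, 3), (2, 2), (2, 5), (1, 1), (3, 3), (4, 4)]
mixed = [(4, 3), (3, 3), (5, 4), (2, 2), (3, 4), (4, 4), (2, 3)]
```

Integer comparison avoids floating-point edge cases when a count sits exactly on the 70% boundary.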

Expert panel membership
Twenty individuals were invited to join the expert panel in the consensus exercise. Two declined the invitation, and four who accepted the invitation withdrew prior to the beginning of consensus activities due to scheduling conflicts. Fourteen individuals (70% of those invited) joined the expert panel: 13 in person and one via teleconference (KV). One co-investigator participated in discussions via teleconference but did not vote (BEM).

Table 1. OMERACT (Outcome Measures in Rheumatology) filter to determine applicability of a measurement instrument in a setting.

Construct | Explanation
Truth | Is the measure truthful, does it measure what is intended? Is the result unbiased and relevant? The word "truth" captures issues for face, content, and construct validity (as gold standards are often not available, criterion validity is mostly not tested).
Discrimination | Does the measure discriminate between situations of interest? The situations can be states at one time (for classification or prognosis) or states at different times (to measure change). The word "discrimination" captures issues of reliability and sensitivity to change.
Feasibility | Can the measure be applied easily, given constraints of time, money, and interpretability? The word "feasibility" captures an essential element in the selection of measure, one that may be decisive in determining a measure's success.
doi:10.1371/journal.pone.0120568.t001

COS development
The results are summarized in Fig. 2. The scoping review identified 66 measures. Of these, ten measures published at least five years earlier with no evidence of use since then were excluded. Fifty-six measures were considered in the pool of potential balance COS measures (Table 3) and scored in round one. Following round one, 15 measures were excluded, two measures were forwarded to round three (Berg Balance Scale (BBS) [27] and Timed Up-and-Go (TUG) Test [28]), and 39 measures were held for discussion in round two. At the meeting, initial discussions focused on the parameters of the COS and feasibility of making one recommendation applicable to research and practice in all adult populations. The advantages and disadvantages of both broad and narrow-scoped recommendations were debated, and the decision was made to maintain the objective to recommend a COS for measuring standing balance in research and practice in adult populations. Subsequent discussions addressed the constructs necessary for a COS for standing balance. There was general agreement regarding the application of the OMERACT framework principles within the consideration of "overall suitability", and the need to consider the many components that comprise the balance "system" [1].
Once these parameters were defined, the group considered the 39 measures held for discussion in round two. These measures had not met either of the a priori round one criteria (direct forwarding to round three or exclusion). Following discussion and scoring, one measure (the Mini Balance Evaluation Systems Test (Mini BESTest) [29]) reached the 70% threshold for forwarding to round three. However, to promote discussion the group agreed to forward two additional measures that achieved sufficient scores by at least 50% of panel members (the Short Physical Performance Battery [30] and Unified Balance Scale [31]). As such, three measures were forwarded to round three and 36 measures were excluded. Five measures were discussed in round three: two that were forwarded directly from round one, and three that were forwarded from round two. The five measures considered in round three were: BBS, Mini BESTest, Short Physical Performance Battery, TUG Test, and Unified Balance Scale. Two panel members were developers of two of the measures under consideration, and abstained from the discussion and vote. As such, twelve panel members participated and voted in round three. In the round three discussion, comments centered on whether a single measure could achieve all the objectives of the standing balance COS in research and practice in adult populations. Comments suggested that the group thought that while a single measure would be ideal from a minimum dataset perspective, one measure could not address the full spectrum of abilities among the adult population, and that a small number of measures (fewer than three) would be a permissible compromise. Of the five measures considered in round three, two achieved consensus on being included in COS recommendations for measuring standing balance in research and practice in adult populations: Berg Balance Scale [27] and Mini Balance Evaluation Systems Test [29] (S3 File).
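As a cross-check, the flow of measures through the consensus process can be tallied in a few lines. The counts are taken from this section and from the abstract (which reports 36 exclusions after round two); the variable names are illustrative, and the minimum number of "yes" votes is derived here from the 70% rule and the twelve voting members rather than stated in the source.

```python
# Counts reported in the text.
identified = 66          # measures found by the scoping review
obsolete = 10            # no evidence of use in the past five years
pool = identified - obsolete

round_one = {"excluded": 15, "forwarded": 2, "held": 39}
# Every measure in the pool must have exactly one round one outcome.
assert sum(round_one.values()) == pool

# Round two: one measure met the 70% threshold, two more were forwarded
# under the relaxed 50% criterion agreed at the meeting.
round_two_forwarded = 1 + 2
round_two_excluded = round_one["held"] - round_two_forwarded

round_three_considered = round_one["forwarded"] + round_two_forwarded

# Round three voting: 14 panel members, 2 developers abstained.
voters = 14 - 2
# Smallest number of "yes" votes that is at least 70% of the voters
# (derived value, not reported in the source).
min_yes_votes = next(k for k in range(voters + 1) if k * 100 >= 70 * voters)

print(pool, round_two_excluded, round_three_considered, voters, min_yes_votes)
```

With twelve voters, 70% corresponds to 8.4 votes, so under this reading a measure would have needed at least nine "yes" votes to be recommended.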

Discussion
The need for increased consistency and psychometric rigor in the evaluation of standing balance in adult populations in order to advance understanding and implementation of optimal interventions to improve mobility and decrease falls is well-recognized. The expert panel convened in the current project recommends that at a minimum, either the Berg Balance Scale or the Mini Balance Evaluation Systems Test should be used when measuring standing balance for research and practice in adult populations. Both the face-to-face panel meeting and anonymous scoring were integral to the development of the recommendations. The interactive discussions allowed for debate and reflection, while anonymous voting allowed individual panel members to make a full and equal contribution to the recommendations even if they did not share the opinion of the majority. This novel project represents the first attempt to make COS recommendations for the field of balance research and practice, and as such should be both viewed as a starting point and revisited in the future.

Two recommended measures in the standing balance COS
Two measures gained consensus for recommendation by the panel. Characteristics of both measures are presented in Table 4, and readers are encouraged to consult the Rehabilitation Measures Database for a more detailed description of the psychometric properties of each measure. Comparisons of the two measures have noted that they are highly correlated (correlation coefficients ranging from 0.79-0.94) [32][33][34][35][36], and in one study directly comparing the psychometric properties of the Mini BESTest and BBS, both measures performed similarly on the majority of characteristics [32]. The BBS was recommended because it is both well-validated in a number of adult populations and widely used in both research and practice settings. It was published in 1989, with the objective to develop a valid measure of balance that was appropriate for geriatric patients (aged 60 years and older) and for use in a clinical setting [27]. It has been widely evaluated subsequent to its initial development, and tested in a number of populations. It is commonly used in physiotherapy practice [13] and has been translated into several languages. These factors contribute to the suitability of the BBS for standing balance COS recommendations and potential for broad implementation. A limitation of the BBS is that ceiling effects have been well-documented in higher functioning individuals [43,56,65,69], restricting its suitability for all adult populations. Moreover, while it includes some components of balance, including underlying motor systems, static and dynamic stability, functional stability limits, anticipatory postural control and sensory integration, it does not evaluate verticality, reactive postural control, or cognitive influences on balance [23], which are all important for avoiding falls.
Table 4. Characteristics of the Berg Balance Scale (BBS) and Mini BESTest.

Characteristic | BBS | Mini BESTest
… | … [37,38], multiple sclerosis [39], osteoarthritis [40], Parkinson's Disease [41,42], spinal cord injury [43,44], stroke [45][46][47][48], brain injury [49], vestibular dysfunction [50] | People with neurological impairments [33,51], people with age-related balance disorders [32], community-dwelling older adults [52]
Psychometric properties evaluated | … [33,34,66]
Reported construct validity ranges | Convergent with the Barthel index r = 0.87-0.94 [47,67] | Discriminates between stroke vs. healthy [33], faller vs. non-faller [33,51], balance deficits vs. not [35]
Reported responsiveness ranges | Effect size = 0.26-1.11 [47,65,68], area under ROC curve = 0.91 [32] | Area under ROC curve = 0.92 [32]
Component of balance evaluated [23] | Underlying motor systems, static stability, dynamic stability, functional stability limits, anticipatory postural control, sensory integration | …

The second recommended measure in the standing balance COS, the Mini BESTest, addresses some of the limitations in the BBS. The Mini BESTest was published in 2010. It was developed as a shorter version of a more comprehensive test [70], using factor and Rasch analyses [29]. Documented ceiling effects were less than the BBS in a sample of inpatients (mean age 66 years) with balance disorders [32]; however, one study noted a minor ceiling effect in very high functioning neurological patients [29]. It evaluates most components of postural control: underlying motor systems; verticality; static and dynamic stability; anticipatory and reactive postural control; integration of sensory information; and cognitive influences on balance; but not functional stability limits [23].
However, as with the BBS, there are also limitations to the Mini BESTest in the context of standing balance COS recommendations. It has been evaluated considerably less than the BBS, likely related to its more recent emergence in the literature. Responsiveness has been demonstrated in prospective descriptive studies [32], but the Mini BESTest has yet to be published in a clinical trial. Moreover, there is no published evidence of its uptake in clinical or community practice. Panel members acknowledged that the Mini BESTest requires more population testing, and its applicability across care settings and functional abilities needs to be demonstrated.
These two measures received the votes required for recommendation because they collectively best represent the objectives of the standing balance COS. They have unique features that make them suitable for COS recommendations and it is recommended that users choose at least one of these measures based on their particular needs. In considering which of the two measures to use in research or practice, readers may wish to consider a number of factors highlighted by the panel. The BBS may be considered more suitable for lower functioning adults, while preliminary data suggest the Mini BESTest may cover the continuum of balance abilities. If ability to perform the test is not an issue, the Mini BESTest evaluates more components of postural control than the BBS, and may be considered a more comprehensive measure.

Measures not included in the balance COS recommendations for adult populations
The very definition of a core outcome set restricts the number of measures that can be recommended. Many well-developed balance measures were excluded from the current COS recommendations because they were too narrow in scope of target population or feasibility on a broad scale. Readers are cautioned not to infer that the current recommendations constitute best practice recommendations for balance assessment, but instead are recommended as a minimum standard for standing balance measurement. In fact, adoption of the COS measurement should not be construed as a comprehensive assessment of balance, and the panel recommends that additional population-specific measures be used, particularly when designing balance training programs.
Of the 56 measures considered for the balance COS, five reached the final round of discussion. While only two of these measures were included in the final recommendations, the three excluded measures each warrant a specific comment. The TUG test received high scores in round one and was forwarded directly to round three. In those discussions, the panel recognized its psychometric properties, feasibility and widespread clinical utility, but raised concerns about variability in methods of application and questioned whether it genuinely reflected the construct of 'balance'. Moreover, the TUG test is also included in the Mini BESTest, which was recommended in the balance COS. The Unified Balance Scale and Short Physical Performance Battery were both included in the round three discussions as a result of slightly relaxed criteria modified during the meeting, but neither achieved consensus on recommendation for the final COS. As such, the relaxed criteria did not unduly influence the outcome of the recommendations. The Unified Balance Scale, a recent scale combining items from the Balance Evaluation Systems Test [70], Fullerton Advanced Balance Scale [71], and Performance Oriented Mobility Assessment [72], also received high scores from the panel in rounds one and two, and discussions noted its comprehensive nature and appropriateness for a wide range of physical abilities (bed to community). Its potential for future COS recommendation was noted, but the panel recognized that it is not currently appropriate due to insufficient psychometric evaluation and its high number of test items [27]. Finally, the Short Physical Performance Battery was discussed in round three, and its psychometric properties and utility for large clinical trials were recognized. Although it did not reach consensus for inclusion in the standing balance COS recommendations, the panel recognized its use as a quick measure of lower extremity function that includes a standing balance item, and its appropriateness for large cohort and intervention studies in which balance and mobility are not primary outcome measures.

Limitations
The current standing balance COS recommendations are not without limitations, and should be interpreted in this context. First, it is acknowledged that the consensus process is not a completely objective exercise. The panel members, while invited with the goal of being representative, may not share the opinions of all potential users of the standing balance COS recommendations. While attempting to account for practice-related issues, the panel's expertise was skewed towards research-related issues. Although attempts were made to control for conflicts of interest (such as developers of measures in contention in round three not participating in the final vote), there is no guarantee that they were eliminated. Second, the broad aims of the current standing balance COS objectives are both a strength and a weakness. There may still be some questions about applicability in some populations and/or settings. Future iterations of balance COS recommendations may elect to narrow the scope of populations and settings included in the review, but would risk losing the ability to make meaningful comparisons across groups. Third, no single measure met all the intended objectives for the COS recommendations. As such, variation in reporting is still going to be an issue and may impact the ability to synthesize balance intervention data. The panel acknowledged this limitation, but felt that recommending a single balance measure was impractical and would limit successful uptake. Fourth, a further consequence of the decision to recommend two measures in the standing balance COS is that the decision on what measure to use becomes more complicated and requires some discretion. Fifth, there will be a number of challenges for implementation of the recommendations. It is acknowledged that both measures require a significant investment of time, as well as some training and equipment, which have implications for implementation. If users are not currently using one of the recommended measures, adoption of the COS recommendations will require changing their behavior, which also has implications for implementation. In particular, the Mini BESTest is less widely known, which may skew uptake towards the more familiar BBS.

Conclusions
The lack of a gold standard measure and subsequent disparate quantity and nature of existing approaches for the measurement of standing balance are an important factor limiting the ability to advance the optimization of exercise interventions for fall prevention and mobility enhancement, and may be related to clinicians' frustrations with outcome measures [73] and challenges prescribing exercise programs [74]. These COS recommendations for evaluating standing balance reflect an attempt to find 'common ground' that can meet the needs of a broad range of users. Our recommended COS for standing balance will directly and substantially inform clinical research and practice internationally. However, continued efforts to promote uptake and implementation of the COS will be required to maximize its utility.

Supporting Information
S1 File. Round One Results. This file contains the round one ranking results by measure, organized by decision. It contains basic information about each measure, the scoring distribution for the psychometric and feasibility categories, and comments noted by the panel.