Does the fit of personal protective equipment affect functional performance? A systematic review across occupational domains

Objective
To explore the effect of personal protective equipment (PPE) fit on functional performance across a range of occupational domains.

Background
PPE introduces an ergonomic, human systems integration, and mass burden to the wearer, and these factors are thought to be amplified if PPE is ill-fitting. However, few studies have considered the role of fit (static, dynamic, and cognitive) when evaluating PPE-related performance detriments in occupational settings.

Method
A systematic literature review was conducted to identify relevant studies, which were then critically appraised based on methodological quality and collated to compare key findings and present evidence-based recommendations for future research directions across a range of occupational domains.

Results
16 published studies met the inclusion criteria, 88% of which found that the fit of PPE had a statistically significant effect on occupational performance. Poorly sized PPE resulted in slower or increased reaction time; decreased range of motion or mobility; decreased endurance or tolerance; decreased pulmonary function; and altered muscle activation. Limited research met the inclusion criteria and those that did had risks of bias in methodology quality.

Conclusion
Future research evaluating the effect of PPE on performance in occupational settings should aim to recruit a more representative population; consider sex as a covariate; quantify and evaluate PPE fit and performance when integrated with all relevant equipment items; include outcome measures related to all three categories of fit (static, dynamic, cognitive); and assess performance of operationally relevant tasks.


Introduction
Personal Protective Equipment (PPE) describes any item or article worn to minimise risk to the wearer's health and safety from work-related physical dangers, which might include allergens, ballistic threats, chemicals, electricity, heat, impact, radiological exposure, or sharps [1]. PPE is therefore commonly used in occupational settings that involve exposure to these dangers, such as the military, protective services (e.g. police, firefighting, security), skilled trades (e.g. electrical, plumbing, carpentry), healthcare, and aerospace/aviation. Depending on the particular risks, PPE might include anything from gloves worn by industrial workers in an assembly line (designed to protect the hands from injury and minimise discomfort, particularly when working with hand tools) [2-4], to body armour for military populations (which functions to protect the wearer's essential organs from ballistic, fragmentation, and stab threats) [5,6], or space suits (which protect astronauts from the extreme temperatures in space, radiation, and space dust, as well as provide oxygen for astronauts to breathe) [7]. Although protective equipment is the lowest level on the hierarchy of controls for occupational hazards [8], PPE is essential in many occupational domains and required to comply with safety and protection standards [9]. However, PPE often introduces an ergonomic, human systems integration, and mass burden to the wearer [10]. Use of body armour systems and firefighter turnout gear, for example, has been quantitatively shown to negatively impact range of motion (ROM) and dynamic task performance [11-15]. Working in hot and humid conditions while wearing PPE has also been shown to place additional physiological stress on the body that impacts cognition and comfort, ultimately leading to fatigue, decreased performance, and injury [9,16,17]. Strategies are therefore needed to minimise detriments associated with requisite PPE use.
Fit is thought to be a key factor affecting functional and operational performance detriments associated with PPE use [18]. Correctly sized PPE has been consistently shown to minimise ROM loss [14,19], interference [20], physiological stress, and fatigue [21] associated with the use of protective equipment. Conversely, improperly sized PPE increases the likelihood of overexertion, fatigue, discomfort, and injury [22,23]. For example, research has demonstrated that undersized body armour may compromise protection to essential organs [6], while oversized body armour, although likely to increase protection, has been shown to negatively impact mobility and operational task performance [5,14,20]. However, optimal sizing and associated fit is often difficult to achieve. In the case of female soldiers, for instance, several recent publications have reported that the existing unisex sizing range does not accommodate the variety of female breast and torso shapes among the population and therefore that women in this field are more likely to be affected by fit-related PPE detriments [13,20,23]. Studies of firefighting uniforms have reached similar conclusions, reporting that oversized pants and jackets have a significant negative effect on range of motion and performance compared to pants and jackets that are correctly sized [11,24]. Interestingly, female participants in one particular firefighting study reported that they intentionally ordered pants larger than their recommended uniform size to accommodate their proportionally larger hips, which had the unfortunate consequence of limiting their range of motion and impairing task performance [12]. These examples of PPE sizing affecting occupational task performance, particularly for women, are not isolated and have been broadly reported across domains from aerospace to manufacturing to healthcare [25-32].
Sizing and associated fit should therefore be an essential consideration when evaluating the effect of PPE on performance.
The concept of "fit" represents an optimized status between the wearer and their immediate environment [33]. Beyond this, however, fit has been poorly defined in the literature in relation to protective equipment. Although specific to functional wearable equipment such as exoskeletons, a recent review by Stirling et al. proposes the most complete definition of fit and its various classifications [18]. Specifically, these include static fit, or the alignment between the wearer's anthropometry (three-dimensional size and shape) and the PPE system; dynamic fit, or how the wearer and the system interact during functional ROM and task performance; and cognitive fit, or how the wearer's cognitive and decision-making capabilities are impacted by wearing the system [18]. Each of these fit characteristics is influenced by the others in a complex interaction; however, the development of PPE sizing has focused, perhaps disproportionately, on static fit and how a limited number of standard anthropometric characteristics of soldiers, police officers, firefighters, or healthcare workers align with dimensions of the PPE system [5, 14, 19, 20, 25, 27-30, 32, 34].
Typically, "fit" has been linked to the existing sizing based on a two-step process, whereby (1) anthropometric dimensions (e.g. torso length) are used to determine a recommended size and (2) subsequent subjective assessments (e.g. participant comfort) result in shifting up or down in sizing to achieve the best fit based on the range of sizes available. This method informs which size was selected by the process, as well as provides insight into subjective issues that resulted in deviations from the recommended size, but ultimately fails to identify specific dimensions associated with ill fit due to the large number of confounding variables. Indeed, as multiple dimensions associated with the PPE are changed simultaneously based on the original sizing concept, it is difficult to discern the source of ill fit and therefore recommendations for future PPE design are limited [35,36]. While one-dimensional anthropometric measures coupled with user perceptions can support initial sizing, there is opportunity to adapt the nominal two-step process by expanding initial sizing and PPE evaluation to include a comprehensive assessment of fit inclusive of three-dimensional static fit, human movement in the system (dynamic fit), and the potential mental impact of system use (cognitive fit). Through assessment of static, dynamic, and cognitive fit in combination, organisations can better understand and alleviate functional performance detriments related to PPE system use within their respective occupational settings.
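The two-step process described above can be sketched in code as follows. This is a minimal illustration only: the size labels, torso-length boundaries, and function names are hypothetical and are not drawn from any real sizing system or from the reviewed studies.

```python
from bisect import bisect_left

# Hypothetical size chart: upper bound of torso length (cm) for each size.
# These values are illustrative assumptions, not a real sizing system.
SIZE_LABELS = ["S", "M", "L", "XL"]
TORSO_UPPER_CM = [42.0, 46.0, 50.0, 54.0]


def recommended_size_index(torso_length_cm: float) -> int:
    """Step 1: map a single anthropometric dimension to a recommended size."""
    idx = bisect_left(TORSO_UPPER_CM, torso_length_cm)
    # Wearers beyond the largest upper bound are clamped into the largest size.
    return min(idx, len(SIZE_LABELS) - 1)


def final_size(torso_length_cm: float, subjective_shift: int) -> tuple[str, str, int]:
    """Step 2: shift up or down in size based on subjective feedback.

    subjective_shift is -1 (prefers smaller), 0 (no change), or +1 (prefers larger).
    Returns (recommended size, final size, shift actually applied).
    """
    rec = recommended_size_index(torso_length_cm)
    fin = max(0, min(rec + subjective_shift, len(SIZE_LABELS) - 1))
    return SIZE_LABELS[rec], SIZE_LABELS[fin], fin - rec


if __name__ == "__main__":
    print(final_size(47.5, +1))
```

Note that the result records only which size was recommended and whether a shift occurred; it carries no information about which PPE dimension caused the shift, which mirrors the limitation discussed in the text.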
Despite the breadth of previous research that has quantified the impact of PPE on human performance (i.e. compared to a baseline condition without any PPE), relatively few published studies have considered the effect of PPE fit in performance evaluations (i.e. comparing fit conditions) and there remain many gaps in the current knowledge [11,24,37-42]. Therefore, the aims of this systematic review were to (i) characterise methods of investigating the effect of PPE fit on performance; (ii) synthesise previous findings of PPE fit on a range of performance measures across occupational domains; and (iii) identify research problems regarding PPE fit and occupational performance to recommend future research directions.

Literature search
The following electronic scientific databases were searched from study inception to May 2022: MEDLINE, SCOPUS, PUBMED, CINAHL, Science Direct, Web of Science. Search terms were piloted and reviewed across each database; the specific advanced search terms with relevant Boolean operators are shown in Table 1. As government-funded military and aerospace research is commonly presented at conferences, a hand search of the International Conference on Environmental Systems congress proceeding databases was also conducted to identify any additional literature.
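As an illustration of how such a search string is typically assembled, the sketch below joins synonyms within each concept using OR and joins the concepts using AND. The term lists are abbreviated examples chosen for brevity, and the helper names are hypothetical; the sketch does not reproduce the full search strategy or any database-specific syntax.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join the synonyms for one concept with OR inside parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(*concepts: list[str]) -> str:
    """Join the concept groups with AND."""
    return " AND ".join(or_group(c) for c in concepts)


# Abbreviated, illustrative term lists (not the complete search strategy).
equipment = ["personal protective equipment", "PPE", "body armour"]
occupation = ["military", "firefighter", "law enforcement"]
outcome = ["range of motion", "task performance", "cognitive"]

query = build_query(equipment, occupation, outcome)
print(query)
```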

Inclusion and exclusion criteria
This systematic review was conducted in accordance with the guidelines from the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) [43]. The flow chart shown in Fig 1 illustrates the results of the literature search, screening, and selection process of studies for inclusion in the present review. Specifically, all identified references were imported into Covidence Systematic Review Software (Melbourne, Australia) and duplicates were removed. Studies identified by the initial search (n = 3111) were then assessed based on title and abstract by two authors (CC and CR) separately and independently in Covidence, and any discrepancies between the two reviewers were discussed until consensus was reached. Studies were only eligible for inclusion in this review if they: (i) incorporated at least one measure of fit (static, dynamic, or cognitive) of the PPE item; (ii) investigated the effects of PPE on at least one measure of human performance (including physical or cognitive assessments); (iii) considered a specific occupational domain; (iv) were published in English; and (v) were published after 1970, as relevant legislation mandating the use of PPE was passed in 1974 (Health and Safety Work Act 1974, Australia). Based on a review of the titles and abstracts against these inclusion criteria, 86 studies were assessed as full text by two of the authors (CC and CR) to determine relevance to the present review. At this stage, studies were excluded if they: (i) were not peer reviewed; (ii) did not recruit a human population; (iii) did not include an appropriate human performance measure (note: although this was specified as an inclusion criterion before full-text screening, it was not always apparent from the title and abstract); (iv) did not include fit as an independent variable; or (v) did not present complete methods and results. Based on these exclusion criteria, sixteen studies were included in the review.

Data extraction
Data were extracted from the 16 included studies into an Excel database (Microsoft, USA) by two authors (BB and CR) and collated to provide a systematic overview of main findings, establish the strength of available evidence, and identify gaps in current knowledge. Key data extracted included participant characteristics (e.g. sample size, sex, age, occupational domain, country), equipment characteristics (e.g. type of equipment used, user experience, sizing, designs), study characteristics (e.g. design, setting, test conditions, fit assessment, performance measures), and a summary of findings (e.g. effect of PPE fit on performance).

Table 1. Search terms with Boolean operators.
("Personal protective equipment" OR "protective equipment" OR PPE OR equipment OR "protective clothing" OR "body borne" OR body-borne OR suits OR gloves OR "tactical armour" OR "space suit" OR spacesuit OR exoskeleton OR "individual protective equipment" OR "tactical vest" OR exosuit OR exosystem OR gear OR "safety vest" OR "safety equipment" OR turnout OR kit OR "fall vest" OR harness OR armor OR armour OR "equipment configurations" OR boots OR "body armor" OR "body armour") AND (Military OR "emergency services" OR security OR defence OR defense OR police OR "law enforcement" OR medical OR firefighter OR officers OR aerospace OR space OR "emergency response" OR soldiers OR paramedics OR "fire service" OR astronaut OR cosmonaut) AND ("physical performance" OR performance OR task OR movement OR "range of motion" OR "physical fitness" OR balance OR "joint angles" OR "operational task performance" OR cognitive OR strength OR "joint displacement" OR "range of movement" OR flexibility OR "task performance" OR ROM OR pressure OR "contact pressure" OR attention OR "situation awareness" OR memory OR "situational awareness")

Risk of bias analysis
Eligible studies identified through the literature search and screening processes were critically appraised to assess methodological quality using the Mixed Methods Appraisal Tool [44]. The Mixed Methods Appraisal Tool includes specific criteria for qualitative, quantitative and mixed methods studies, focusing on methodological quality (Table 3). Each study included in the quality appraisal was evaluated by two reviewers (CC and CR). Every study was assigned a score (0-2) based on each question within the appraisal tool, with a rating of 2 adopted to indicate a low risk of bias, a rating of 1 indicating an unclear risk of bias, and a rating of 0 indicating a high risk of bias. These ratings were documented and included in the results.
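The scoring scheme can be expressed compactly, as in the sketch below. The item names and example scores are hypothetical, and the summary rule (a study counts as low risk overall only when every item scores 2) is one possible way to aggregate the item ratings, assumed here for illustration.

```python
# Rating scheme as described: 2 = low, 1 = unclear, 0 = high risk of bias.
RISK_LABELS = {2: "low", 1: "unclear", 0: "high"}


def item_risks(scores: dict[str, int]) -> dict[str, str]:
    """Translate each item's numeric score into its risk-of-bias label."""
    return {item: RISK_LABELS[score] for item, score in scores.items()}


def overall_low_risk(scores: dict[str, int]) -> bool:
    """Assumed summary rule: low risk overall only if every item scored 2."""
    return all(score == 2 for score in scores.values())


# Hypothetical studies and scores, for illustration only.
study_a = {"item_1": 2, "item_2": 2, "item_3": 2}
study_b = {"item_1": 2, "item_2": 1, "item_3": 0}

print(item_risks(study_b))
print(overall_low_risk(study_a), overall_low_risk(study_b))
```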

Sample size.
A total of 527 participants were involved across the 16 studies, with an average of 33 participants per study. The sample size ranged broadly between studies, from 3 participants [26] to 150 participants [19], and more than 81% (n = 13) [5, 12, 21, 25-32, 34, 45] recruited fewer than 30 participants. Small sample sizes limit the generalisability of the current literature within the specific occupational domain, as they are unlikely to have adequate statistical power to account for human variability (e.g., anthropometry, posture, experience, and self-selected approach to task performance). Therefore, it is recommended that future research recruit larger, more representative samples to ensure that results accurately characterise the user population.
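The summary figures above follow directly from the reported counts, as the short calculation below shows. Only the three reported totals are used; the variable names are illustrative.

```python
total_participants = 527  # reported total across all studies
n_studies = 16            # number of included studies
n_small = 13              # studies that recruited fewer than 30 participants

mean_per_study = total_participants / n_studies
share_small = n_small / n_studies

print(f"mean participants per study: {mean_per_study:.1f}")
print(f"studies with fewer than 30 participants: {share_small:.2%}")
```

Because 13 of the 16 studies fall below 30 participants while the mean is approximately 33, the mean is necessarily pulled upward by the few larger studies, so a median would likely give a more typical picture of study size.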

Sample population.
A majority of studies within the present review recruited participants that were specific to the occupational domain (n = 12; 75%) as opposed to general population (university students [28-30] or adults who met general astronaut eligibility criteria [26]; n = 4; 25%). Considering that workers within a given occupational domain are likely to have characteristics and skills unique to their profession, as well as experience using the equipment item, recruiting participants from the general population may confound results and limit generalisation, for example, to actual manufacturing and assembly workers [32], to healthcare workers [30], or to astronauts [26]. A further limitation of current studies is that few were actively sampled to represent the user population; that is, sampling a range of body size and shape dimensions representative of the user population and that would interact with and vary overall PPE fit, such as BMI, breast size, or hand dimensions. As such, many study samples fail to represent the diversity of the user population, thus limiting the generalisability of results.

Participant age.
Of the studies that reported a mean age of all participants (n = 10), the average age was 27.7 years; an additional three studies reported separate mean ages for male and female participants [12,27,31], two studies only reported the age range [30,45], and one neglected to report any data on participant age [19]. Ensuring that participant age is representative of the user population is important given the association between age and human performance [55-57]. For example, evaluating the dexterity of university students (aged 21-30 years) while wearing gloves will have limited application for healthcare workers above that age range [30]. It is recommended that future research recruit participants from a broad age range, reflective of the end user population, to ensure generalisability of the results to the entire user population.

PLOS ONE
Fit of PPE and functional performance

Sex differences.
Half of the included studies recruited a mix of male and female participants (n = 8; 50%) [12, 25, 27-30, 34, 45], with fewer studies recruiting male-only participants (n = 5; 31%) [5,14,21,26,31], and the fewest number of studies recruiting female-only participants (n = 2; 13%) [19,20]. One study did not report the sex of participants, specifying only that participants were "sixteen asymptomatic seated workers with normal hands and no deformities, skin diseases, or latex allergies" [32]. Given that previous research has demonstrated a marked difference in the performance detriments that males and females experience when using PPE [58,59], as well as differences in anthropometric characteristics [60] and physiological responses between the sexes [61,62], the generalisability of performance results within the profession is likely to be limited if studies do not consider sex as a covariate or if sex is not reported at all. Furthermore, existing research indicates that females across a range of occupations experience increased fit issues with PPE compared to their male counterparts [12,20], which often translates to amplified performance effects. Despite this finding, females remain underrepresented in literature pertaining to protective equipment. For example, male firefighters may experience no significant performance detriments when wearing properly sized turnout gear, while the functional ease and mobility of female firefighters are greatly compromised [12]; yet, access to female-specific firefighting PPE is low (42% of 840 female firefighters surveyed in the UK and Ireland, North America, Australasia and mainland Europe) [63]. Similar results in terms of sex-specific performance detriments have been reported across a range of occupations [12,27,58,59]. Four studies (of the eight that recruited mixed sex participants) included only one [29] or two [28,30,45] female participants, which makes a valid statistical comparison between sexes impossible.
Indeed, only two studies compared either performance effects or fit-related performance effects between men and women [12,27], but despite substantial anecdotal evidence, the extent of sex or gender differences in many occupational domains remains undocumented. Greater evidence is required to promote inclusivity within historically male professions and remove barriers to participation for women and non-binary people in specialised occupations. Representation of non-binary individuals within the literature was not observed, and future research is recommended to be inclusive of all potential workers.
Interestingly, all military studies in the present review investigated body armour [5,14,19-21] and all uniforms were specific to firefighting [12], while gloves were widely assessed in a healthcare context (vinyl exam gloves [30,34] or latex surgical gloves) [25,30]; for manufacturing and assembly (commercially available work gloves) [32]; and within the aerospace industry (for integration with spacesuits) [28,29]. No studies investigating boots [64,65] or protective eyewear [66,67] met the inclusion criteria for the present study, primarily due to not assessing fit [65,68-85] or not evaluating the effect of fit on at least one measure of performance (often studies instead measured pain or injury prevalence) [67,86-99]. As such, there is a paucity of published literature on the extent to which these PPE items impact occupational performance, as well as the role of PPE fit. These data, however, are essential knowledge for organisations and employers procuring PPE for diverse user populations. Several additional studies met the bulk of criteria for inclusion, but the garment or item being investigated was ultimately not deemed PPE.
This included studies investigating performance while participants wore a backpack or some form of load carriage system [100,101]. As PPE is often designed to be worn with other pieces of equipment, such as backpacks, rifles, and firehoses, it is important to consider the fit of PPE when worn or used with other relevant equipment items. For example, ensuring that a backpack and body armour system are integrated well enough to minimise further performance detriments is important for the dismounted soldier. Research has shown that poorly sized PPE is associated with increased integration issues [20,102], which have also been linked to a decreased ability to complete occupation-specific tasks [20]. As the physiological burden of PPE differs based on the equipment with which it is being used [103], future studies should consider evaluating performance while participants wear all relevant equipment items, such as a backpack or load carriage system, in combination with PPE, such as body armour, firefighting uniforms, or helmets.

Fit conditions & sizing.
As a criterion for inclusion in the present review, all 16 studies made a functional performance-based comparison between at least two and as many as five different fit conditions of PPE. Typically, participants wore PPE that was determined to be the correct size (based on the assessment described in Section 3.4.2) and several other sizes that were deemed to be too small or too large [5, 14, 21, 25, 27-30, 32, 34]. Additional studies also compared performance while participants wore PPE that fit certain body dimensions and not others (e.g. PPE fits chest but not stature compared to stature but not chest) [19,31] or compared performance between participants whose current PPE was subjectively assessed as a good fit or a poor fit [12,20,26,45].
In a majority of studies, however, this methodology is inherently biased, in that it restricts fit conditions to the available PPE sizing systems, which may not effectively accommodate the user population [11,13,20,23,24]. For example, body armour is often issued in unisex sizes, which does not provide adequate allowance for differences in size, shape, or position of breast tissue, thus preventing many female soldiers from being able to achieve correct fit in any available size [13,20,23,104]. Similarly, females have smaller hands but are issued the same size and design of firefighting glove, which has been associated with reduced dexterity and occupational performance detriments [105,106]. Current methodologies do not account for limitations of the existing sizing range, which limits design guidance for future PPE. Future research should consider three-dimensional body size and shape dimensions and their multivariate interactions for PPE design.
3.4.2 Initial size assessment for fit conditions.
Investigating the role of PPE fit on occupational performance necessitates a baseline fit condition, which is typically assumed to be "best fit" and compared against sizes that are larger and/or smaller. However, the determination of initial sizing is often biased, in that it relies upon highly subjective opinions and criteria. Several studies in the present review determined initial fit condition via participant selection (i.e. the participant chose their own best-fit PPE; n = 4) [20,25,30,34], which is limited by the lack of standardization and objectivity in assessment. Perceived fit (and therefore size selection) is greatly dependent upon personal preference (e.g. preference for a tighter fit) and individual comfort (e.g. more comfortable in larger size) [18], which can vary widely between participants within the same "fit" condition. Additional studies determined initial fit of PPE through anthropometry and sizing charts (i.e. researchers used measurements and/or a predetermined sizing chart to determine PPE size for each participant; n = 5) [19,27-29,32], which limits potential fit to the existing sizing range. Studies also determined initial sizing by visual inspection (i.e. researchers or subject matter experts visually confirmed fit of the PPE for each participant based on prior experience or criterion; n = 4) [21,26,31,45] or a combination of anthropometry and visual inspection (n = 2) [5,14]. However, there is often limited standardisation in determining sizing between these subject matter experts due to a lack of objective, validated criteria. One included study asked participants to wear their own firefighting pants and quantified perceptions of fit by (a) subjective evaluations via a survey, (b) 3D body scanning, and (c) exit interviews [12].
Future studies are encouraged to employ some combination of these methods, whereby the initial size is determined by quantitative three-dimensional anthropometric shape measures and then confirmed or possibly modified by a subject matter expert based on objective criteria as being the best fit of the available sizing range. The initial sizing step is crucial, as no valid comparison of fit will be possible if the initial sizing is not done in a standardised manner.
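The counts reported for each initial sizing approach can be tallied to confirm that they account for all 16 included studies. The category labels below are shorthand introduced for this sketch; the counts are those stated in the text.

```python
from collections import Counter

# Number of studies per initial sizing approach, as reported in the text.
initial_sizing = Counter({
    "participant selection": 4,
    "anthropometry and sizing chart": 5,
    "visual inspection": 4,
    "anthropometry plus visual inspection": 2,
    "own PPE with survey, 3D scan, and interview": 1,
})

total = sum(initial_sizing.values())
shares = {method: count / total for method, count in initial_sizing.items()}

print(total)
for method, share in shares.items():
    print(f"{method}: {share:.1%}")
```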

Static fit evaluation.
A minority of studies evaluated static fit, or how the wearer's anthropometry corresponds to the system's geometry in a standardised posture after the initial sizing selection (n = 3) [12,19,20]. One method for assessing static fit is measuring one to two standard anthropometric dimensions of the individual either via anthropometric tools such as tape measures, callipers, and anthropometers, or extracted from three-dimensional scans to provide information about the geometry of the individual's body [18]. These measurements are then compared to the dimensions of the system to make a determination of static fit. However, it is important to consider what relative sizing between the user population and PPE is appropriate, and to then use these data to define objective criteria for fit evaluations of each specific PPE item. Even when such criteria have been developed, consideration must be given to the relative weighting between criteria. For example, bra fit studies traditionally determine fit using established pass/fail criteria [107-109] and an overall fail may be associated with a relatively minor fit issue, but is indistinguishable from an overall fail associated with several major fit issues. Without specificity, the resultant fit data are insufficient to inform design changes or sizing alteration. Advancements in three-dimensional body scanning enable organisations to collect numerous anthropometric measures that can inform static fit with greater specificity. It is recommended to incorporate three-dimensional anthropometry and shape dimensions (in functionally-relevant postures and operational tasks) to quantify the interactions between the user and the PPE. Researchers also need to consider how they quantify and evaluate static fit after initial sizing has been performed. Validated, objective assessment criteria must be developed and utilised to improve the consistency of visual and/or virtual inspection.
These data can also help characterise and disambiguate preferences and perception-based fit approaches.
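The limitation of pass/fail criteria, and the value of weighting between criteria, can be illustrated with a small sketch. All dimensions, tolerances, and weights below are hypothetical values chosen for illustration; they are not validated fit criteria, and the weighted score is only one possible way to express severity.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    body_cm: float       # wearer measurement
    ppe_cm: float        # corresponding PPE dimension
    tolerance_cm: float  # acceptable absolute mismatch (assumed)
    weight: float        # relative importance of this criterion (assumed)


def mismatch(c: Criterion) -> float:
    """Absolute difference between the PPE dimension and the wearer."""
    return abs(c.ppe_cm - c.body_cm)


def pass_fail(criteria: list[Criterion]) -> bool:
    """Overall pass only if every criterion is within its tolerance."""
    return all(mismatch(c) <= c.tolerance_cm for c in criteria)


def weighted_severity(criteria: list[Criterion]) -> float:
    """Weighted sum of the mismatch that exceeds each tolerance (cm)."""
    return sum(c.weight * max(0.0, mismatch(c) - c.tolerance_cm) for c in criteria)


# Hypothetical wearer with one minor fit issue.
minor = [
    Criterion("torso length", 45.0, 46.5, 1.0, 1.0),
    Criterion("chest circumference", 95.0, 96.0, 2.0, 2.0),
]
# Hypothetical wearer with several major fit issues.
major = [
    Criterion("torso length", 45.0, 50.0, 1.0, 1.0),
    Criterion("chest circumference", 95.0, 103.0, 2.0, 2.0),
]

print(pass_fail(minor), weighted_severity(minor))
print(pass_fail(major), weighted_severity(major))
```

Both wearers fail under the pass/fail rule, yet the weighted score separates the minor case from the major one, which is the kind of specificity needed to inform design or sizing changes.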

Dynamic fit evaluation.
All studies included in the present review assessed dynamic fit (how the wearer and the system interact during functional ROM and task performance; n = 16) [5, 12, 14, 19-21, 25-32, 34, 45]. Dynamic fit is important to assess in the context of occupation-specific tasks, as the aim of the equipment item should be to minimize restrictions on mobility, and associated fatigue, metabolic cost, performance, and injury detriments. Commonly, dynamic fit is assessed through the use of ROM and functional task performance, including occupationally relevant tasks [13,18,110-112], with standardised ROM tasks and procedures having been developed for some equipment items, such as body armour [15]. These tasks typically compare encumbered and non-encumbered conditions to establish a baseline for performance measures. Importantly, dynamic fit assessment should quantify the three-dimensional interactions between the wearer and the system during a range of functional poses or occupation-specific tasks while wearing the equipment item. Furthermore, given the long durations in which workers are required to wear PPE [20,113], evaluation of performance during prolonged field-based exercises has been recommended for ecological validity and to better assess the effects of fit of PPE on task performance.
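A comparison against an unencumbered baseline can be expressed as a percentage change in ROM for each fit condition, as sketched below. The joint angle values and condition names are hypothetical and serve only to show the calculation.

```python
def percent_rom_change(baseline_deg: float, condition_deg: float) -> float:
    """Percentage change in ROM relative to the unencumbered baseline."""
    if baseline_deg <= 0:
        raise ValueError("baseline ROM must be positive")
    return 100.0 * (condition_deg - baseline_deg) / baseline_deg


# Hypothetical trunk flexion ROM (degrees), for illustration only.
baseline = 120.0
conditions = {"correct size": 108.0, "one size too large": 96.0}

changes = {name: percent_rom_change(baseline, rom) for name, rom in conditions.items()}
# Difference between fit conditions, in percentage points of baseline ROM.
fit_effect = changes["one size too large"] - changes["correct size"]

print(changes)
print(fit_effect)
```

Expressing each condition relative to the same baseline separates the overall burden of wearing the item from the additional loss attributable to fit.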

Cognitive fit evaluation.
Approximately half of studies also included some assessment of cognitive fit, or how the wearer's integrated perception-cognition-action outcomes are impacted during system wear (n = 9) [5, 20, 25, 27-30, 32, 34]. Research has demonstrated the high relevance of cognitive tasks to occupational settings that require an underlying capacity for sustained attention (e.g. operational/tactical personnel, occupations that require intense concentration such as surgeons) [114,115]. In such occupations, the user's cognitive capabilities must be maintained such that operational or task performance is unfettered. Cognitive fit can be assessed indirectly through an occupational task that requires information processing, such as perception, attention, memory, or problem-solving, or assessed directly through a specific task designed to evaluate a particular cognitive construct. The indirect approach is more widely represented by the studies included in this review; 80% of the 10 publications that assessed cognitive fit did so through an occupational task that naturally involved the perception-cognition-action decision process rather than through a task that directly measures an individual construct. A subset of publications (n = 2) also included direct measures of constructs, such as perception thresholds [29,30]. The operational tasks ranged from marksmanship [5] to timed performance of a manual task (e.g. dexterity, tactility, accuracy) [25,28-30,34]. Alternatively, studies might employ a cognitive task to directly assess an individual construct, such as how PPE influences attention, problem-solving, or motion inhibition. A range of cognitive tasks from the field of psychology have been adopted in other performance-based literature, including sporting, aging, and military settings to evaluate cognitive performance.
Exemplary tasks that may be suitable for future studies include the psychomotor vigilance task (assesses vigilance and response speed) [114,[116][117][118], n-back task (assesses working memory) [119,120], Task-Switching (assesses cognitive flexibility) [121], and the sustained attention to response task (assesses sustained attention and response inhibition) [122].

Methodological quality
When the methodological quality of each study was critically appraised based on the Mixed Methods Appraisal Tool criteria (MMAT; Table 3), a minority were considered to have a strong overall methodological design (indicated by a score of 2 for each checklist item) [44]. Indeed, only six articles had a low risk of bias across all scoring domains [14, 19-21, 25, 32] and the remaining ten articles scored 'unclear' or 'high' risk of bias across one or more domains. Specifically, four studies scored 'high risk' due to concerns that the participants were not representative of the target population (Item 3 †; either university students or adults who met eligibility criteria without any specific experience) [26, 28-30]. Another three studies scored 'unclear risk' for this same checklist item. One study did not clearly describe whether the participants worked in an area that would require +Gz endurance/tolerance [27]; another used participants from the 'acceleration subject panel' and it was unclear if they were representative of the user population (pilots) or if they were simply cleared to participate in the research [45]; a third study used 'experienced centrifuge subjects', but did not specify whether these were military aircrew or past research participants [31].
Similarly, three publications failed to account for confounding variables in the study design and analysis (Item 6 †). One of these studies assessed the effect on marksmanship, but did not control for the shooting ability of the participants in the study [5]; the second study assessed the fit of gloves by participants selecting their preferred size, but did not assess how the selected gloves fit or participant hand anthropometry [34]; and the third study did not standardise fit across the participants [30]. Another study was deemed to have an 'unclear risk' in terms of the measurements being appropriate for the outcome and intervention (Item 4 †), as it was not specified whether the questionnaire used in the study was validated [19].
Of note, one particular study scored poorly in a number of categories [12]. Specifically, this study was deemed to be at 'high risk' for its qualitative data collection methods, as the exit interviews used may not have been adequate to address the aims of the study (qualitative Item 4); 'high risk' for Item 4 ‡, due to the small sample size (n = 18); and 'high risk' for appropriate statistical analysis to answer the research question (Item 7 ‡), as multiple analyses were performed without adjustment for multiple comparisons, increasing the chance of type I error. Similarly, the same study scored 'high risk' for the integration of qualitative and quantitative components (Item 5 §), as the discussion was very brief and the integration of data was not deemed to be sufficient; 'unclear risk' for divergences and inconsistencies between quantitative and qualitative results, as these were not adequately addressed in the paper (Item 6 §); and 'high risk' for different components of the study failing to adhere to the quality criteria of each tradition of the methods involved (Item 7 §), as there was a lack of information regarding how the qualitative data would be analysed.
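To illustrate what adjusting for multiple analyses can look like in practice, the following is a minimal Python sketch of the Holm step-down correction. It is offered only as one common option and is not the procedure used by the appraised study or prescribed by the MMAT; the function name `holm_adjust` and the p-values are hypothetical and chosen purely for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    Illustrative helper (hypothetical name); implements the standard
    Holm step-down procedure for a family of m tests.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical unadjusted p-values from four outcome measures.
raw_p = [0.010, 0.040, 0.030, 0.005]
adjusted_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

With these made-up inputs, all four outcomes fall below 0.05 before adjustment, but only two remain below 0.05 afterwards, which is the kind of shift an unadjusted analysis would hide.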
It is important to note that the MMAT used to rate the methodological quality of these publications did not have an item for participant characteristics [44]. Several studies failed to report where participants were recruited from (general or specialised population) [26]; the age of participants (either altogether [19] or only reported the age range [30,45]); and even the sex of participants [32], all of which are substantial limitations in interpreting and generalising the study results, but none of which were accounted for in the MMAT quality assessment. A different tool may have more accurately represented the quality of publications by accounting for these missing participant characteristics; however, the MMAT was chosen for its ability to assess both quantitative and qualitative studies (as well as mixed methods studies), all of which appear in the present review.
Generic ROM tasks, which assess dynamic performance, may be insufficient to adequately characterise performance during occupation-specific tasks. For example, female soldiers report extreme difficulty assuming prone rifle postures while wearing essential items of PPE, such as body armour and a helmet [113]. As this task involves multiple factors (integration between the helmet and body armour when lifting the head to obtain a sight picture, sufficient strength, and coordination to hold the rifle in position), ROM tasks alone are likely inadequate to characterise performance during this task. Task analysis should be undertaken to determine key operational tasks prior to evaluation of performance in PPE. In some occupational domains, standardised dynamic tasks have already been developed, such as the load effects assessment program (LEAP) used in military settings [123]. In addition, a range of occupation-specific tasks have been defined by physical employment standards literature, which are likely to be suitable for research evaluating performance of PPE [124-127]. Future research using such tasks should consider the effects of the fit of PPE in its evaluations.
Additionally, given the importance of cognitive performance in a range of occupational settings as discussed in Section 3.4.5, future research evaluating human performance in PPE should aim to include cognitive performance measures (either tasks designed specifically to evaluate a cognitive construct or operational assessments that inherently consider cognition). A number of studies have been conducted evaluating cognitive performance during load carriage [119, 128-131], but relatively few studies have examined cognitive performance wearing PPE [132,133]. Concomitant considerations when wearing PPE, such as heat stress, have been shown to negatively impact cognitive performance [132-134]. As such, incorporating cognitive measures into evaluations of performance while wearing PPE is required to extend the current body of research beyond the physical impacts of PPE on performance. Ensuring that the tasks undertaken in research studies reflect the occupational demands of the users whilst wearing PPE should be a key focus. Importantly, to understand the impacts of PPE on human movement capability, it is necessary to consider the systems that influence the perception-cognition-action cycle. Therefore, research evaluating PPE crosses a range of research fields including physiology, biomechanics, ergonomics, and human factors. A multi-disciplinary approach to evaluating human performance while wearing PPE is recommended in future research.

Effect of PPE fit on performance.
The key findings and recommendations regarding the effect of fit of PPE on performance from the 16 studies included in the present review are summarised in Table 4. As one of the key selection criteria, all of the selected studies included fit as an independent variable; in other words, these studies analysed performance differences while participants wore PPE in varying degrees of fit (e.g. "best fit" compared to "too large" or "too small"). A majority (88%; n = 14) [5, 12, 14, 19-21, 25-32] of these studies found that the fit of PPE had a statistically significant effect on occupational performance.
Poorly sized PPE resulted in a range of performance detriments, including slower or increased reaction time [5,25,28,30]; decreased ROM or mobility [12,14,19,20,26]; decreased endurance or tolerance [27,31]; decreased pulmonary function [21]; and altered muscle activation [32]. Given that PPE wear is associated with a human systems integration and mass burden, the results of these studies highlight that negative performance effects can be amplified if PPE is ill-fitting. However, the fit of PPE is a modifiable factor, which can be addressed by improving the metrics to quantify fit and developing the range of PPE sizing accordingly.
Of note, however, are the two studies that did not observe any effect of PPE fit on occupational performance. Drabek et al. 2013 reported that there was no significant difference in response time on a manual dexterity task (peg-board) when participants wore vinyl examination gloves in their preferred size compared to either too small or too large [34]. However, participants reported some degree of ill fit in all sizes, which suggests either that the design or the existing sizing range (small-extra-large) of vinyl examination gloves used in this study was not adequately catering to the target population. McCloskey and Esken also found that integrated night vision goggle helmet fit had no significant impact upon performance in a human-rated centrifuge [45]. These results suggest that the effects of sizing and associated fit of PPE on performance vary by equipment item and task, and therefore research should seek to evaluate all PPE items required for occupational task performance within each respective occupational domain to prioritise the fit of those equipment items that are most detrimental to task performance.

Effect of PPE on performance.
Half of the included studies (n = 8) identified a significant effect of PPE (regardless of fit) on the occupational performance of participants [5, 14, 21, 26, 28-30, 32], while only two studies reported no significant difference between performance when wearing PPE compared to not wearing the PPE item [25,34]. Both were studies concerned with surgical gloves, and both ultimately reported that use of the selected surgical gloves had no impact on manual dexterity in a healthcare setting when compared to being bare-handed. The finding that PPE affects performance in most studies is not surprising given that PPE is known to impose a mass, bulk, and human systems integration burden [10]. Therefore, strategies to minimise the negative effects of PPE are required. Ensuring PPE is correctly fitted to the anthropometric dimensions of the user is a key strategy to reduce the performance detriments associated with PPE use, as well as promoting dynamic and cognitive fit of the system. The remaining six studies did not assess a baseline condition that allowed for comparison between PPE and non-PPE conditions [12,19,20,27,31,45]. Baseline conditions can be used to identify the magnitude of the PPE impact and subsequently design interventions to reduce the impact, or to support supplementary tool design to reduce PPE impacts, and therefore are worthwhile in future research.

Recommendations for future research
Based on the key findings collated and compared across studies included within this review, as well as the gaps identified in the current body of knowledge, the following section outlines recommendations for future research assessing the effect of PPE fit on functional performance across a range of occupational domains.
3.7.1 Occupational domains. Many occupational domains in which PPE is required and potentially detrimental to operational performance are not represented in the present review due to methodological limitations or study design, and therefore the extent to which PPE fit affects performance in these occupations remains unknown. Future studies are encouraged to examine the fit of PPE across a diverse range of occupational domains, including scientific laboratories, construction, mining, surgery, firefighting, policing, and manufacturing and assembly.

Participants and sampling approach.
A majority of the studies in this review included fewer than 30 participants and several studies recruited a generic sample population (e.g. university students) from a narrow age range. A diversity of participants is essential to ensure that the results are generalisable to the intended end-user population. Therefore, future studies are recommended to recruit a larger sample size from a broad age range, ensuring that the sample is representative of the user population and that it adequately accounts for human variability (e.g. anthropometry, posture, behaviour, etc). Furthermore, females and non-binary individuals remain underrepresented in literature pertaining to protective equipment, despite a body of research indicating that females are disproportionately affected by ill-fit and PPE-related performance detriments. It is therefore imperative that further research be inclusive of all potential workers.

Type of PPE.
A range of personal protective equipment is widely used across occupational domains, but no published research has explored the performance detriments associated with fit of boots, protective eyewear, surgical gowns, police body armour, harnesses, exoskeletons, or many other types of PPE. Although they are represented by several studies in the present review, body armour, gloves, helmets, spacesuits, and firefighting uniforms also warrant further investigation in a range of occupational contexts. Future research is encouraged to explore the fit of a range of PPE used in occupational settings, especially considering the fit and integration of related equipment items that may influence PPE fit (e.g. backpacks, oxygen cylinders, weapons, etc). Furthermore, to better cater to the diverse anthropometry of worker populations, future research should endeavour to examine greater ranges of sizes and designs within assessable equipment items. This includes exploring mass customisation and custom PPE approaches to evaluating the effect of PPE on performance in occupational settings.
3.7.4 PPE initial size assessments for fit conditions. In order to determine initial sizing, which is often used as the "best fit" condition within PPE research, future studies are encouraged to employ some combination of participant self-selection, visual inspection by a subject matter expert, and objective criteria to determine initial size, whereby participants self-select a size or quantitative three-dimensional anthropometric shape measures are used to select a size that is then confirmed or possibly modified by the subject matter expert based on standardised criteria to determine the best possible fit from the available sizing range. The specific methods used for initial PPE sizing should be informed by the unique goals of each project; that is, operations affected by comfort of the user within the system will benefit from the inclusion of subjective feedback (i.e. participant self-selection) while research concerned with the specific relation of PPE fit to the user dimensions should rely predominantly on objective anthropometric data.
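The combined sizing workflow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than a validated procedure: the size chart, its chest-circumference bands, and the function names are all hypothetical, and a single linear measure stands in for the three-dimensional anthropometric shape measures recommended in the text.

```python
from typing import Optional

# Hypothetical size chart: (label, min chest cm inclusive, max chest cm exclusive).
# These bands are invented for illustration and do not describe any real PPE item.
SIZE_CHART = [
    ("S", 80.0, 92.0),
    ("M", 92.0, 104.0),
    ("L", 104.0, 116.0),
    ("XL", 116.0, 128.0),
]


def size_from_anthropometry(chest_cm: float) -> Optional[str]:
    """Objective candidate size, or None if the wearer falls outside the range."""
    for label, lower, upper in SIZE_CHART:
        if lower <= chest_cm < upper:
            return label
    return None


def initial_size(chest_cm: float,
                 self_selected: Optional[str] = None,
                 sme_modified: Optional[str] = None) -> Optional[str]:
    """Combine the three inputs into one initial ("best fit") size.

    The candidate is the participant's self-selected size when given,
    otherwise the size derived from anthropometry. A subject matter
    expert then either confirms it (sme_modified is None) or replaces it.
    """
    if self_selected is not None:
        candidate = self_selected
    else:
        candidate = size_from_anthropometry(chest_cm)
    return sme_modified if sme_modified is not None else candidate


objective_only = initial_size(98.0)
participant_choice = initial_size(98.0, self_selected="L")
after_sme_review = initial_size(98.0, self_selected="L", sme_modified="M")
```

Whether the self-selected or the anthropometry-derived size should take precedence as the candidate depends on the goals of the project, as noted above; the ordering used here is only one possible choice.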

Fit evaluation.
In evaluating the fit of PPE items, there is a need for further objectivity and standardisation. Future research should also aim to incorporate a combination of static, dynamic, and cognitive assessments of fit, as all three categories have the potential to affect occupational task performance. Specific recommendations for assessing fit within each category are included in the following three sections.
3.7.6 Static fit. It is recommended that future static fit assessments incorporate three-dimensional anthropometry in functional postures relevant to operational tasks, as this will yield a more robust quantification of the interactions between the human body and PPE item/system. Visual inspection by a subject matter expert or subjective user feedback may be used as additional tools to achieve static fit within the PPE item, but it is further recommended that validated assessment criteria be developed and utilised to improve any static fit assessment. Criteria may also consider user preferences and perception-based fit approaches.
3.7.7 Dynamic fit. Although they have been widely employed in the current literature, dynamic fit assessments can be improved to quantify the three-dimensional interactions more directly between the wearer and the equipment item during a range of functional movements and occupation-specific tasks. Importantly, evaluation of performance during prolonged field-based exercises is recommended to ensure ecological validity of study results and to assess the effects of dynamic fit of PPE on in-situ task performance.

Cognitive fit.
It is recommended that future research include cognitive fit assessments as part of the performance evaluation, which can include indirect assessment through an occupational task that requires perception, attention, memory, and/or problem-solving or direct assessment through a specific task designed to evaluate a particular cognitive construct. A range of cognitive tasks from the field of psychology have been adopted in other performance-based literature, including sporting, aging, and military settings to evaluate cognitive performance and provide exemplar cognitive tasks that can be adopted in future research examining cognitive fit.
3.7.9 Performance while wearing PPE. In order to quantify the effect of PPE fit on performance, key tasks relative to the occupational role should first be undertaken without the equipment item to determine a baseline (i.e. a standard towards which PPE fit and design improvements should aim). With a baseline measurement established, the magnitude of the performance decrement or enhancement when wearing PPE can be determined. The methods by which studies assess user performance while wearing PPE should be specific to the occupational domain and simulate operational conditions where possible. Researchers are also encouraged to work towards standardised tasks (e.g. the load effects assessment program (LEAP) used in military settings), as this will facilitate comparison between studies and a greater understanding of the role of PPE fit on occupational performance detriments.
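As a minimal sketch of how the magnitude of a decrement or enhancement relative to baseline might be expressed, the Python function below reports the percentage change from the no-PPE condition. The function name, the sign convention, and the example values are assumptions made for illustration and are not drawn from any of the included studies.

```python
def percent_change_from_baseline(baseline: float,
                                 with_ppe: float,
                                 higher_is_better: bool = True) -> float:
    """Percentage change in performance relative to the no-PPE baseline.

    Positive values indicate an enhancement and negative values a
    decrement. For measures where lower scores are better (e.g. task
    completion time), set higher_is_better=False so the sign is flipped.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (with_ppe - baseline) / baseline * 100.0
    return change if higher_is_better else -change


# Hypothetical values: trunk flexion ROM in degrees (higher is better).
rom_best_fit = percent_change_from_baseline(120.0, 114.0)
rom_too_small = percent_change_from_baseline(120.0, 102.0)

# Hypothetical values: task completion time in seconds (lower is better).
time_too_large = percent_change_from_baseline(20.0, 25.0, higher_is_better=False)
```

Expressing each fit condition against the same baseline makes conditions directly comparable: in this made-up example the poorly fitted condition shows a ROM decrement three times that of the best-fit condition.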

Limitations of the review
This review is not without limitations. Although a hand search was conducted of the International Conference on Environmental Systems congress proceeding database to identify additional published articles, other grey literature was not included, potentially introducing publication bias and omitting relevant evidence [135,136]. As discussed within the main body of this review, the results suggest the need for new and confirmatory studies that recruit a more representative population; consider sex as a covariate; evaluate PPE fit and performance when integrated with all relevant equipment items; include outcome measures related to all three categories of fit (static, dynamic, cognitive); and assess performance of operationally relevant tasks.

Conclusion
Across occupational domains, previous research has evaluated the effect of PPE on various types of human performance; however, few studies have considered the role of fit. Of the 16 studies in this review, 88% reported that the fit of PPE had a statistically significant effect on occupational performance. Poorly sized PPE was associated with a range of performance detriments, including slower reaction time; decreased ROM or mobility; decreased endurance or tolerance; decreased pulmonary function; and altered muscle activation. However, limited research met the inclusion criteria for this review, which suggests gaps in the current understanding of the impact of PPE fit on operational performance across a range of occupational domains. The included publications had a high risk of overall bias based on methodology quality. Future research should aim to recruit a more representative population; consider sex as a covariate; quantify and evaluate PPE fit and performance when integrated with all relevant equipment items; include outcome measures related to all three categories of fit (static, dynamic, cognitive); and assess performance of operationally relevant tasks.