Towards a conceptual model for the use of home healthcare medical devices: The multi-parameter monitor case

In the last decade there has been an increase in the use of medical devices in the home environment. These devices are commonly the same as those used in hospitals by healthcare professionals. The use of these these devices by lay users outside of a clinical environment may become unsafe. This study presents a methodology that allows decision makers to identify potential risk situations that may arise when lay users operate healthcare medical devices at home. Through a usability study based on the Grounded Theory methodology, we create a conceptual model in which we identified problems and errors related to the use of a multi-parameter monitor in a home environment by a group of lay users. The conceptual model is reified as a graphical representation, which allows stakeholders to identify (i) the weaknesses of the device, (ii) unsafe operation modes, and (iii) the most suitable device for a specific user.


Introduction
A medical device is any instrument, machine, software, or other similar equipment intended to manage human diseases and injuries, support human life or examine specimens, among other uses [1]. The user of a medical device is usually a professional in a healthcare facility. A Home Healthcare Medical Device (HHMD) is a medical device intended for use at home without the need of specialists to operate them and dedicated to improve the patient's quality of life [2].
The use of HHMD is a fast growing sector in the health care industry [3][4][5]. The Global Home Healthcare Market is expected to reach USD 303,605.9 million by 2020, growing at a Compound Annual Growth Rate (CAGR) of 8.1% from 2014 to 2020 [6]. The most significant drivers for this increasing in use are: (1) the increasing aging population [4,7] and their associated chronic diseases. Continua Health Alliance establishes that 860 million people have one or more chronic conditions [8]; and (2) the motivation for health institutions to reduce medical cost by decreasing the length of hospital stays and redirecting early patients to home care [9]. application of human factors engineering (HFE) and usability engineering (UE) concepts to reduce the risk and to ensure "that devices are safe and effective for the intended users, uses, and use environments." The guide focuses on managing the risks associated with the use of medical devices across the HFE/UE concepts, and it is based on a model of the interactions between a user and a device. This model considers: (1) information perceived by the user, (2) cognitive processing, and (3) control actions. The objective of this guide is to help manufacturers to improve the design of the devices.
This work focuses on the new risks that arise when a medical device designed for a clinical environment is used at home. In particular, the objective of this work is to generate a methodology that aids in the understanding of these risks through a case study of the usability of a multi-parameter monitor device in a home care environment. Applying this methodology, we will develop a conceptual model in which the characteristics related to safety and hazardous situations emerge. This conceptual model is based on observations of lay users operating the medical device in their home environment.

Grounded theory
Grounded theory method (GT) is a systematic methodology that involves the discovery of theory through the analysis of data of a given phenomenon. In GT, the conceptual framework (a hierarchical and coherent classification of concepts) emerges through the continuous interaction between data collection and data analysis. That is, a theory emerges inductively from the data [29].
In this work we perform a usability case study using this methodology. The data is gathered through the observation of the use of a multi-parameter monitor (designed for a clinical environment) operated by a non-healthcare professional. The collected data is then analyzed according to the Grounded Theory methodology [30]. This methodology allows to structure and to determine a hierarchy of concepts that emerges from the use of the medical device. For example, through the use of this methodology we will determine that the concept "User does not know if he is satisfied with the selected alarm level because he cannot test it" emerges from volunteer comments such as: "How do I know which sound level is appropriate to me? I cannot hear it", "Will this sound level be enough? Can I test it?" or "I cannot know how loud it is, so I will put a high number to be sure to hear it well".
We focused on identifying problems and errors emerging during the operation of a medical device by lay users in order to obtain a conceptual model of the use of medical devices in a homecare environment. This study uses a medical device and not a HHMD, because in practice medical devices originally designed for clinical use are increasingly being used in the home environment [9]. Testing in the real operational environment-homecare with the device operated by a lay user-can reveal problems that have not been anticipated either by the standards or designers [14].
This methodology involves three phases: sampling, data collection, and data analysis [31]. The development of each phase proceeds iteratively and simultaneously, and are described below. The phases, processes, and steps of the Grounded Theory methodology are summarized in Fig 1. Sampling phase. The sampling phase is performed following three processes [31]: convenient, snowball, and theoretical sampling.
First, during convenient sampling; we choose a set of representative individuals, selecting volunteers that have different characteristics between them, with the objective of encompassing a wide range of user profiles.
Next, using snowball sampling; new individuals who present at least one of the characteristics established as relevant are selected. Relevant characteristics are those most often repeated among the volunteers that presented a high number of problems during convenient sampling.
Finally, when the first signs of data saturation are revealed, the theoretical sampling stage starts. During this stage, data are collected, coded and analyzed in order to generate a theory. Saturation of data is the point at which no new information is extracted from the data. [32]. In GT, it is unnecessary to define a number of samples before starting the study, since this number is established when the data analysis shows a tendency to the reduction of new results (ideally close to zero). This is known as data saturation, and when the data meets this condition, the sampling number is reached.
Data collection phase. The data collection phase aims to acquire all the information required to identify objective and subjective characteristics relative to the phenomenon to be studied. This phase comprises three stages [31]: The first stage corresponds to analysis of documents relative to the phenomenon. The outcome of this stage is a set of activities to be performed by the subjects and a questionnaire to be used in the next stage. This questionnaire allows the evaluation of key factors that could impact the performance of the subjects during the activities. The second stage corresponds to an interview with the subjects based on the questionnaire design in the previous stage, in order to gather prior information relevant to the investigation. Finally, the third stage corresponds to the observation of the subjects by the researcher while they carry out the activities defined above.
Data analysis phase. Data analysis corresponds to the last phase of the methodology and includes two coding steps [31,33]. During the coding process, the transcribed data are  organized into categories and then similar categories are grouped together to organize the data for the conceptual model. The process seeks to find topics, concepts, patterns, relationships and similarities through the constant comparison method [34]. We use the GT open coding and axial coding techniques to analyze the data. During open coding, categories are developed. This involves the identification of chunks or units of data (e.g. key words, phrases, sentences) that belong to or represent a more general phenomenon [35]. In this work, we call these units of data "segments of interest", which correspond to the difficulties identified by the researcher during the observation of use. We cluster the segments of interest that represent similar situations into RDUs (Representative Data Unit) [35][36][37]. During axial coding, the categories which have emerged from open coding are clustered in order to organize them in a logical and simplified way [31]. At this stage, we seek to understand and establish different emerging relationships in order to group categories that show a central association. The categories obtained from the conceptual model are checked for exclusively (i.e. that there is no overlap between them) [34].

Participants
Thirteen participants volunteered to take part in this study. The participants showed varying characteristics, encompassing a wide range of potential users of HHMD. The study was approved by the ethics committee of the Universidad de Valparaíso. Each volunteer provided written informed consent and gave permission to publish the obtained data before the session. During each session, the researcher studied the interaction of each volunteer with a multiparameter monitor, performing a set of predefined tasks.

Multi-parameter monitor
The medical device under study was a multi-parameter monitor (Mindray model MEC-1200), which measures five vital signs of the human body: blood pressure, heart rate, SpO2 (saturation of peripheral oxygen), respiration rate, and body temperature. The patient's parameters are displayed in real time on the screen of the device. We selected this device because monitoring vital signs continuously and simultaneously is relevant to homecare, since there are patients with chronic diseases who do not need to remain hospitalized, but need to constantly monitor their vital signs to prevent complications of chronic diseases [38]. In addition, the use of devices that enable people to manage their health care in a more convenient and independent way is an increasing tendency [13,39].

Instantiation of the grounded theory
Firstly, the researcher performed the analysis of the device's manual before initiating the sessions with the volunteers. Here, information was extracted regarding the technical and clinical characteristics of the medical device, such as the variables measured by the device, the details of the user interface, and its clinical use. Interviews with the volunteers were then performed. The interview was based on the selection of factors that may lead to an unsafe use, as described in the literature [9, 11-13, 15, 39]. These factors include: experience in the use of medical devices, technical and medical knowledge of the health condition, physical capacities (e.g. strength), cognitive capacities (e.g. concentration and memorization) and sensory capacities (e.g. vision). Characteristics which require medical diagnoses (e.g. sensory capabilities) were based on information declared by the volunteer. New questions were incorporated into the interview based on the observations of the first convenient sampling sessions. This allowed the incorporation of features that had not been considered initially, and that could influence the safety of the use of the device. These new characteristics were evaluated immediately in the next interviews. In this study, three new characteristics were added after the first convenient sampling session: "mathematical knowledge", "knowledge of technical symbols", and "reaction to adverse events". Finally, the observation of the use of the medical device was performed during each session. Participants followed a guide which was based on the functions described in the device manual. This guide defined 26 task scenarios designed for evaluating the performance of each volunteer in the use of the medical device. Here, the researcher obtained information of the use of the device. For example, tasks established for this study were "Enter a new patient dataset", "Set proper alarm volume and display alarm limits", and "Adjust respiration: set the alarm on, set apnea alarm to 20 seconds, establish the calculation type to automatic threshold; connect the sensor to the device".
Each session began with an introductory period in which the researcher explained the purpose of the study and the procedure of the session. Then, a training period was carried out on the use of the multi-parameter monitor. After the training period, the volunteer was asked to perform tasks established on the task guide in order to evaluate the use of the medical device. The researcher asked the participants to "think aloud" during the session to facilitate the identification of problems. While the users performed the tasks, the researcher observed and took notes of participant actions, problems, and errors.
At the end of the tasks, and in a dynamic governed by Gordon Pask's teachback method [40], the key elements that were registered during the session were repeated to the volunteer orally to verify that they capture the idea of the situations presented.
During the coding process, the researcher identified segments of interest of the transcriptions by applying open coding. Segments of interest were clustered to represent similar situations into RDUs. For example, segments such as "How do I know which sound level is appropriate to me? I cannot hear it" or "I cannot know how loud it is, so I will put a high number to be sure to hear it well" are associated to the RDU "User does not know if he is satisfied with the selected alarm level because he cannot test it".
The relationships between the elements of the conceptual model are illustrated in Fig 2. In this elicitation process, we created the RDUs from the transcripts using open coding, and created concepts from the RDUs using axial coding. Finally, concepts were grouped to create categories using axial coding. These categories constituted the conceptual model. The detection of relationships and similarities was supported by the constant comparison method [34]. The detected RDUs were encoded to represent the phenomenon. For this, we identified the error or problem which occurred in each RDU, and combined it with similar ones under a sentence that represents the problem; each cluster is called "concept" (CON). Several RDUs can be identified with the same concept [35][36][37]. For example, RDUs obtained in the study such as "User does not know the meaning or function of an option" and "User misunderstands the meaning or function of an option" are associated to the concept "User does not understand the function or meaning of an option or term that involves unfamiliar technical terms." The initial concepts were compared iteratively with each other and with the new arising concepts, focusing on finding similarities and differences, in order to create clusters of concepts. The similarities and differences were related to the cause of error or problem. As such, the concepts that present similar causes were grouped under a sentence that represents the cause; each cluster is called "category" (CAT). Each category was associated to a characteristics acquired from the device (during the manual analysis stage), user, or environment (during the interview and observation stage). The initial interpretations were modified and refined iteratively by comparing them with other segments of data and other transcripts [35][36][37]. For example, the category "User's initiative to read the manual or follow the instructions" emerges from the following observed situations: "User focuses on the end of the task, not in the process, it is explained in the manual but he does not read it" and "User observes the options but does not identify the correct one, the information is in the manual but he does not read it." The researcher checked the categories obtained for the conceptual model, ensuring that there were no overlaps between them. It was also checked that the set of categories covered the whole spectrum of RDUs by applying constant comparison method. Afterwards, the researcher linked the created categories in order to organize them in a logical and simplified way. In this study, categories were organized in groups (according to whether the cause of error was due to the user or the device) and subgroups. In the subgroups, the categories which represented similar causes of problems were linked in a unique area. There were four areas (subgroups) created: semantics, perception, information, and manipulation.
When the saturation of the data was reached, the data analysis phase was completed, and the conceptual model created. The conceptual model represents all the categories obtained, organized into groups and areas. We represented the conceptual model by a graphical representation. This representation allows decision makers to evaluate each of the generated categories. We evaluated each category using a Likert-type scale from 1 to 5, where 1 indicated that it is not satisfied, and 5 indicated that the category fully met the established characteristic.

Results
A total of 13 volunteers and one expert in the use of medical devices were interviewed (7 volunteers using convenient sampling, 4 volunteers using snowball sampling and 2 volunteers using theoretical sampling). This number of participants is in line with the numbers typically reported in the literature, and although in GT the necessary number of sessions for the sampling phase and for achieving saturation is uncertain, Guest et al. [32] establish that saturation is usually achieved within the first 12 interviews. We included an expert (nurse with experience in the use of multi-parameter monitors) to verify that the proposed methodology allows the detection of risks of use of medical devices by inexperienced users.
The convenient sampling phase included volunteers with different characteristics, such as a wide range of ages (23 to 67 years) and occupations (e.g. professionals, students, driver, and housewife), both genders, different levels of knowledge of technical and medical issues, with or without sensory impairment, chronic condition, or multiple language speaking, among others (see S1 Table for details). Although there were volunteers who knew this medical device, none of the volunteers had manipulated one before. Particularly, seven sessions were designed in order to detect the most interesting volunteers' characteristics. After these sessions, we observed the relevant characteristics of the volunteers (that is, the user characteristics most often repeated among the volunteers that presented a high number of problems). The volunteers' relevant characteristics were: people not working in the health sector, who also had some sensory impairment or chronic condition, as well as low knowledge of technical and medical issues.
The snowball sampling phase was then initiated, including new volunteers with the relevant characteristics obtained in the convenient sampling process. The first signs of saturation were detected with the third volunteer of this sampling. Then, two sessions of theoretical sampling were needed to confirm saturation. In conclusion, the last four volunteers presented a number of new RDUs close to zero, which is the criterion to determine saturation. Fig 3 shows the saturation data graphically. We can observe that the saturation was reached within the first 12 volunteers.
During the data analysis phase, around 300 segments of interest were identified from the transcriptions of the sessions with the volunteers. From these transcriptions emerged 233 RDUs (including repeated RDUs among different volunteers). By eliminating duplicates, the list was reduced to 110 RDUs (see S2 Table for the complete list). We found that the number of RDUs did not correlated with the level of technical and medical knowledge of the volunteers. The expert user, despite not having previously manipulated the same model and brand of the device, presented a low number of difficulties (only three UDR were detected, mainly attributable to not having read the device's manual). The most frequent RDUs are shown below. In parenthesis we indicate the percentage of participants who exhibit these RDUs.
• User does not know a medical abbreviation preventing her from understanding a task (92.3%).
• User reads the label but does not know which sensor to use to measure a specific parameter (84.6%). • User cannot explain correctly the information of a table (76.9%).
• User does not know if she is satisfied with the selected alarm loudness because she cannot test the volume (76.9%).
All the obtained RDUs were classified in 73 concepts (types of errors or problems). The most common concepts are shown below. In parenthesis we indicate the percentage of participants who were associated with these concepts.
• User focuses on the end of the task, not in the process; this is explained in the manual but he does not read it (92.3%).
• User observes the options but does not identify the correct one, the information is in the manual but he does not read it (84.6%).
• User does not know if he is comfortable with his choice due to lack of feedback (76.9%).
Then, after a constant comparison method, we detected similarities and differences between concepts. The concepts were grouped into 30 categories (causes of error or problem). The most common categories are shown below.
• The language of the device is familiar to the user.
• User's knowledge of technical terms of the medical device.
• User's knowledge of medical abbreviations.
• User's initiative to read the manual or to follow the instructions. Table 1 shows part of the obtained RDUs, concepts and categories. Finally, we grouped categories into areas (according to the factor associated to the cause of error). Categories presenting similar causes of problem were linked to a unique area. We determined four areas: semantics, perception, information, and manipulation.

Semantics:
This area refers to the meaning of words. It includes concepts that are related to the gap between the terms used in the device and the terms known by the user.

Perception:
It corresponds to what the user perceives. It includes concepts that are related to the gap between what the user actually senses and what he should sense to use the device correctly.
Information: it refers to the information that the user can use and his understanding. It includes concepts related to the gap between the quality of information that the user has and the information that he needs to use the device correctly.
Manipulation: this area refers to the physical manipulation of the device and its accessories. It includes concepts related to the gap between manipulating the device's controls effectively, buttons and accessories and errors in their manipulation.
The conceptual model was created when saturation was reached (no more concepts created) and all categories were linked into areas. Each of these areas was classified into two groups: user, and device. Fig 4 shows categories and areas obtained through GT process.
We reify the conceptual model as a graphical representation (Fig 4). The graphical representation allows stakeholders to evaluate at glance each of the 30 categories by group and areas. The numerical evaluation was made by the researcher based on the number of RDUs that were found during the study. The score ranges from 1 (the category is not satisfied) to 5 (the category is fully satisfied). The score for each category was obtained from the number of RDUs associated with that category: The category score was inversely proportional to the number of RDUs found associated with each category and relative to the total number of RDUs that were found in the study. For example, category "comfortable contrast of the display's colours" is well evaluated because it was associated with only 2 RDUs out of the 233. The final score was computed by mapping the obtained proportion to a Likert scale.

Discussion
The proposed model highlights groups, areas, and categories that could lead to the unsafe operation of the device under study (low scores in Fig 4). The groups, areas, and categories that do not present risk of unsafe use are the ones that show higher scores (better evaluation) in Fig 4. This type of evaluation should focus the designer of the device on groups and categories relevant for designing a HHMD version of a multi-parameter monitor. This evaluation should also be taken into account for the training lay user in the use of the device. This form of analysis can be used to analyze which group, area, or specific categories are most likely to generate a problem. In this example, the subgroup "information" is more likely to generate a problem, followed by the group "semantics". As a consequence, those areas should be reinforced to reduce the probability of unsafe operation.

RDU Examples Concept Examples Categories
User does not know a medical abbreviation, which prevents him from understanding the situation.
User does not know, does not relate to, or confuses a medical abbreviation indicated in the device or manual.
Device use of language familiar to the user.
User does not know which parameter to modify to display the required data.
User is not familiar with the parameters shown in the interface.
User does not know which option to choose, he does not know which one fits to his profile.
User does not associate the task with the options he has.
User's knowledge of technical terms of the medical device.
User does not know how to approach the information in the manual. User does not know in which mode the device is working, it is not indicated.
User does not know in which state or mode the device is in for lack of feedback.
User does not know how to start or cancel a measure.
User observes the options but does not identify the correct one, the information is in the manual but he does not read it.
User's initiative to read the manual or follow the instructions.
User misses a step of the task. User focuses on the end of the task, not in the process, it is explained in the manual but he does not read it.
User does not notice the visual signals indicating the electrical supply is shut down.
The visual signals triggered by an adverse event do not alert the user.
The model helps to specify areas which represent (a) the weaknesses of the device under study (either in the device interface, enclosure, or user manual); and (b) the weaknesses of the user. These weaknesses may lead to errors and therefore, to an unsafe operation. As consequence, these areas must be strengthened.
The methodology proposed in this work may be applied to other devices, and enable medical equipment manufacturers to detect weaknesses in the usability of their devices and instruction manuals. Furthermore, the methodology could allow health care professionals to identify the most appropriate device for a patient by visualizing the gap between a safe operation of the device and the operation that the user will likely perform.
In order to use this methodology, we investigated volunteers with no experience in the use of the medical devices, regardless of their different technical and medical knowledge. We also found that an expert in the use of such a device produced too few RDUs to create a conceptual model. Therefore, it may not be possible to generate a conceptual model using experts.
In relation to the "Applying Human Factors and Usability Engineering to Medical Devices" FDA guide [28], we believe that our methodology has a greater emphasis on the analysis of device usage by users, and provides a step-by-step method for obtaining and evaluating risk characteristics grounded on the use of the device by lay users.

Conclusion
This work presents a usability study to detect problems and errors related to the use of a medical device by lay users in their home environment. The collected data was analyzed according to the Grounded Theory methodology. This analysis produced a conceptual model. Finally, the conceptual model was reified as a graphical representation.
The conceptual model and its graphical representation allow stakeholders to visualize areas of interest and categories that can lead to an unsafe operation of the medical device under study. Also, following the proposed methodology, the manufacturers could detect the principal factors that may lead to an unsafe operation of their devices, and allows healthcare professionals to determine the most appropriate device for a patient in terms of its safety. It may also be useful to identify weak aspects of the device operation, and to complement these aspects with clearer instructions. As such, the methodology used for this research is appropriate for studying the usability of a medical device to be used by a lay user in a nonclinical environment.
Supporting information S1