This study set out to analyze questions about type 2 diabetes mellitus (T2DM) from patients and the public. The aim was to better understand people’s information needs by starting with what they do not know, discovered through their own questions, rather than starting with what we know about T2DM and subsequently finding ways to communicate that information to people affected by or at risk of the disease. One hundred and sixty-four questions were collected from 120 patients attending outpatient diabetes clinics and 300 questions from 100 members of the public through the Amazon Mechanical Turk crowdsourcing platform. Twenty-three general and diabetes-specific topics and five phases of disease progression were identified; these were used to manually categorize the questions. Analyses were performed to determine which topics, if any, were significant predictors of a question’s being asked by a patient or the public, and similarly for questions from a woman or a man. Further analysis identified the individual topics that were assigned significantly more often to the crowdsourced or clinic questions. These were Causes (CI: [-0.07, -0.03], p < .001), Risk Factors ([-0.08, -0.03], p < .001), Prevention ([-0.06, -0.02], p < .001), Diagnosis ([-0.05, -0.02], p < .001), and Distribution of a Disease in a Population ([-0.05, -0.01], p = .0016) for the crowdsourced questions and Treatment ([0.03, 0.01], p = .0019), Disease Complications ([0.02, 0.07], p < .001), and Psychosocial ([0.05, 0.1], p < .001) for the clinic questions. No highly significant gender-specific topics emerged in our study, but questions about Weight were more likely to come from women and Psychosocial questions from men. There were significantly more crowdsourced questions about the time Prior to any Diagnosis ([-0.11, -0.04], p = .0013) and significantly more clinic questions about Health Maintenance and Prevention after diagnosis ([0.07, 0.17], p < .001).
A descriptive analysis pointed to the value provided by the specificity of questions, their potential to disclose emotions behind questions, and the as-yet unrecognized information needs they can reveal. Large-scale collection of questions from patients across the spectrum of T2DM progression and from the public, a significant percentage of whom are likely to be as yet undiagnosed, is expected to yield further valuable insights.
Citation: Crangle CE, Bradley C, Carlin PF, Esterhay RJ, Harper R, Kearney PM, et al. (2018) Exploring patient information needs in type 2 diabetes: A cross sectional study of questions. PLoS ONE 13(11): e0203429. https://doi.org/10.1371/journal.pone.0203429
Editor: Stephen L. Atkin, Weill Cornell Medical College Qatar, QATAR
Received: October 12, 2016; Accepted: August 21, 2018; Published: November 16, 2018
Copyright: © 2018 Crangle et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files. Data underlying the study are available on Figshare (DOI: 10.6084/m9.figshare.7038584).
Competing interests: MST has received travel support from Apelon, Inc. as a member of its Board of Directors for work unrelated to this study. MST does not receive salary support from Apelon, Inc. CEC has received research support from the National Library of Medicine, NIH, as a principal in Converspeech LLC, for work unrelated to this study and she has received salary support from Converspeech LLC for work unrelated to this study. CEC is a Staff Editor at PLOS Medicine. These interests do not alter our adherence to PLOS ONE policies on sharing data and materials.
Diabetes is a major health problem worldwide. The global age-standardized prevalence of diabetes is 9% in men and 7.9% in women, with the number of people with diabetes having risen worldwide from 108 million in 1980 to 422 million in 2014. Type 2 diabetes mellitus (T2DM) is a main driver of the increase, accounting for approximately 90% of all diabetes cases [2–4]. Diabetes is a complex condition and people with diabetes have a diverse range of information needs [5–8]. Large-scale investigations such as the DAWN studies on the attitudes, wishes and needs of patients and caregivers [9, 10] have told us much, but research to date has paid little attention to exploring the information needs of patients as expressed in the questions they have about diabetes. Questions convey information needs in the patient’s own voice and permit the individual and subjective experience of illness to be captured. To our knowledge, no one to date has investigated on a large scale what T2DM patients want to know at different stages of diagnosis and treatment by asking them directly what their questions are, nor have questions from the public been solicited and examined.
Our study concerns a new way of thinking about patient information needs in diabetes, starting not with what we know about T2DM and finding ways to communicate that information to patients but starting with what patients do not know, discovered through their own questions. Soliciting, and then responding to, patient questions on a large scale has the potential to create a new information resource for T2DM, both in terms of content and organization. A questions-based approach to patient knowledge is distinct from active information seeking, through which the patient searches extant information resources, and it is distinct from passive information receipt, in which the patient is exposed either accidentally or deliberately to extant information resources. A questions-based approach has the potential to create a dynamic, continually updated resource that will capture patient information needs as they evolve over time.
It is estimated that more than half of American adults have either T2DM or prediabetes (as measured by blood sugar levels or determined by diagnosis) and of those more than one-third are unaware they have the disease . Consequently, it is crucial that we understand the information needs and voice of those who do not have diabetes, or do not know they have diabetes, but still have questions whether out of curiosity or concern for themselves or a loved one. In this paper, we report on the first stage of our work soliciting questions directly from both patients and the general public and analyzing the questions to see what they reveal.
Questions play a vital role in health care. Patient questions foster good communication with health professionals, resulting in better care and the right care at the right time [14–17]. However, poor bi-directional flow of information between the diabetes health professional and the patient has been documented. Discrepancies have also been noted between information provided by health care providers and what patients with diabetes need . Patients often cannot get as much detail as they need during office visits . Time constraints, whether actual or perceived, prevent some patients from asking questions during the consultation [5, 20]. Patients also find it difficult to retain much of what they have been told by a health professional, and what they do remember is incorrect almost half the time [21–23].
Clinical information needs have been extensively studied by collecting questions from physicians and analyzing them [24–44]. For patients and the general population, the situation is very different. Only recently have their health questions been studied in any depth [45–51], with few studies, to our knowledge, focusing on diabetes or investigating differences between questions from patients and those not in a patient setting. Our recent study has shown that available online sources of information do not provide answers to patient questions about diabetes and that there is an urgent need to better understand these information needs . In this study, therefore, we set out to collect and investigate questions about diabetes from two sources, namely, patients attending a diabetes clinic and the public through crowdsourcing. We hypothesized that an analysis of the questions in terms of the topics they cover and the phases of disease progression they concern would provide important insights, potentially also revealing differences in information needs between patients and those outside the patient setting, who may or may not have diabetes or may be unaware they have the disease.
This study makes secondary use of anonymized data. A prior service evaluation had been approved within the South Eastern Health and Social Care Trust, Northern Ireland, to assess patient information needs by approaching patients attending the diabetes clinic and asking them to provide questions. They were free to refuse if they wished to. No participant consent was needed for the service evaluation and none was sought. Questions were recorded on a sheet provided to each patient if interested, with no identifiers such as clinic time, clinician or personal information collected. No ethics committee approval was needed for our secondary analysis of the collected questions. This practice conforms to the guidelines of the Health Research Authority of the UK National Health Service and current UK legislative and good practice arrangements. The authors had no direct contact with the participants and there were no minors among the participants.
As part of a prior service evaluation, all patients attending the weekly diabetes outpatient clinic at the Ulster Hospital in Northern Ireland from February to April 2014 had been invited to submit questions by responding to the following: “What are the one or two most pressing questions about your diabetes that you would like answered?” Patients were provided with a blank page to record their questions, and questions from the same individual were marked as such.
We obtained additional questions using the crowdsourcing platform of Amazon Mechanical Turk (AMT). Crowdsourcing has become an important part of many clinical studies, with new platforms emerging to meet the particular requirements of research. One hundred AMT participants were each asked to enter three questions s/he had about diabetes. Each participant was asked to specify age, sex/gender and if s/he had a diagnosis of type 2 or type 1 diabetes, or a diagnosis of diabetes but did not know the type, and if s/he had a friend or family member with a diagnosis of type 2, type 1 or unknown type. Crowdsourced question collection took place on July 8–11, 2015.
Categorization by topic and phase
Question content was determined through fine-grained manual categorization of the topics and the phases of diabetes progression the question referred to. Such detailed assessment of need is part of the move towards better disease management through understanding the likely information needs of different subgroups of people at different phases of the disease, at the onset of diabetes, for example, or later when a new complication has developed.
An initial set of 13 topics, based on known concerns of patients with diabetes [5–10, 55–57] and our prior work on consumer questions , was compiled and used to conduct a preliminary categorization of the crowdsourced questions. This undertaking led to an expanded set of 23 topics for use in this study. We additionally compiled a five-part patient-oriented classification of the phases of T2DM drawing on prior work and our clinical experience [58–67].
Two researchers independently categorized each question by topic and phase. A question could fall under more than one topic and more than one phase, but the phases had to be consecutive, as in the range 3–5, for instance. There were therefore more question-topic assignments and more question-phase assignments than there were questions.
Coding was performed by CEC (all crowdsourced questions), PK (half the crowdsourced questions), VMC (half the crowdsourced questions), and PC and RH (all clinic questions each). PK and RH are clinicians, VMC and PC healthcare researchers, and CEC a non-clinical bioinformatics researcher. For each question and topic, a score of 1 indicated that the question fell under that topic and a score of 0 that it did not. If both coders scored 1 or both scored 0 for a question and topic, it was counted as agreement. Agreement for phase was determined by an overlap between the phases assigned by one coder and those assigned by the other. Intercoder reliability was computed using Cohen’s kappa, interpreted with the following guidelines: slight agreement (0–0.2); fair (0.21–0.4); moderate (0.41–0.6); substantial (0.61–0.8); almost perfect (0.81–1). Disagreement between coders was resolved through consensus review by the coders and members of the project team.
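The agreement computation described above can be illustrated with a minimal sketch. It assumes two coders' 0/1 scores for the same question-topic pairs are held in two equal-length lists; the function names and the example scores are hypothetical and are not taken from the study data.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' binary (0/1) scores on the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(coder_a)
    # Observed agreement: both coders scored 1 or both scored 0.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal rate of scoring 1.
    rate_a = sum(coder_a) / n
    rate_b = sum(coder_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # both coders gave the same constant score throughout
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value to the agreement bands quoted in the text."""
    if kappa < 0:
        return "below chance (outside the quoted bands)"
    if kappa <= 0.2:
        return "slight"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"


# Hypothetical scores for ten question-topic pairs.
scores_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
scores_b = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(scores_a, scores_b)
print(round(kappa, 3), agreement_label(kappa))
```

Kappa discounts the agreement expected by chance, which matters here because most question-topic pairs are scored 0 by both coders.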
The following analyses were performed for topics and stratified by sex for the crowdsourced questions. The significance threshold was set at .05 except where indicated.
Because consecutive questions are more likely to stem from the same questioner in each corpus, the samples cannot be assumed to be independent. We therefore determined which, if any, individuals had highly correlated questions in terms of their topic assignments using the Pearson correlation coefficient. Then, following the guideline that multicollinearity may be a problem in a data set if any pairwise |r| > 0.7 , we removed the questions from any individual who had strongly correlated questions (|r| > 0.7 for any pair of his/her questions). For each corpus, we also examined all pairwise correlations between topics, in terms of the questions assigned to them, removing those topics, if any, that were strongly correlated.
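A minimal sketch of the questioner-level filter follows. It assumes each question is represented as a 0/1 vector over topics and that each corpus is a mapping from a questioner identifier to that person's question vectors; these names and the toy data are hypothetical. Treating an undefined correlation (a constant vector) as "not strongly correlated" is also an assumption of this sketch, not something stated in the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric vectors, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def has_strongly_correlated_questions(question_vectors, threshold=0.7):
    """True if any pair of one person's questions has |r| above the threshold."""
    for q1, q2 in combinations(question_vectors, 2):
        r = pearson_r(q1, q2)
        if r is not None and abs(r) > threshold:
            return True
    return False


def drop_correlated_questioners(corpus, threshold=0.7):
    """Keep only questioners none of whose question pairs are strongly correlated."""
    return {
        person: vectors
        for person, vectors in corpus.items()
        if not has_strongly_correlated_questions(vectors, threshold)
    }


# Toy corpus over six topics (hypothetical data).
toy_corpus = {
    "p1": [[1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]],  # identical topics, r = 1
    "p2": [[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0]],  # weakly related
    "p3": [[0, 0, 1, 0, 0, 0]],                      # single question, no pairs
}
kept = drop_correlated_questioners(toy_corpus)
print(sorted(kept))
```

The same `pearson_r` helper could be applied to topic columns rather than question rows to screen for strongly correlated topics.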
To determine which topics, if any, were significant predictors of a question’s coming from a patient in the clinic or from the public through crowdsourcing, we used Lasso regression with the Least Angle Regression (LARS) algorithm [70, 71]. Lasso-LARS is a model selection algorithm that uses repeated internal cross-validation to select variables and estimate coefficients in the presence of collinearity. We applied Lasso-LARS both before and after removal of highly correlated questions and topics. Computations were performed using the LassoLarsCV function from the scikit-learn Python package with 10-fold cross-validation and default parameters. Lasso-LARS regression was also performed on the crowdsourced questions to determine which, if any, topics were significant predictors of a question’s coming from a woman or a man.
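The following sketch shows how such a fit might be set up with scikit-learn's `LassoLarsCV` and 10-fold cross-validation. The data are synthetic and every variable name is hypothetical: rows stand for questions, columns for 0/1 topic assignments, and the label is 1 for a clinic question and 0 for a crowdsourced one. It is an illustration of the call, not a reproduction of the study's analysis.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(0)
n_questions, n_topics = 400, 5

# Synthetic 0/1 topic-assignment matrix (rows: questions, columns: topics).
X = rng.integers(0, 2, size=(n_questions, n_topics)).astype(float)

# Synthetic source label (1 = clinic, 0 = crowdsourced): topic 0 determines
# the label except for roughly 10% of questions, whose label is flipped.
flip = rng.random(n_questions) < 0.1
y = np.where(flip, 1 - X[:, 0], X[:, 0])

# 10-fold cross-validation, all other parameters left at their defaults.
model = LassoLarsCV(cv=10).fit(X, y)

# Rank topics by the magnitude of their fitted coefficients.
ranked = np.argsort(-np.abs(model.coef_))
print("selected alpha:", model.alpha_)
print("coefficients:", model.coef_)
print("topics ranked by |coefficient|:", ranked)
```

In this toy setting the informative topic should receive by far the largest coefficient, while the uninformative topics are shrunk toward zero.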
We also examined each topic individually to determine if it was assigned significantly more often to the clinic or the crowdsourced questions, correcting for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) [73–75]. For the crowdsourced questions only, we similarly asked for each topic if it was assigned significantly more often to the questions asked by men or those asked by women. The 2-tailed z-test provided 95% confidence intervals (CI) for these estimates. This analysis told us something about the topics, in contrast to the Lasso-LARS analysis that told us something about the questions and the people asking them. The z-tests were performed after confirmation that the distribution of questions over topics was approximately normal. That is, we confirmed that the number of questions per topic was approximately normally distributed for both the crowdsourced and clinic questions under the Shapiro-Wilk test, both before and after removal of the correlated questions, and similarly for the female and male questions . For the phases of disease progression, a similar analysis was done to determine which phases, if any, were assigned significantly more often to the clinic or the crowdsourced questions.
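The per-topic comparison can be sketched as a two-proportion z-test followed by a Benjamini–Hochberg adjustment. This is a minimal illustration under stated assumptions: the counts are hypothetical, the denominators used in the study are not specified here, and the sign convention (clinic minus crowdsourced, so that negative differences favor the crowdsourced questions) is inferred from the reported intervals rather than confirmed.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def two_proportion_z_test(k1, n1, k2, n2, z_crit=1.96):
    """Two-tailed z-test for p1 - p2, with an approximate 95% CI for the difference."""
    p1, p2 = k1 / n1, k2 / n2
    diff = p1 - p2
    pooled = (k1 + k2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se_pooled == 0:
        raise ValueError("test undefined when the pooled proportion is 0 or 1")
    z = diff / se_pooled
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # The interval uses the unpooled standard error of the difference.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, ci, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: a topic assigned to 30 of 100 clinic questions
# and to 10 of 100 crowdsourced questions.
diff, ci, p_value = two_proportion_z_test(30, 100, 10, 100)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(diff, ci, p_value)
print(adjusted)
```

A topic would then be reported as significant when its adjusted p-value falls below the chosen threshold.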
To gain additional understanding of the differences between the clinic and crowdsourced questions, the top three (85th percentile) and top five (75th percentile) topics in terms of the number of questions to which they were assigned were identified for each corpus. Those that were top in one corpus and not the other were recognized as characteristic of that corpus. A similar analysis was done for the phases of disease progression.
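As a small illustration of this comparison, the sketch below takes per-topic question counts for two corpora and returns the topics that are in the top k of one corpus but not the other. The counts are the top-five figures reported in the results; counts for the remaining topics are not included, and the function name is hypothetical.

```python
from collections import Counter


def characteristic_topics(counts_a, counts_b, k):
    """Topics in the top k of one corpus but not in the top k of the other."""
    top_a = {topic for topic, _ in Counter(counts_a).most_common(k)}
    top_b = {topic for topic, _ in Counter(counts_b).most_common(k)}
    return top_a - top_b, top_b - top_a


# Five most frequent topics per corpus, as reported in the results.
clinic_counts = {
    "Treatment": 91, "Weight": 56, "Psychosocial": 47,
    "Manifestations": 43, "Lifestyle": 43,
}
crowd_counts = {
    "Weight": 65, "Risk Factors": 64, "Treatment": 61,
    "Causes": 49, "Cure / Reversal": 47,
}

clinic_top3, crowd_top3 = characteristic_topics(clinic_counts, crowd_counts, 3)
clinic_top5, crowd_top5 = characteristic_topics(clinic_counts, crowd_counts, 5)
print(clinic_top3, crowd_top3)
print(clinic_top5, crowd_top5)
```

The set difference makes the rule explicit: a topic shared by both top-k lists, such as Weight or Treatment, characterizes neither corpus.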
In addition to topic analysis and the analysis by phase of disease progression, the combined corpus of questions was reviewed from a holistic and descriptive perspective to ascertain any inferences implicit in the questions that might reveal underlying concerns or issues for the person generating the question. It was apparent that the questioners, not all of whom had diabetes, were seeking more than just factual information. A limited qualitative analysis of the combined corpus was therefore undertaken to address this need for a broader interpretation of the questions beyond their literal content. This analysis was not exhaustive but illustrative, identifying themes that might inform a detailed analysis of a larger collection of questions.
A preliminary categorization of the crowdsourced questions using a core set of categories derived from earlier work [5–10, 16,55–57] produced a Cohen’s kappa score of 0.61 overall, which represents moderate to substantial agreement . A subsequent round-table discussion by members of the project team (CEC, PK, VMC, MFM, ES, JGW and PC) led to the formulation of the 23 categories described in Table 1. Several diabetes-specific categories, namely Lifestyle / Behavior Change (hereafter abbreviated simply to Lifestyle), Exercise, Diet, Weight, and Cure or Reversal, were added to the core categories. For this last topic, we note that a more clinically oriented topic descriptor would be Control or Remission. However, our experience to date with patient and general-public questions is that the lay perception centers on the idea of completely getting rid of a disease and for this reason we use the descriptor Cure or Reversal. The topic of Complications derived from earlier work was split into Disease Complications and Treatment Complications to properly represent the types of questions found for T2DM.
One hundred and sixty-four questions were collected from 120 patients during 12 outpatient clinics. Most of the questions were about diabetes (N = 155) with the remainder related to clinic operation (N = 9). Of the questions on diabetes, 152 from 101 patients were about T2DM and these questions were retained for our analysis. Although only 1 to 2 questions were asked for, 2 patients gave 3 questions each. Most of the patients attending the clinic had T2DM (95%). These questions are given in S1 File.
For the crowdsourced questions, 100 AMT participants each contributed three questions about diabetes (N = 300). Most of the questions were about T2DM (N = 284), with a smaller number related to type 1 diabetes (N = 15) and 1 question duplicated by one of the questioners. Of the 100 questioners (F 34, M 66), 9 had diabetes (6 type 2, 2 type 1, one unknown type) and 91 had a friend or family member with diabetes (30 type 2, 17 type 1, 44 unknown type). The 284 questions about T2DM were retained for analysis. These questions are given in S2 File. For the clinic questions overall, agreement between the coders was substantial (Cohen’s kappa = 0.77, SD = 0.18). For the crowdsourced questions overall, agreement between the coders was almost perfect (Cohen’s kappa = 0.86, SD = 0.1). Disagreements were resolved by consensus between the coders.
We found that for the crowdsourced questions, 16 of the 100 individuals had strongly correlated questions (|r| > 0.7 for any pair of their 2 or 3 questions) and, for the clinic questions, 3 of the 101 patients. After removing the questions from the identified individuals there were 236 crowdsourced questions from 84 individuals and 147 clinic questions from 98 patients remaining. We did not find the topics in the crowdsourced questions to be correlated at the 0.7 criterion value, but for the clinic questions, Transmission Patterns was correlated with Inheritance Patterns at the 0.7 level. We therefore dropped the category Transmission Patterns, which had only 2 questions in the crowdsourced corpus and 5 in the clinic corpus, all of which also fell under other topic categories and consequently did not need to be removed.
Topics and the clinic and crowdsourced questions
The clinic questions had an average of 2.8 topics per question (min 1, max 7) and the crowdsourced questions had an average of 2.1 topics per question (min 1, max 5). The results of the Lasso-LARS regression on all questions showed slightly higher odds ratios in favor of questions that were Own Health Record Related and about Treatment coming from the clinic patients (1.143 and 1.114 respectively). The odds ratios for all other questions were less than 1.062. The optimum alpha value found was 0.0009 with a mean squared error of 0.151 for both the training and test data. Lasso-LARS regression on only the non-correlated questions revealed similar slightly higher odds ratios in favor of the clinic questions for the same two topics (1.173 and 1.156 for Own Health Record Related and Treatment, respectively), with all other odds ratios less than 1.1. The optimum alpha value was 0.0006 with a mean squared error for the training data of 0.132 and 0.187 for the test data.
In terms of the individual topics, those assigned significantly more often to the crowdsourced than the clinic questions were Causes (CI: [-0.07, -0.03], p < .001), Risk Factors ([-0.08, -0.03], p < .001), Prevention ([-0.06, -0.02], p < .001), Diagnosis ([-0.05, -0.02], p < .001), and Distribution of a Disease in a Population ([-0.05, -0.01], p = .0016). In contrast, the topics Treatment ([0.03, 0.01], p = .0019), Disease Complications ([0.02, 0.07], p < .001), and Psychosocial ([0.05, 0.1], p < .001) were assigned significantly more often to the clinic questions. See Table 2.
The three most frequent clinic topics (Table 2) were Treatment (91 questions), Weight (56) and Psychosocial (47). The three most frequent crowdsourced topics also included Weight (65) and Treatment (61), but included Risk Factors (64) rather than Psychosocial. The topic Psychosocial therefore characterizes the clinic questions and the topic Risk Factors the crowdsourced questions. The next two clinic topics were Manifestations (43) and Lifestyle (43), which were not in the top crowdsourced topics, and therefore further characterize the clinic questions. The next two crowdsourced topics were Causes (49) and Cure / Reversal (47), which were not in the top clinic topics, and therefore further characterize the crowdsourced questions.
For the crowdsourced questions, Lasso-LARS regression showed a slightly higher odds ratio (1.122) for a question about Weight coming from a woman rather than a man. The optimum alpha value found was 0.0024 and the mean squared error for the training data was 0.193 and 0.235 for the test data.
In terms of the individual topics, the only topic that was more strongly associated with one gender than the other was Psychosocial (CI: [0.02, 0.05], p = .1497), which was more strongly represented in the male questions, but only at the 0.2 level. For both men and women, the topics Lifestyle (women 24 questions; men 41 questions), Risk Factors (24; 41), and Diet (19; 28) were most frequently assigned. There was one topic appearing uniquely in the top five for men and one for women that might in addition be thought of as characterizing the two groups. These were Manifestations (women 18 questions) and Prognosis (men 24 questions).
The phases of disease progression
The five phases of T2DM progression we identified from the literature and our experience were: Prior to any Diagnosis; Pre-diabetic (diagnosed); Onset of T2DM; Health Maintenance and Prevention; Complications–Minor (onset) or Major (dominance). These are listed in Table 3 along with a description of each phase.
For both the clinic and crowdsourced questions, intercoder reliability for categorization by phase was substantial under the guidelines given above (k = 0.64 clinic; k = 0.67 crowdsourced). Given the exploratory nature of this categorization by phase, by consensus the coders agreed to assign a phase category to a question if either one of the coders did so. In this way, the judgements of all coders (clinical and non-clinical) could be taken into account.
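The phase rules described here and in the coding procedure can be sketched as follows. The sketch assumes each coder's assignment is an inclusive range of consecutive phases numbered 1 to 5, written as a `(start, end)` pair; the function names are hypothetical.

```python
def phases(start, end):
    """Expand an inclusive range of consecutive phases (1 to 5) into a set."""
    if not (1 <= start <= end <= 5):
        raise ValueError("phases must satisfy 1 <= start <= end <= 5")
    return set(range(start, end + 1))


def coders_agree(range_a, range_b):
    """Agreement for phase: the two coders' ranges share at least one phase."""
    return bool(phases(*range_a) & phases(*range_b))


def merged_phases(range_a, range_b):
    """Final assignment: every phase assigned by either coder."""
    return sorted(phases(*range_a) | phases(*range_b))


print(coders_agree((3, 5), (4, 4)))
print(coders_agree((1, 2), (4, 5)))
print(merged_phases((3, 4), (4, 5)))
```

Note that when the two coders' ranges do not overlap, the merged assignment produced by this sketch need not be a consecutive range.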
Phases and the clinic and crowdsourced questions
There were significantly more crowdsourced questions that concerned Phase 1, the time Prior to any Diagnosis (CI: [-0.11, -0.04], p = .0013), and significantly more clinic questions about Phase 4, Health Maintenance and Prevention (CI: [0.07, 0.17], p < .001). See Table 4.
The most frequently applied phase in the clinic questions was Phase 4 (Health Maintenance and Prevention, 122 questions). This was followed by Phases 5 and 3, then Phases 2 and 1. The most frequently applied phase in the crowdsourced questions was Phase 3 (Onset of T2DM, 220 questions) followed by Phases 4 and 5, then Phases 2 and 1. Phase assignment numbers for clinic and crowdsourced questions are shown in Fig 1.
Our descriptive analysis identified four themes to pursue in future studies: (1) the specificity found in questions; (2) questions revealing the emotion behind an information need; (3) questions disclosing information needs not yet recognized in standard patient information resources; and (4) the potential for questions to identify specific constituent groups with their own information needs.
(1) The specificity of questions.
Questions encourage specificity. The topic “diabetes and prognosis,” for instance, does not capture the specificity of the following three prognosis-related questions taken from our corpora:
- Is diabetes a death sentence?
- Will all Type 2 eventually go on to insulin?
- Is there any potential for a cure within the next few years, according to current research?
The first concerns a worst outcome, the second the inevitability of a treatment, and the third a best outcome. Numerous questions contemplated a decline in health, for example:
- Will it get worse [?]
- Will my condition only worsen [?]
Many asked about the likelihood and hoped-for outcome of specific treatments, for example:
- Can I ever reduce insulin & meds and feel good [?]
- Can diabetes be cured or rendered almost gone overtime through medicine and nutrition [?]
- Could a pancreas transplant cure diabetes in a person?
- What dictates the type of treatment needed/required for diabetes, and is directly injecting insulin ever avoidable?
Prognosis questions that were about a possible cure for diabetes were prominent. See S1 Table. All questions contained the word “cure,” or similar, such as “reverse,” “heal,” “[fully] go away,” “[completely] get over.” Many questions looked to scientific research for a cure and acknowledged it as a matter for the future. A smaller number specifically referred to diet or lifestyle changes, something an individual can do to affect the course of diabetes. Those asking such questions may be more receptive to taking action on their own behalf.
(2) Questions revealing the emotion behind an information need.
Consider the following two questions, both ostensibly seeking to understand why the questioner has diabetes.
- How on earth I ever got diabetes in the first place. Never over weight blood pressure always fine never eat sweet food [?]
- Why me?
The first question shows some understanding of risk factors for the condition without, it seems, fully understanding genetic risks. Puzzlement and frustration are expressed. The second question is less a plea for information and more an expression of frustration and defeat. Its meaning, and what counts as an adequate answer, will differ depending on when it is asked: at diagnosis, at a periodic review of the patient’s care, when a new complicating factor has arisen that will affect self-management, or when there is a transition in care, such as with age-related changes or a change in the care team.
(3) Questions disclosing as yet unrecognized information needs.
It is crucial that information on diabetes covers not only what health professionals consider important for people to know but also what the different constituent groups want to know, whether considered important to health professionals or not. Directly solicited, open-ended invitations to ask questions are a way to reveal information needs that may not be anticipated by health educators. Take the following questions that in effect ask for a severity index for diabetes.
- Are there variations in severity to diabetes and what determines severity?
- To what extent does it exist on a spectrum, such that people may be classified according to the degree to which they are diabetic, even if they are not diagnosable as diabetic according to present criteria?
This topic is covered in the literature but not prominently, or not at all, in trusted and vetted patient-oriented diabetes information resources. It may be important to some patients’ need to fully understand their condition. A related set of questions reveals a similar and important wish, whether feasible or not, to be able to monitor one’s health before it gets to a point of no return, as explicitly stated in this question:
- If you suspect you have Type 2 diabetes, at which point will it become impossible for you to reverse it by only changing your diet and exercise habits (and without requiring medication or the need to see a doctor?)
(4) Questions from specific constituent groups.
Diabetes information is important not only for people with diabetes but also for friends and family of people diagnosed with diabetes and for caregivers, that is, the family members, neighbors, friends or paid persons who regularly look after someone with diabetes. Our corpora included several such questions.
- What are some ways to help a family member accept a diagnosis of diabetes?
- How hard is it to treat when the person who needs help isn't very receptive to their condition?
- What are some things you can do to help a family member better manage an appropriate diet for type 2 diabetes?
- What is the best way I can help my friends and family members with controlling their diabetes?
For other chronic conditions, such as mental-health disorders, the role of family and friends is broadly acknowledged and discussed in education and information resources. Question collection on a massive scale may suggest a more prominent place for this topic in diabetes education.
The topics associated with the clinic questions (Own Health Record Related, Disease Complications, Treatment, and Psychosocial) confirm what might be expected, namely that patients whose condition is actively being managed are most concerned about complications of the condition specific to their medical history, with a primary concern being about psychosocial matters related to their disease. T2DM is a complex condition that has different disease progressions for different people and for the same person over time and as life circumstances change . Significant effort has to go into making sense of the experience. A recent study comparing people seeking online health information for their own problem against those seeking information for someone else’s showed that the first group in contrast to the second focused primarily on symptoms and matters related to their own disease history .
The crowdsourced questions’ focus on Causes, Risk Factors, Prevention, Diagnosis, and Distribution of a Disease in a Population most likely reflects the fact that the crowdsourced questioners were in the main (91%) not themselves diagnosed but knew someone who was, and so likely sought to understand what leads to diabetes and who among their family members may be at risk. Those seeking online information for someone else’s health problem have been shown to focus primarily on causes of a disease and disease terminology.
The stronger representation of Psychosocial questions from men warrants further investigation. Gender-based notions of masculinity have been shown for some people to be in conflict with effective self-management of T2DM, a central component in the treatment of diabetes . The stronger representation of questions on Weight from women is perhaps not unexpected, but with recent research showing that men are developing T2DM at lower levels of adiposity than women, this may change .
The clinic questions, not surprisingly, predominantly concerned post-diagnosis issues, whereas the onset of diabetes dominated the crowdsourced questions. The number of crowdsourced questions asking, in effect, how you know if you have diabetes accounts for the high number of questions categorized under Onset of T2DM. Such a concern is consistent with the fact that over 30 percent of those with diabetes in the United States are unaware they have the disease. It also perhaps indicates that the public health message about the prevalence of diabetes is being heard and people are wondering about their own health status.
There is a long and extensive record of questions being collected from health professionals and analyzed. Questions have been collected at the point of care, from email consultation with specialists, and through queries to information systems [26, 31, 33, 35–38]. Clinical questions have been categorized as to the kind of knowledge they sought and the kind of answers they needed, with taxonomic and other organizing structures proposed for them [24, 27, 35, 37]. The questions of family-medicine, elder-care, and rural-health physicians have been explored [25, 29, 30, 32, 34, 44]. Experiments have been done on different ways of capturing clinical questions through voice and other input media [28, 39–43]. Clinical questions associated with specific disorders have been evaluated, most notably cancer and T2DM. A systematic review of three decades of studies on clinical information needs found that roughly 30% of the question types accounted for 80% of the questions clinicians asked, where a question’s type was relative to a 64-item taxonomy.
Studies of questions from healthcare consumers are relatively recent. In one study, 276 health-related questions posted on a social media question-answer website were subjected to qualitative content analysis, focusing on meta-characteristics of the questions such as the users’ motivations for asking them. In two others, a manual topic-based analysis of consumer questions was done using topics from the UMLS. In another, 365 questions from a mailing list were analyzed in terms of topics and the type of question. In two further studies, smaller question collections (72 and 12) were subjected to detailed semantic, attitudinal, or linguistic analysis. An increasing number of studies concern the development of question–answering technology for consumer health questions [83–87].
Patients have different information needs about T2DM at different points as their disease progresses. However, little is known about these needs and how they change over time or across varying health or life circumstances, even though there has been a significant amount of research on what the different phases of T2DM are [58–67]. It is in cancer care that the needs of patients at different stages of their disease have been most thoroughly studied [88–91]. These studies show, for instance, that while most (91%) female breast-cancer patients wanted to know their prognosis before beginning adjuvant treatment, after the first consultation their needs often shifted to matters of support, with 59–63% primarily wanting reassurance and hope, and patients with advanced disease often desiring less information about their illness. It is important that we develop a similar understanding of the changing needs of people with T2DM.
The urgent need for resources allowing patients with T2DM to find answers to their questions has recently been documented. One longer-term goal of this study is to develop a question-answer system, informed by the analysis of a very large number of questions and vetted answers and based on the automated identification of topics in questions. The twenty-three categories we devised for this study will almost certainly need further refinement, with a hierarchy of topics or an ontology possibly providing a better representation. In addition, answer topics as well as question topics need to be defined. For example, suppose a patient’s question is “I’m 44 and recently diagnosed with Type 2 diabetes, and now I am having difficulty reading fine print. Is this related to my diabetes?” This question falls into four possible answer categories. The first relates to temporary changes in eyesight when blood glucose fluctuates. The second concerns a side effect of the drug pioglitazone. The third is about diabetic retinopathy that leads to blindness. And the fourth concerns normal age-related changes in eyesight.
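To make the distinction between question topics and answer topics concrete, the eyesight example can be written down as a small record. The following Python sketch is purely illustrative and is not part of the study. The class names `AnswerCategory` and `LabeledQuestion`, the field names, the short answer labels, and the question-topic labels assigned here are all assumptions made for illustration. Only the four answer categories themselves are taken from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnswerCategory:
    """One candidate category of answer for a question (hypothetical structure)."""
    label: str
    description: str


@dataclass
class LabeledQuestion:
    """A question with its question topics and its candidate answer categories."""
    text: str
    question_topics: List[str]
    answer_categories: List[AnswerCategory] = field(default_factory=list)


# The question text and the four answer categories follow the example in the text.
# The question-topic assignment below is an assumption, not the authors' labeling.
eyesight_question = LabeledQuestion(
    text=(
        "I'm 44 and recently diagnosed with Type 2 diabetes, and now I am "
        "having difficulty reading fine print. Is this related to my diabetes?"
    ),
    question_topics=["Disease Complications", "Own Health Record Related"],
    answer_categories=[
        AnswerCategory("Glucose fluctuation",
                       "Temporary changes in eyesight when blood glucose fluctuates"),
        AnswerCategory("Medication side effect",
                       "A side effect of the drug pioglitazone"),
        AnswerCategory("Diabetic retinopathy",
                       "Diabetic retinopathy"),
        AnswerCategory("Age-related change",
                       "Normal age-related changes in eyesight"),
    ],
)

for category in eyesight_question.answer_categories:
    print(f"{category.label}: {category.description}")
```

Keeping question topics and answer categories in separate fields lets one question point to several possible answers without forcing it into a single topic, which is the situation the example describes.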
Finer-grained characteristics that are important in the management of diabetes are also needed. For example, the capacity of a person to act in any given environment (known as agency) seems to be expressed differentially in our questions. The following question about a cure for diabetes appears to locate agency within the patient: “What stuff do you have to do to cure diabetes?” This is in contrast to a question that appears to locate agency within the broader society: “How close is science to finding a cure for diabetes?” If patients over time asked questions that differed in the location of agency, that would be of interest and possible clinical significance. In our follow-up studies when new questions are collected from patients, we will be labeling each question by the stage the questioner is in relative to his or her own disease progression. In this way a record of the questions asked in the aggregate by patients at each phase of the disease can be compiled along with the progression of questions for each patient individually, providing a broader and deeper perspective on the complex needs of those affected by or at risk of T2DM.
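The two views described above, questions in the aggregate per phase and the progression of questions per patient, can be pictured with a simple per-question record. The Python sketch below is a minimal illustration under stated assumptions and does not describe the study's actual data handling. The names `QuestionRecord`, `group_by_phase`, and `progression_by_patient` are hypothetical, the patient identifiers are made up, and the pairing of each question with a phase is invented for the example. The question texts and phase names are borrowed from this article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuestionRecord:
    """One collected question, labeled with the questioner's disease phase."""
    patient_id: str   # hypothetical identifier
    phase: str        # phase of disease progression at the time of asking
    text: str
    asked_order: int  # position in that patient's sequence of questions


def group_by_phase(records: List[QuestionRecord]) -> Dict[str, List[str]]:
    """Aggregate view: all question texts asked at each phase."""
    by_phase: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        by_phase[record.phase].append(record.text)
    return dict(by_phase)


def progression_by_patient(records: List[QuestionRecord]) -> Dict[str, List[QuestionRecord]]:
    """Individual view: each patient's questions in the order they were asked."""
    by_patient: Dict[str, List[QuestionRecord]] = defaultdict(list)
    for record in sorted(records, key=lambda r: r.asked_order):
        by_patient[record.patient_id].append(record)
    return dict(by_patient)


# Illustrative records only: the phase assigned to each question is an assumption.
records = [
    QuestionRecord("patient-A", "Health Maintenance and Prevention",
                   "How close is science to finding a cure for diabetes?", 2),
    QuestionRecord("patient-A", "Onset of T2DM",
                   "What stuff do you have to do to cure diabetes?", 1),
    QuestionRecord("patient-B", "Prior to any Diagnosis",
                   "How close is science to finding a cure for diabetes?", 1),
]

phase_view = group_by_phase(records)
patient_view = progression_by_patient(records)
print(sorted(phase_view))
print([r.phase for r in patient_view["patient-A"]])
```

A finer-grained label such as the location of agency could be added as one more field on the record, so that changes in it could be followed along each patient's sequence.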
- 1. NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants. Lancet. 2016;387(10027):1513–1530. pmid:27061677
- 2. Zimmet P, Alberti K, Shaw J. Global and societal implications of the diabetes epidemic. Nature. 2001;414(6865):782–787. pmid:11742409
- 3. Danaei G, Finucane M, Lu Y, Singh G, Cowan M, Paciorek C. National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2.7 million participants. Lancet. 2011;378(9785):31–40. pmid:21705069
- 4. Gregg E, Li Y, Wang J, Burrows N, Ali M, Rolka D, et al. Changes in diabetes-related complications in the United States, 1990–2010. New England Journal of Medicine. 2014;370:1514–1523. pmid:24738668
- 5. Whitford D, Paul G, Smith S. Patient generated “frequently asked questions”: Identifying informational needs in a RCT of peer support in type 2 diabetes. Primary Care Diabetes. 2013;7(2):103–109. pmid:23428963
- 6. Browne J, Scibilia R, Speight J. The needs, concerns, and characteristics of younger Australian adults with Type 2 diabetes. Diabetic Medicine. 2013;30(5):620–626. pmid:23181664
- 7. Weymann N, Harter M, Dirmaier J. Information and decision support needs in patients with type 2 diabetes. Health Informatics Journal. 2014;22(1):46–59. pmid:24916569
- 8. Barnes P, Kelly C, Connelly K, Siek K. Understanding the needs of low SES patients with type 2 diabetes. 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops; Venice, Italy; 2013.
- 9. Rubin R, Peyrot M, Siminerio L. Health care and patient-reported outcomes: results of the cross-national Diabetes Attitudes, Wishes and Needs (DAWN) study. Diabetes Care. 2006;29(6):1249–1255. pmid:16732004
- 10. Bootle S, Skovlund S. Proceedings of the 5th International DAWN Summit 2014: Acting together to make person-centred diabetes care a reality. Diabetes Research and Clinical Practice. 2015;109(1):6–18. pmid:25979275
- 11. Atkinson S, Rubinelli S. Narrative in cancer research and policy: voice, knowledge and context. Critical Reviews in Oncology/Hematology. 2012;84(2):S11–S16. pmid:23347413
- 12. Longo D, Schubert S, Wright B, LeMaster J, Williams C, Clore J. Health Information Seeking, Receipt, and Use in Diabetes Self-Management. The Annals of Family Medicine. 2010;8(4):334–340. pmid:20644188
- 13. Menke A, Casagrande S, Geiss L, Cowie C. Prevalence of and Trends in Diabetes Among Adults in the United States, 1988–2012. JAMA 2015;314(10):1021. pmid:26348752
- 14. Questions To Ask Your Doctor. Content last reviewed May 2018. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/patients-consumers/patient-involvement/ask-your-doctor/index.html. Accessed August 16, 2018
- 15. Powers MA, Bardsley J, Cypress M, et al.: Diabetes self-management education and support in type 2 diabetes: a joint position statement of the American Diabetes Association, the American Association of Diabetes Educators, and the Academy of Nutrition and Dietetics. Diabetes Care 2015; 38: 1372–1382. pmid:26048904
- 16. Crangle C, Brothers Kart J. A questions-based investigation of consumer mental-health information. PeerJ 2015;3:e867. pmid:25870768
- 17. Sleath B, Sayner R, Blalock SJ, Carpenter DM, Muir KW, Hartnett ME, et al. Patient question-asking about glaucoma and glaucoma medications during videotaped medical visits. Health Commun. 2015;30(7):660–8. Epub 2014 Jul 25. 10.1080/10410236.2014.888387 pmid:25061778
- 18. Stuckey H, Olmsted T, Mincemoyer R, Gabbay R. What Do People with Diabetes Talk about on a Diabetes Social Networking Web Site? Journal of Diabetes Science and Technology. 2012;6(3):716–717. pmid:22768905
- 19. Hilliard M, Sparling K, Hitchcock J, Oser T, Hood K. The Emerging Diabetes Online Community. Current Diabetes Reviews. 2015;11(4):261–272. pmid:25901500
- 20. Mann D. Diabetes Risk Communication Tool To Improve Lifestyle Behaviors. 2013. http://grantome.com/grant/NIH/K23-DK081665-03. Accessed August 16, 2018
- 21. Kessels R. Patients’ memory for medical information. Journal of the Royal Society of Medicine. 2003;96(5):219–222. pmid:12724430
- 22. Selic P, Svab I, Repolusk M, Gucek N. What factors affect patients' recall of general practitioners' advice? BMC Family Practice. 2011;12:141. pmid:22204743
- 23. Patocka C, Lin M, Voros J, Chan T. Point-of-care Resource Use in the Emergency Department: A Developmental Model. AEM Educ Train. 2018 May 30;2(3):221–228. eCollection 2018 Jul. pmid:30051092
- 24. Ely J, Osheroff J, Gorman P, Ebell M, Chambliss M, Pifer E, et al. A taxonomy of generic clinical questions: classification study. BMJ. 2000;321(7258):429–432. pmid:10938054
- 25. Ely J, Osheroff J, Ebell M, Bergus G, Levy B, Chambliss M, et al. Analysis of questions asked by family physicians regarding patient care. Western Journal of Medicine. 2000;172(5):315–319. pmid:18751285
- 26. Jerome R, Giuse N, Gish K, Sathe N, Dietrich M. Information needs of clinical teams: analysis of questions received by the Clinical Informatics Consult Service. Bulletin of the Medical Library Association. 2001;89(2):177–184. pmid:11337949
- 27. Huang X, Lin J, Demner-Fushman D. Evaluation of PICO as a knowledge representation for clinical questions. AMIA Annual Symposium Proceedings. 2006:359–363. pmid:17238363
- 28. Chase H, Kaufman D, Johnson S, Mendonca E. Voice capture of medical residents' clinical information needs during an inpatient rotation. Journal of the American Medical Informatics Association 2009;16(3):387–394. pmid:19261939
- 29. Hung P, Johnson S, Kaufman D, Mendonça E. A multi-level model of information seeking in the clinical domain. Journal of Biomedical Informatics 2008;41(2):357–370. pmid:18006383
- 30. Del Fiol G, Workman T, Gorman P. Clinical Questions Raised by Clinicians at the Point of Care: A Systematic Review. JAMA Internal Medicine. 2014;174(5):710–718. pmid:24663331
- 31. Hübner-Bloder G, Duftschmid G, Kohler M, Rinner C, Saboor S, Ammenwerth E. Clinical situations and information needs of physicians during treatment of diabetes mellitus patients: a triangulation study. Studies in Health Technology and Informatics. 2011;169:369–373. pmid:21893775
- 32. Dorsch J. Information needs of rural health professionals: a review of the literature. Bulletin of the Medical Library Association 2000;88(4):346–354. pmid:11055302
- 33. Hersh W, Crabtree M, Hickam D, Sacherek L, Rose L, Friedman C. Factors associated with successful answering of clinical questions using an information retrieval system. Bulletin of the Medical Library Association. 2000;88(4):323–331. pmid:11055299
- 34. Green M, Ciampi M, Ellis P. Residents' medical information needs in clinic: are they being met? American Journal of Medicine 2000;109(3):218–223. pmid:10974185
- 35. Bergus G, Randall C, Sinift S, Rosenthal D. Does the structure of clinical questions affect the outcome of curbside consultations with specialty colleagues? Archives of Family Medicine 2000;9(6):541–547. pmid:10862217
- 36. Florance V. Medical knowledge for clinical problem solving: a structural analysis of clinical questions. Bulletin of the Medical Library Association. 1992;80(2):140–149. pmid:1600423
- 37. Osheroff J, Forsythe D, Buchanan B, Bankowitz R, Blumenfeld B, Miller R. Physicians' information needs: analysis of questions posed during clinical teaching. Annals of Internal Medicine 1991;114(7):576–581. pmid:2001091
- 38. Cimino C, Barnett G. Analysis of physician questions in an ambulatory care setting. Proceedings of the Annual Symposium on Computer Application in Medical Care. 1991:995–999.
- 39. Crangle C, Carlson R, Fagan L, Davis A, Erlbaum M, Keck K, et al. Conversational access to on-line cancer information: An adaptable speech interface. Proceedings of the AMIA (American Medical Informatics Association) Fall Symposium: Beyond the Superhighway: Exploiting the Internet with Medical Informatics; 1996 October 26–30; Washington D.C., Philadelphia, USA.
- 40. Crangle C, Carlson R, Fagan L, Erlbaum M, Sherertz D, Tuttle M. Conversational access to on-line medical information. 15th Annual Conference of the American Voice Input/Output Society (AVIOS); 1996 September 10–12; San Jose, California, USA.
- 41. Crangle C, Carlson R, Fagan L, Erlbaum M, Olson N, Sherertz D, et al. Meeting clinical information needs: An interface that provides uniform access across diverse information sources. In: Press A, editor. AAAI-96 Spring Symposium on Artificial Intelligence in Medicine; March 25–27; Stanford University, California, USA; 1996. p. 21–25.
- 42. Sherertz D, Tuttle M, Olson N, Hsu G, Carlson R, Fagan L, et al. Accessing oncology information at the point of care: experience using speech, pen, and 3-D interfaces with a knowledge server. Medinfo. 1995;8(1):792–795.
- 43. Tuttle M, Sherertz D, Olson N, Nelson S, Erlbaum M, Keck K. Toward reusable software components at the point of care. Proceedings of the AMIA Fall Symposium; 1996.
- 44. Del Fiol G, Weber A, Brunker C, Weir C. Clinical questions raised by providers in the care of older adults: a prospective observational study. BMJ Open. 2014;4(7):e005315. pmid:24996915
- 45. Zhang Y. Contextualizing Consumer Health Information Searching: an Analysis of Questions in a Social Q&A Community. ACM International Health Informatics Symposium; 2010.
- 46. Roberts K, Masterton K, Fiszman M, Kilicoglu H, Demner-Fushman D. Annotating Question Types for Consumer Health Questions. LREC 2014.
- 47. Cui L, Tao S, Zhang G-Q. A Semantic-based Approach for Exploring Consumer Health Questions Using UMLS. AMIA Annual Symposium Proceedings. 2014:432–441. pmid:25954347
- 48. Roberts K, Demner-Fushman D. Interactive use of online health resources: a comparison of consumer and professional questions. Journal of the American Medical Informatics Association 2016;23(4):802–811. pmid:27147494
- 49. White M. Questioning Behavior on a Consumer Health Electronic List. The Library Quarterly. 2000;70(3):302–334.
- 50. Oh J, He D, Jeng W. Linguistic characteristics of eating disorder questions on Yahoo! Answers: content, style, and emotion. Proceedings of the American Society for Information Science and Technology. 2013;50:1–10.
- 51. Slaughter L, Soergel D, Rindflesch T. Semantic representation of consumer questions and physician answers. International Journal of Medical Informatics. 2006;75:513–529. pmid:16125448
- 52. Crangle CE, Bradley C, Carlin PF, Esterhay RJ, Harper R, Kearney PM, et al. Soliciting and Responding to Patients’ Questions about Diabetes through Online Sources. Diabetes Technology & Therapeutics. 2017;19(3). pmid:28221815
- 53. Swan M. Crowdsourced Health Research Studies: An Important Emerging Complement to Clinical Trials in the Public Health Research Ecosystem. Journal of Medical Internet Research. 2012;14(2):e46. pmid:22397809
- 54. Litman L, Robinson J, Abberbock T. TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods. 2016. pmid:27071389
- 55. Peyrot M, Burns K, Davies M, Forbes A, Hermanns N, Holt R, et al. Diabetes Attitudes Wishes and Needs 2 (DAWN2): a multinational, multi-stakeholder study of psychosocial issues in diabetes and person-centred diabetes care. Diabetes Research and Clinical Practice. 2013;99(2):174–184. pmid:23273515
- 56. Skovlund S, Peyrot M. The diabetes attitudes, wishes and needs (DAWN) program: A new approach to improving outcomes of diabetes care. Diabetes Spectrum 2005;18(3):136–142.
- 57. Nicolucci A, Kovacs Burns K, Holt R, Comaschi M, Hermanns N, Ishii H, et al. Diabetes Attitudes, Wishes and Needs second study (DAWN2): cross-national benchmarking of diabetes-related psychosocial outcomes for people with diabetes. Diabetic Medicine. 2013;30:767–777. pmid:23711019
- 58. Weinger K, Welch G, Jacobson A. Psychological and psychiatric issues in diabetes mellitus. In: Poretsky L, editor. Principles of Diabetes Mellitus. Norwell, Massachusetts: Kluwer Academic Publishers; 2002. p. 639–654.
- 59. Welch G, Jacobson A, Polonsky W. The Problem Areas in Diabetes Scale: An evaluation of its clinical utility. Diabetes Care 1997;20:760–766. pmid:9135939
- 60. Gonzalez J, Safren S, Delahanty L, Cagliero E, Wexler D, Meigs J, et al. Symptoms of depression prospectively predict poorer self-care in patients with Type 2 diabetes. Diabetic Medicine. 2008;25:1102–1107. pmid:19183315
- 61. Walker R, Smalls B, Hernandez-Tejada M, Campbell J, Davis K, Egede L. Effect of diabetes fatalism on medication adherence and self-care behaviors in adults with diabetes. General Hospital Psychiatry. 2012;34:598–603. pmid:22898447
- 62. Yi J, Vitaliano P, Smith R, Yi J, Weinger K. The role of resilience on psychological adjustment and physical health in patients with diabetes. British Journal of Health Psychology 2008;13:311–325. pmid:17535497
- 63. Yi J, Yi J, Vitaliano P, Weinger K. How does anger coping style affect glycemic control in diabetes patients? International Journal of Behavioral Medicine 2008;15:167–172. pmid:18696309
- 64. Yi-Frazier J, Smith R, Vitaliano P, Yi J, Mai S, Hillman M, et al. A Person-Focused analysis of resilience resources and coping in diabetes patients. Stress Health. 2010;26:51–60. pmid:20526415
- 65. Rucker J, McDowd J, Kluding P. Executive function and type 2 diabetes: putting the pieces together. Physical Therapy 2012;92:454–462. pmid:22135708
- 66. Schiotz M, Bogelund M, Almdal T, Jensen B, Willaing I. Social support and self-management behaviour among patients with Type 2 diabetes. Diabetic Medicine 2012;29:654–661. pmid:21992512
- 67. Diapedia. Coping with diabetes in adults. Diapedia 2016. https://doi.org/10.14496/dia.61047161185.10
- 68. Warrens MJ (2015) Five Ways to Look at Cohen’s Kappa. J Psychol Psychother 5:197.
- 69. Dormann C. F., Elith J., Bacher S., Buchmann C., Carl G., Carré G., et al. 2013, Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36: 27–46.
- 70. Tibshirani R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society (Series B), 58, 267–288.
- 71. Efron Bradley; Hastie Trevor; Johnstone Iain; Tibshirani Robert. Least angle regression. Ann. Statist. 32 (2004), no. 2, 407–499. https://projecteuclid.org/euclid.aos/108317893562.
- 72. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825–2830, 2011.
- 73. Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57 (1): 289–300.
- 74. Stevens JR, Al Masud A, Suyundikov A. (2017) A comparison of multiple testing adjustment methods with block-correlation positively-dependent tests. Zaykin D, ed. PLoS ONE. 2017;12(4):e017
- 75. Benjamini Yoav; Yekutieli Daniel. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 (2001), no. 4, 1165–1188. https://projecteuclid.org/euclid.aos/1013699998
- 76. Rochon J, Gondan M, Kieser M. To test or not to test: Preliminary assessment of normality when comparing two independent samples. BMC Medical Research Methodology. 2012;12:81. pmid:22712852
- 77. Zare-Farashbandi F, Lalazaryan A, Rahimi A, Zadeh A. How health information is received by diabetic patients? Advanced Biomedical Research 2015;4:126. pmid:26261828
- 78. William PG, Renda A, Dong Y, Diabetes Complications Severity Index (DCSI)—Update and ICD-10 translation, Journal of Diabetes and its Complications, Volume 31, Issue 6, 2017, Pages 1007–1013, ISSN 1056-8727, https://doi.org/10.1016/j.jdiacomp.2017.02.018. pmid:28416120
- 79. Peyrot M, McMurry JJ, Kruger D. A biopsychosocial model of glycemic control in diabetes: stress, coping and regimen adherence. Journal of Health and Social Behavior. 1999;40:141–158. pmid:10467761
- 80. Pian W, Khoo C, Chang Y-K, Moorhead A. The Criteria People Use in Relevance Decisions on Health Information: An Analysis of User Eye Movements When Browsing a Health Discussion Forum. Journal of Medical Internet Research. 2016;18(6):e136. pmid:27323893
- 81. Liburd L, Namageyo-Funa A, Jack LJ. Understanding "masculinity" and the challenges of managing type-2 diabetes among African-American men. Journal of the National Medical
- 82. Peters S, Huxley R, Sattar N, Woodward M. Sex Differences in the Excess Risk of Cardiovascular Diseases Associated with Type 2 Diabetes: Potential Explanations and Clinical Implications. Current Cardiovascular Risk Reports 2015;9(7):36. pmid:26029318
- 83. Bickmore T, Utami D, Matsuyama R, Paasche-Orlow M. Improving Access to Online Health Information With Conversational Agents: A Randomized Controlled Experiment. Journal of Medical Internet Research. 2016;18(1):e1. pmid:26728964
- 84. Bates M. Subject access in online catalogs: a design model. Journal of the American Society for Information Science. 1986;37:357–376.
- 85. Anderson J, Perez-Carballo J. The nature of indexing: how humans and machines analyze messages and texts for retrieval, Part I: Research, and the nature of human indexing. Information Processing and Management 2001;37:231–254.
- 86. Van Der Volgen J, Harris B, Demner-Fushman D. Analysis of consumer health questions for development of question–answering technology. In: Mitchell N, editor. Proceedings of one HEALTH: Information in an interdependent world, the 2013 annual meeting and exhibition of the Medical Library Association (MLA). Chicago 2013.
- 87. Cronin R, Fabbri D, Denny J, Jackson G. Automated Classification of Consumer Health Information Needs in Patient Portal Messages. AMIA Annual Symposium Proceedings. 2015:1861–1870. pmid:26958285
- 88. Butow P, Maclean M, Dunn S, Tattersall M, Boyer M. The dynamics of change: cancer patients' preferences for information, involvement and support. Annals of Oncology. 1997;8(9):857–863. pmid:9358935
- 89. Mills M, Sullivan K. The importance of information giving for patients newly diagnosed with cancer: a review of the literature. Journal of Clinical Nursing. 1999;8(6):631–642. pmid:10827609
- 90. Siminoff L, Ravdin P, Colabianchi N, Sturm C. Doctor-patient communication patterns in breast cancer adjuvant therapy discussions. Health Expect. 2000;3(1):26–36. pmid:11281909
- 91. Lobb E, Butow P, Meiser B, Barratt A, Gaff C, Young M, et al. Tailoring communication in consultations with women from high risk breast cancer families. British Journal of Cancer. 2002;87(5):502–508. pmid:12189544
- 92. Armstrong David. Actors, patients and agency: a recent history. Sociology of Health & Illness Vol. 36 No. 2 2014 ISSN 0141-9889, pp. 163–174 pmid:24372176