Development and Validation of a New Questionnaire Assessing Quality of Life in Adults with Hypopituitarism: Adult Hypopituitarism Questionnaire (AHQ)

Objective To develop and validate the Adult Hypopituitarism Questionnaire (AHQ) as a disease-specific, self-administered questionnaire for evaluation of quality of life (QOL) in adult patients with hypopituitarism. Methods We developed and validated this new questionnaire, using a standardized procedure which included item development, pilot-testing and psychometric validation. Of the patients who participated in psychometric validation, those whose clinical conditions were judged to be stable were asked to answer the survey questionnaire twice, in order to assess test-retest reliability. Results Content validity of the initial questionnaire was evaluated via two pilot tests. After these tests, we made minor revisions and finalized the initial version of the questionnaire. The questionnaire was constructed with two domains, one psycho-social and the other physical. For psychometric assessment, analyses were performed on the responses of 192 adult patients with various types of hypopituitarism. The intraclass correlations of the respective domains were 0.91 and 0.95, and the Cronbach’s alpha coefficients were 0.96 and 0.95, indicating adequate test-retest reliability and internal consistency for each domain. For known-group validity, patients with hypopituitarism due to hypothalamic disorder showed significantly lower scores in 11 out of 13 sub-domains compared to those who had hypopituitarism due to pituitary disorder. Regarding construct validity, the domain structure was found to be almost the same as that initially hypothesized. Exploratory factor analysis (n = 228) demonstrated that each domain consisted of six and seven sub-domains. Conclusion The AHQ showed good reliability and validity for evaluating QOL in adult patients with hypopituitarism.


Introduction
Hypopituitarism is generally a chronic and life-long disease involving deficiencies in one or more of the six pituitary hormones: growth hormone (GH), thyroid stimulating hormone (TSH), adrenocorticotrophic hormone (ACTH), luteinizing hormone (LH), follicle stimulating hormone (FSH) and prolactin (PRL). It is also occasionally associated with diabetes insipidus caused by deficiency of antidiuretic hormone (ADH) [1]. Hypopituitarism and diabetes insipidus can result from various conditions affecting the pituitary and/or the hypothalamus, including tumors, infiltrative lesions, infarction, apoplexy, trauma, and infection. It can also be caused by breech delivery with asphyxia, surgery, and radiation therapy. Hypopituitarism and diabetes insipidus in adults may result in various symptoms, such as fatigue, loss of energy, decreased muscle strength, decreased sociability, emotional instability, sexual dysfunction, hypotension, polyuria, polydipsia, and disturbance of consciousness and cognitive function. These symptoms cause deterioration in the quality of life (QOL) of those who suffer from these conditions. Although these symptoms may be partially ameliorated by standard treatment, including hormonal replacement therapy, many patients complain of residual symptoms. Therefore, patients' subjectively-perceived conditions should be elicited and monitored at regular intervals in order to evaluate whether treatment is successful or not.
Conditions to be addressed include patients' physical, emotional, and social functioning, and satisfaction in daily life.
Generic or disease-specific QOL scales can be used to measure QOL in patients with hypopituitarism. Generic scales include the Nottingham Health Profile [2], the Psychological General Well-Being index [3], and the MOS Short Form 36-item Health Survey (SF-36) [4][5][6]. Disease-specific patient-reported outcome measures, such as the Quality of Life-Assessment of Growth Hormone Deficiency in Adults (QoL-AGHDA) [7,8] and the Questions on Life Satisfaction (QLS) [9], are available to evaluate the impacts of hypopituitarism. We have previously evaluated both the SF-36 and the QoL-AGHDA in Japanese patients with growth hormone deficiency [10]. However, using these measures, we could not detect a difference in QOL between patients treated with growth hormone and those given a placebo. These QOL measures may therefore not capture the entire range of quality of life issues that patients with hypopituitarism experience, because they emphasize symptoms induced by a particular hormone deficiency, or particular aspects of a condition, such as its influence on mental state.
We therefore organized a research group to develop a new, disease-specific, psychometrically valid, patient-reported outcome measure, named the Adult Hypopituitarism Questionnaire (AHQ), in order to evaluate comprehensively, and from multiple perspectives, the daily and social lives, as well as the physical and mental functioning, of patients with hypopituitarism and diabetes insipidus.

Methods
The development of the AHQ was conducted in a standardized manner, using an accepted measure development methodology which included item development, pilot testing, and psychometric validation [11]. The study was approved by the Ethics Committee of each facility, and all participants gave written, informed consent prior to interviews or survey participation. Personally identifiable information, such as names, phone numbers, and addresses, was not collected from participants in order to fully protect their privacy.

Item Development and Cognitive Debriefing
Questionnaire items were generated through a multi-step process: 1) review of relevant measures and related papers; 2) patient interviews; 3) examination by the research group; and 4) cognitive debriefing of a small number of patients [12].
A pool of 70 items, which consisted of candidate items that reflected the construct concept of the AHQ, was generated, primarily through patient interviews by experts and review of the relevant literature. Twelve patients with hypopituitarism were recruited and interviewed by the authors, who have experience in questionnaire development. The main purpose of these patient interviews was to explore the patient-perceived psychological and physical impacts of the disease and its treatment. This included impacts on social, psychological, and physical functioning, as well as anxiety about the future. Data obtained from these patient interviews were sorted and qualitatively analyzed to examine whether each item reflected the construct concept. The research group then selected items based on the construct concept, while taking precautions not to omit necessary concepts. A list of 81 draft items was produced. Cognitive debriefing of the preliminary AHQ was conducted with a small number of patients to assess patients' interpretations of the questions (26 patients in the first pilot test and 13 patients in the second pilot test). These patients were asked to complete the preliminary AHQ, and were then interviewed about its comprehensiveness, relevance, and clarity of expression. Due to the limited number of patients who could participate in the pilot tests, we did not prevent patients from participating in both pilot tests.

Psychometric Validation
A patient survey was conducted to collect answers to each question for psychometric validation. The reliability and validity of the AHQ were then psychometrically tested using the collected questionnaires.
1. Participants and survey procedure. The participants in the survey were both male and female Japanese patients over 18 years old with (pan)hypopituitarism, diabetes insipidus, hypogonadotropic hypogonadism, or isolated hormone deficiency, who could understand the questionnaire and fill in their answers without assistance.
Overall, 203 participants were recruited at seven medical facilities, and were asked to complete and return the survey questionnaire to an independent research office (CLINICAL STUDY SUPPORT, Inc., Nagoya, Japan). Of these, 108 participants whose clinical conditions were stable were asked to answer the questionnaire a second time, after an interval of 1 day, to assess the test-retest reliability of the AHQ. Ten participants who had not previously received replacement therapy and who were to start therapy were asked to answer the questionnaire twice: before replacement therapy, and again 3 months or later after beginning therapy. This group is referred to as the "before-and-after-therapy" group.
In addition to the AHQ, the survey questionnaire included the SF-36v2 (Medical Outcomes Study Short Form-36 ver. 2, Japanese edition), which is a comprehensive index of health-related QOL, in order to concurrently assess the validity of the AHQ. The SF-36v2 consists of 8 domains [13]: physical functioning, role physical, bodily pain, social functioning, general health perceptions, vitality, mental health, and role emotional. The scores are expressed in two summary scores, a physical component summary score and a mental component summary score. Scores of physical functioning, role physical, and bodily pain contribute to the physical component summary; scores of role emotional, social functioning, and mental health contribute to the mental component summary; and scores of social functioning, vitality, and general health perceptions contribute to both.

2. Statistical methods for psychometric testing. Demographic and clinical variables of the participants were summarized using descriptive analyses. In the item analysis, an item was deleted if it met any of the following conditions: 1) its floor effect or ceiling effect was 80% or higher; 2) its correlation coefficient with another item was 0.8 or higher, in which case one item of the pair was deleted; 3) its correlation coefficient with the total score of the remaining items was very low compared to that of other items. For reliability, internal consistency and reproducibility were examined. With regard to internal consistency, the homogeneity of the question items in each domain was evaluated using Cronbach's α coefficient. A coefficient of 0.7 or higher is preferred for a questionnaire to be internally consistent [14]. For reproducibility, the two sets of answers from the patients in the test-retest group whose clinical conditions were stable were examined using the intraclass correlation coefficient. A coefficient of 0.7 or higher was considered evidence of acceptable test-retest reliability [12].
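The floor/ceiling check and Cronbach's α described above can be illustrated with a minimal Python sketch. The function names and the toy data are hypothetical and are not taken from the study; the sketch assumes complete responses on the 0 to 6 scale and uses the standard sample-variance form of α.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete responses.

    rows: one list of item scores per respondent (hypothetical layout).
    """
    k = len(rows[0])  # number of items
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def floor_ceiling_rates(scores, low=0, high=6):
    """Share of respondents choosing the lowest and the highest option."""
    n = len(scores)
    return scores.count(low) / n, scores.count(high) / n


# Toy data: three items that move together perfectly.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
floor_rate, ceiling_rate = floor_ceiling_rates([0, 0, 6, 3])
```

Under the criteria stated above, an item would be flagged if either rate reached 0.8, and a domain would be regarded as internally consistent if α reached 0.7.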
With regard to validity, construct validity (domain structure), concurrent validity, and known-group validity were examined. For construct validity, factor analysis was performed using the principal factor method with a promax rotation to test the hypothesized domain structure. Exploratory factor analysis was also performed to examine sub-domain structure, although the AHQ was developed assuming two domains. Three gender-specific items ("beards", "erectile dysfunction", and "menses") were excluded from factor analysis. Concurrent validity was evaluated using Pearson's product-moment correlation coefficient with SF-36v2. We hypothesized that the SF-36v2 sub-domains belonging to the mental component summary would correlate more strongly with the AHQ sub-domains belonging to the psycho-social domain than with those belonging to the physical domain. Likewise, we anticipated that the SF-36v2 sub-domains belonging to the physical component summary would correlate more strongly with the AHQ sub-domains belonging to the physical domain. According to the criterion of correlation strength in the psychometric validation proposed by Cohen [15], the correlation coefficient was judged as follows: 0.1 = weak correlation; 0.3 = medium correlation; and 0.5 = strong correlation. For known-group validity, relationships between selected clinical variables and the domain score were examined using a t-test. For responsiveness, a change in the mean score before and after replacement therapy in the before-and-after-therapy group was examined using a paired t-test.
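As an illustration of the concurrent-validity step, the following Python sketch computes Pearson's product-moment correlation and labels it with Cohen's benchmarks. Treating 0.1, 0.3, and 0.5 as lower bounds of bands, and the "negligible" label for values below 0.1, are assumptions made for this sketch; the text gives only the three anchor values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Map a correlation to Cohen's benchmark labels (banding is assumed)."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "weak"
    return "negligible"  # label not used in the source


labels = [cohen_label(r) for r in (0.13, 0.35, 0.68)]
```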
AHQ item scores were transformed into a scale of 0 to 100, with higher scores indicating better patient condition. The sub-domain score was determined to be the mean score of attributive question items. When a missing response was found to a question item attributed to a sub-domain, the following procedures were employed: 1) when the number of items with a missing response in a sub-domain was less than 50% of the total number of items in the sub-domain, the mean was calculated by imputing the missing responses based on the mean of the non-missing items; 2) when the number of items with a missing response in a sub-domain was more than 50% of the total number of items in the sub-domain, the sub-domain score was not calculated. If a sub-domain score was not available, the domain score was not calculated. All statistical tests were two-tailed, and the level of significance was set at 5%.
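The scoring rules above can be sketched in Python as follows. Several details are assumptions made only for illustration: the linear rescaling of the 0 to 6 response to 0 to 100, the optional reverse flag for items where a high raw response means a worse condition, the treatment of exactly 50% missing items as not calculable (the text leaves this case unspecified), and the domain score as the mean of its sub-domain scores. Because missing items are imputed with the mean of the answered items, the sub-domain mean equals the mean of the answered items.

```python
def rescale(raw, reverse=False, max_raw=6):
    """Linearly map a 0..max_raw response onto 0..100 (assumed transform)."""
    value = raw / max_raw * 100
    return 100 - value if reverse else value


def subdomain_score(raw_items):
    """Mean item score on the 0-100 scale; None marks a missing response."""
    answered = [rescale(v) for v in raw_items if v is not None]
    missing = len(raw_items) - len(answered)
    # Assumption: exactly 50% missing is treated like more than 50% missing.
    if missing * 2 >= len(raw_items):
        return None
    return sum(answered) / len(answered)


def domain_score(subdomain_scores):
    """Not calculated when any sub-domain score is unavailable."""
    if any(s is None for s in subdomain_scores):
        return None
    # Assumption: the domain score is the mean of its sub-domain scores.
    return sum(subdomain_scores) / len(subdomain_scores)


partial = subdomain_score([6, 3, None])       # one of three items missing
too_sparse = subdomain_score([6, None, None])  # two of three items missing
```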

Results

Item Development and Cognitive Debriefing
A total of 12 participants with panhypopituitarism and/or diabetes insipidus were interviewed in September 2004 and February 2005 to determine the question items, and seventy items were pooled as potential questions. After review by the research group, some questions were added, and 81 items covering four aspects (social functioning, mental functioning, physical functioning and condition, and anxiety about the future) were finally generated. A 7-point (0 to 6) Likert scale was employed as the response option under the assumption of an equally spaced distance between response choices.
A pilot test for cognitive debriefing was performed with 26 participants in October 2005 to examine the content validity of the preliminary questionnaire with regard to factors such as relevance and clarity of language. The mean age of the patients was 47.4 years (range: 22-80), and 12 of the participants were male (46.2%). The mean time required to answer all the questions was 11.7 minutes (range: 5-35). The question items were considered to be easily understandable because, when surveyed, the participants did not make any particular comments indicating difficulty answering the questionnaire as a whole, although we decided to make minor changes to some questions in response to patients' suggestions. The second pilot test was conducted with 13 participants from November to December 2005, because the answer scale was partially modified for the questions asking about the participants' conditions. More than 80% of the participants could easily answer the questions, indicating that modification of the answer scale did not have a negative impact on the ability of the participants to answer the questions. Finally, an 81-item provisional questionnaire was determined. Based upon theoretical considerations, a two-domain structure, with psycho-social and physical domains, was adopted.

Psychometric Validation
From February 2006 to October 2008, answered questionnaires from 203 participants were collected, and 196 of the fully answered questionnaires were subjected to item analysis. Of these, 192 questionnaires accompanied by the background characteristics of the participants were subjected to reliability and validity testing, except for exploratory factor analysis. Exploratory factor analysis was performed on 228 questionnaires received by October 2008, in order to examine sub-domain structure. [Table 1].
2. Item analysis. In the item analysis of 196 answered questionnaires, no floor or ceiling effects were observed in the distribution of answers. With regard to correlations between items, three item pairs exhibited correlation coefficients of 0.8 or higher. Since the items in each pair apparently dealt with similar concepts, only the item considered to be most understandable was retained from each pair. Only one item ("frequent perspiration") showed a correlation with the total score of the remaining items (0.16) that was notably lower than the corresponding correlations for the other items. The content of this question was not considered to be especially important to patients with hypopituitarism, and hence it was deleted. On the basis of the results obtained from item analysis, 4 items were deleted and 77 items were retained in the questionnaire.
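The two correlation-based checks used in the item analysis can be sketched in Python as below. The function names and the toy responses are hypothetical; the sketch only shows how an item-pair correlation of 0.8 or higher and a low corrected item-total correlation (each item against the total of the remaining items) might be detected.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(rows, threshold=0.8):
    """Index pairs of items whose correlation reaches the threshold."""
    k = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if pearson_r(cols[i], cols[j]) >= threshold]


def corrected_item_total(rows):
    """Correlation of each item with the total of all other items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # total excluding item j
        result.append(pearson_r(item, rest))
    return result


# Toy responses: items 0 and 1 are near-duplicates, item 2 is loosely related.
toy = [[0, 0, 3], [1, 1, 1], [2, 2, 4], [3, 3, 1], [4, 5, 5]]
pairs = redundant_pairs(toy)
item_total = corrected_item_total(toy)
weakest_item = item_total.index(min(item_total))
```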
3. Factor analysis. The AHQ was developed assuming two domains: psycho-social and physical. Factor analysis was performed using the principal factor method and a promax rotation to examine the domain structure, and almost all the items were classified into the assumed domains. Of the items that had been assumed to be classified into the physical domain, 4 items (items 36, 44, 45, and 58) showed approximately the same factor loading values in both domains, and 1 item (item 59) loaded strongly on the psycho-social domain. However, based on their conceptual interpretability, all the items were incorporated into the physical domain.
4. Reliability. Cronbach's α coefficient ranged from 0.72 to 0.93 for the psycho-social sub-domains, and from 0.73 to 0.89 for the physical sub-domains, indicating acceptable internal consistency. Regarding reproducibility, the intraclass correlation coefficient ranged from 0.77 to 0.90 for the psycho-social sub-domains, and from 0.86 to 0.94 for the physical sub-domains. Their reproducibility was considered sufficient. [Table 3].
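For readers who wish to reproduce a test-retest coefficient, the following Python sketch computes a one-way random-effects intraclass correlation, ICC(1,1), from paired scores. The form of the intraclass correlation used in the study is not stated, so this choice is an assumption, and the scores are toy values.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for two administrations.

    pairs: one (test, retest) tuple per participant (hypothetical layout).
    """
    n = len(pairs)
    k = 2  # two administrations per participant
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


identical = icc_oneway([(10, 10), (20, 20), (30, 30)])
close = icc_oneway([(10, 12), (20, 18), (30, 31)])
```

With the 0.7 criterion stated in the Methods, both toy examples would count as acceptable test-retest reliability.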
5. Concurrent validity. The correlation coefficients between the AHQ sub-domains and the eight domains in the SF-36v2 were calculated in order to examine concurrent validity. The AHQ sub-domains correlated moderately with all of the SF-36v2 sub-domains, with coefficients ranging from 0.13 to 0.68. As hypothesized, the physical component summary of the SF-36v2 correlated more strongly with the physical domain of the AHQ, and the mental component summary correlated more strongly with the psycho-social domain of the AHQ, although correlation coefficients were not compared statistically. [Table 4].
6. Known-group validity. The relationships between AHQ scores and clinical variables that may affect them were examined. AHQ sub-domain scores clearly differentiated between the presence and absence of hypothalamic disorder, with significant differences noted in 11 sub-domains. The patients with ADH deficiency showed significantly lower scores in "interpersonal relationships", "control of body temperature", "urination", and "body weight". For GH deficiency, statistical significance was shown only in "immunity, digestive tract, and musculoskeletal system" and "urination". [Figure 1].
7. Responsiveness. Changes in the mean score before and after hormone replacement therapy were examined using ten sets of questionnaires for which background information of the participants could be collected. The scores in all the sub-domains were higher after therapy, but a statistically significant difference was observed only in "depressed mood", "vigor", and "physical strength" (p = 0.03, 0.01, and 0.04, respectively; paired t-test).
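The paired comparison used for responsiveness can be illustrated with a short Python sketch that computes the paired t statistic and its degrees of freedom. The scores are toy values, not study data, and the p-value is deliberately omitted because it would require a t-distribution function from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched scores."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Toy sub-domain scores (0-100 scale) for four hypothetical participants.
t_stat, dof = paired_t([50, 60, 55, 40], [55, 62, 60, 47])
```

A positive t statistic here corresponds to higher scores after therapy, which is the direction of improvement on the AHQ scale.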

Discussion
We developed and validated the Adult Hypopituitarism Questionnaire (AHQ) as a new, disease-specific, self-administered measure for evaluating QOL in adult patients with hypopituitarism.
On the basis of psychometric testing, the AHQ was judged to be reliable and valid as a questionnaire for patients with hypopituitarism. Regarding reliability, good to excellent internal consistency and reproducibility were observed in all the sub-domains. Regarding validity, concurrent validity was suggested since the SF-36v2 sub-domains correlated more strongly with the related AHQ domains, as hypothesized, although a statistical comparison was not conducted. For known-group validity, sub-domain scores were associated with clinical variables that might affect them, such as the presence or absence of hypothalamic disorder or ADH deficiency, with the disordered or deficient group showing lower scores. With regard to GH deficiency, significant differences were observed in only two physical sub-domains. However, it is important to note that these results about known-group validity cannot be interpreted with confidence, because they were not obtained through a randomized placebo-controlled trial. It also remains controversial whether amelioration of GH deficiency-induced impairments translates into clinically meaningful improvement in physical function and QOL [16]. Further clinical studies using randomized, placebo-controlled designs and involving substantial numbers of patients would be required to demonstrate the benefits of this treatment on improving QOL. Regarding factor analysis, the items "lack of physical strength", "big waist size", "easy to get fat", "poor physical condition", and "cannot speak loudly" were not clearly classified into the assumed physical domain. Despite the unexpected loading of these items, the AHQ still showed sufficient internal consistency; therefore, they could be incorporated into the physical domain based on their conceptual interpretability.
Although the 77 questions included in the AHQ may seem like a large number, and might result in a long completion time, the AHQ is able to detect various impacts of the disease and its treatments across multiple aspects. Since most of the currently available measures emphasize particular aspects such as mental status, the AHQ can assess impacts that are not detected by existing measures. It is also important when treating chronic diseases such as hypopituitarism that the patient's condition, including QOL, be monitored longitudinally. The AHQ might help clinicians understand the severity of the disease and detect changes caused by treatment, as well as allow them to compare efficacy between treatments.
The AHQ was developed in Japanese, for Japanese patients with hypopituitarism. However, since the AHQ does not contain items that are specifically related to Japanese culture, it could be translated and used internationally. We note that the English version of the AHQ shown in this paper has not been linguistically validated. To develop a translated edition, the content must be translated in a linguistically appropriate manner. Moreover, the psychometric properties of the translated edition need to be assessed. Ideally, the translated edition should have the same domain structure as that of the original edition [17][18][19], enabling researchers to compare internationally obtained data.
Several limitations of this study should be noted. First, the survey was conducted at only eight hospitals, raising the issue of generalizability of the findings. To minimize this concern, broad eligibility criteria (over 18 years of age, with panhypopituitarism, hypogonadotropic hypogonadism, or isolated hormone deficiency) were employed. However, we note that there are a relatively small number of pituitary adenoma patients and a high number of germinoma and Sheehan's syndrome patients in this study. Second, it should also be noted that further assessment is necessary with respect to responsiveness to treatment. In this study, only 10 participants were examined for responsiveness, and there were no randomly assigned control groups. A larger number of participants need to be assessed in order to draw a firm conclusion, although a questionnaire with good reliability and validity is most likely to detect any clinically meaningful changes, in general [20]. Another concern is the one-day interval of the test-retest survey. When a short test-retest interval is employed, it is possible that patients may remember their responses and respond based on recall. In this study, however, the large number of questions in the AHQ and the use of a 7-point Likert scale may have prevented participants from responding based on recall to some extent.
This study did not investigate the relative performance of the AHQ compared with other available measures; therefore, it cannot be concluded which measure is most appropriate in a given research or clinical setting.
Based on the findings of this study, the AHQ is a potentially useful tool for estimating hypopituitarism-related symptoms, for monitoring QOL as a part of clinical management, and for the evaluation of treatment outcomes.