Abstract
Many existing measures of psychological well- and ill-being are used with young people without testing contemporary adolescent understanding of them. The purpose of this study was to use think aloud interviews to test adolescent understanding of three existing, and one newly created psychological well- or ill-being measures. An initial sample of 40 female participants aged 13–14 years took part in the study. Problematic items were identified based on thematic analysis of adolescent feedback. One item of the Lethargy Scale and one item of the Subjective Vitality Scale were adapted following integration of think aloud findings. The results also indicated that the Brief Serenity Scale was unsuitable for the participants. Subsequently, a second sample of 57 female participants aged 13–14 years completed the think aloud protocol with an alternative measure of serenity, the Child Serenity Scale, and thematic analysis revealed no items were problematic. A 24-item psychological well- and ill-being self-report measure was consequently proposed for use in future research.
Citation: Chatwin SCM, Pearson RM, Goswami H, Appleton PR (2026) Using think aloud with female adolescents to validate psychological well- and ill-being self-report measures. PLOS Ment Health 3(2): e0000551. https://doi.org/10.1371/journal.pmen.0000551
Editor: Lambert Zixin Li, National University of Singapore, SINGAPORE
Received: November 3, 2025; Accepted: January 15, 2026; Published: February 17, 2026
Copyright: © 2026 Chatwin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data is stored in Manchester Metropolitan University’s Research Repository, https://doi.org/10.23634/MMU.00642372.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Globally, 14% of adolescents report having a mental health disorder such as anxiety, depression, eating disorders and addictive behaviours [1]. National Health Service (NHS) data show that one in five children and young people in England aged eight to 25 had a probable mental health disorder in 2023, a higher rate than before the COVID-19 pandemic [2]. Moreover, the Children’s Society found that 11% of 10- to 17-year-olds in the United Kingdom (UK) had low well-being, determined by low life satisfaction scores, which were almost half the European average [3]. Failing to address these issues in adolescence has consequences that extend into adulthood; 50% of adult mental health disorders have their onset by the age of 14, and ill-being is associated with an increased likelihood of impaired social relationships, substance misuse, and criminal behaviour [4].
Defining psychological well-being is complex because there are numerous conceptualisations and no universally accepted definition. One common theme, however, is that psychological well-being is defined as either hedonic or eudemonic [5]. The hedonic approach defines psychological well-being in terms of life satisfaction and subjective happiness. The eudemonic approach focuses on optimal functioning [5]. Ryan and Deci [6] and Ryff [7] suggest that, due to its multidimensional nature, well-being should include both hedonic and eudemonic aspects. Ill-being has been relatively neglected compared to well-being in the literature [8]. As with well-being, there is no universally accepted definition of psychological ill-being. Many studies imply ill-being is the opposite of well-being or fail to provide a definition beyond “e.g., depression, stress” [9]. More recently, ill-being has been defined as “unpleasant feelings or emotions that impact the level of functioning” [10], and “negative psychological states” [11]. There is also a debate about whether well- and ill-being are independent dimensions, or whether they sit at opposite ends of a bipolar continuum. The continuum theory suggests individuals with higher levels of ill-being would have lower levels of well-being and vice versa [12]. The independence theory suggests higher levels of ill-being would not necessarily be associated with lower levels of well-being and vice versa [12]. Bradburn’s [13] research into independence versus bipolarity found that the association between positive and negative affect was not strong, despite the two being inversely correlated. Therefore, a high score on one construct does not predict the score on the other construct. The present study takes the view that psychological well- and ill-being are independent, and therefore need to be measured simultaneously but independently.
A criticism of well- and ill-being theories is that they were developed by adults, often for adults [14], but are regularly applied to young people without consideration of their relevance and applicability. Taking two hedonic theories, subjective well-being [15] and quality of life [16] as examples, well-being in the work domain, which is central to these theories, is not relatable for all stages of adolescence. It could be argued that these conceptualisations, and related measures based on these conceptualisations, may not accurately capture an adolescent’s experience of psychological well- and ill-being. To overcome the adult centrism often seen in previous research, some studies have involved adolescents when conceptualising well-being [17,18], but less attention has been paid to ill-being. In one study however, Bracey [19] asked a sample of young people in the UK to write down words they associated with well- and ill-being, and where they felt these feelings (head, body, and/or heart). Younger participants (aged 7–11 years) defined well- and ill-being as unidimensional, singularly occurring feelings. Older participants (aged 12–15 years) had a greater sense of their complexity, describing well-being as both hedonic and eudemonic (“you’d be happy, you’d be healthy…content with…life”), and ill-being as multidimensional (“scared”, “upset”, “you feel bad”) [19]. Bracey’s involvement of adolescents, and consideration of both hedonic and eudemonic perspectives, arguably produced more appropriate definitions for adolescents than existing adult definitions. Focusing on the psychological domain of well- and ill-being, Bracey’s [19] definitions are adopted in this study. Psychological well-being is having life satisfaction, happiness, and optimal functioning determined by positive affect, vitality, and serenity. Psychological ill-being is negative emotional states and undesirable psychological conditions determined by negative affect and lethargy. 
Participants in Bracey’s [19] work also identified hyperactivity as a facet of ill-being; however, in the current study it was removed from the definition due to its connotations with neurodivergent conditions such as attention deficit hyperactivity disorder (ADHD). Additionally, existing hyperactivity measures are often intended for clinical practice, or their items reflect an outcome of ill-being rather than a facet of ill-being (e.g., “I was easily distracted, I found it difficult to concentrate” from the Strengths and Difficulties Questionnaire [20]).
A large body of research has examined young people’s psychological well- and ill-being utilising a variety of measures (e.g., Scale of Positive and Negative Experience (SPANE) [21]; Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) [5]; Children’s Society Index of Children’s Subjective Wellbeing (CSICSW) [22]). Although previous measures are well-validated, they have limitations. They often only consider either eudemonic or hedonic well- (but not ill-) being; as noted above, several measures were developed by adults, for adults, but subsequently used with children/adolescents; and the content validity of these measures is rarely tested. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) can facilitate the assessment of the content validity of self-report measures [23]. According to COSMIN guidelines, relevance, comprehensiveness, and comprehensibility of items from the perspective of the participants and the professional (i.e., researcher) need to be established for a self-report measure to have good content validity. With existing measures, COSMIN guidelines are not always met. For example, regarding the WEMWBS [5], the comprehensibility of items was poor with young participants, as items caused confusion, were misinterpreted, or participants were hesitant to answer [4]. This could be due to the measure stemming from adult conceptualisations of well-being, and/or the reading age not being suitable for younger people. Some measures have been developed from the perspective of adolescents albeit with a limited definition of what constitutes well- and ill-being (i.e., according to COSMIN guidelines, more relevant to the participants but not comprehensive in terms of content). Rees et al. [22], for example, chose to define well-being as unidimensional in terms of quality of life and all five items of the CSICSW include “life”, such as “my life is going well”, which does not capture the multidimensional nature of psychological well-being.
Several specific measures exist which capture the various facets of psychological well- and ill-being proposed by Bracey [19]. For example, the Positive and Negative Affect Scale-Child (PANAS-C) [24,25] measures adolescent positive and negative affect, which are defined as “general emotional states which have a positive and negative valence and content, respectively” [19]. The Subjective Vitality Scale (SVS) [26] measures vitality, which is defined by Bracey as “a positive state of activation marked by dynamic liveliness and energy” [19]. Although developed for adults, a 5-item version of the SVS has been adopted in research with children [27]. Less research attention has been paid to serenity (“a positive state of deactivation marked by composed tranquillity and inner peace” [19]) as an indicator of psychological well-being. Studies that have targeted serenity in adults have employed the 65-item Serenity Scale (SS) [28] or the 22-item Brief Serenity Scale (BSS) [29], although no study has adopted either scale when measuring serenity in young people. With regard to lethargy, defined as “a negative state of deactivation marked by apathetic lethargy and ennui” [19], to date there are no measures that fit this definition. Some researchers have used single items such as “did you feel more tired yesterday or today than you usually do?” (Memorial Symptom Assessment Scale (MSAS 7–12) [30]) and “how often do you feel tired when you go to school in the morning?” [31]. However, these items are a better measure of tiredness than of lethargy. Another issue is that studies often use lethargy interchangeably with fatigue. Whereas lethargy is a temporary “negative state of deactivation” [19], fatigue is extreme “tiredness resulting from physical or mental exertion or illness” [32]. This suggests measures of fatigue are not a suitable replacement for measures of lethargy. Therefore, a measure which focuses solely on lethargy as defined within Bracey’s [19] work and as experienced by young people is needed.
The present study
Previous research has utilised a variety of scales and questionnaires to capture indicators of young people’s psychological well- and ill-being. Although many scales have acceptable psychometrics, key limitations are discussed above. Moreover, to date, no scale exists that we are aware of that captures the multidimensional nature of psychological well- and ill-being as experienced by young people (in [19]). To begin to address these limitations, the present study used think aloud methods [33] with female adolescents to examine the content validity of four questionnaires that align with the definitions of well- and ill-being developed by Bracey [19]: Positive and Negative Affect Scale-Child (PANAS-C) [25], Subjective Vitality Scale (SVS) [26], and the Brief Serenity Scale (BSS) [29]. As mentioned above, as there are no suitable measures for Bracey’s [19] definition of lethargy, four items were developed in this study (see methods for more detail).
In this study, the focus was specifically on adolescent girls. Girls are more susceptible to lower levels of well-being than boys of the same age [34]. Deighton et al. [34] compared levels of well-being in a sample of 10,889 British adolescents each year for three years (11–14 years old). The results showed significant gender differences, as emotional difficulties were higher for girls, behavioural difficulties were higher for boys (although girls’ scores were increasing each year) and subjective well-being was lower for girls [34]. Female adolescents, especially those aged around 14 years, are therefore particularly vulnerable to these health risks and were the target sample in this study.
Method
Ethics statement
Prior to the study commencing, ethical approval was obtained from the Faculty of Science and Engineering Ethics and Governance Committee at Manchester Metropolitan University, approval number 58960. As participants were children, written parent/guardian consent was obtained first, then the participants’ written assent was obtained. Participants’ identifying information was removed from the transcripts to assure confidentiality and anonymity. Participants were referred to according to identification codes, whereby each unique participant (P) was numbered (e.g., P1).
Transparency and openness
This study’s design and its analysis were not preregistered. All data, analysis code, and research materials are available upon request from the first author. Data were analysed using IBM SPSS Statistics version 29.0 [35].
Participants
Participants (N = 40) were female adolescents from the UK aged 13–14 years old (M = 13.3; SD = 0.46). Data collection started on 05/02/2024 and ended on 12/04/2024. Participants were recruited via emails to gatekeepers at schools (i.e., Headteachers) and sports clubs (i.e., coaches). Potential participants were provided with a: (1) parent/guardian information sheet, (2) parent/guardian consent form, (3) participant information sheet, and (4) participant assent form via onlinesurveys.ac.uk. The information sheets stated that participation was voluntary, interviews would be audio recorded on Microsoft Teams, any quotes included in the analysis would be anonymous, and that participants were free to withdraw from the study at any point. Participants could opt in to a prize draw for a £20 Amazon voucher. Participants were recruited from two schools and two sports clubs in the West Midlands in the UK. Using the Index of Multiple Deprivation tool [36], these locations are 89–92% more deprived than other neighbourhoods in England.
Procedure
In the think aloud method, participants are asked to verbalise their thoughts while completing a task, such as a questionnaire, to reveal their thought processes [33]. There are two variations of think aloud. In concurrent reporting, participants naturally express any thoughts they have about the task without attempting to explain them [33]. In immediate retrospective reporting, the researcher asks the participant questions immediately after the task to elicit additional thoughts [33]. Previous studies have shown that think aloud interviews are useful for identifying usability issues with questionnaires, such as stumbling, rereading and misinterpreting items [37,38]. This study employed immediate retrospective reporting, enabling the researcher to clarify any items that participants may not understand and to gather their suggestions on how to reword problematic items.
Think aloud interviews were conducted either online using Microsoft Teams (N = 1) or in a quiet room at a location convenient to the participant such as their school or sports club (N = 39). All participants completed the questionnaires using onlinesurveys.co.uk on a laptop. At the beginning of the interview, participants completed a short questionnaire to provide demographic information. Participants were then read standardised think aloud instructions adapted from Green and Gilhooly [39] and French, Cooke, McLean, Williams and Sutton [40]. The researcher answered any questions before starting the recording. Next, participants were given a practice task to familiarise themselves with the think aloud process. They completed five randomly selected items of the Me and My School Questionnaire (MMSQ) [41]. The MMSQ was chosen as it has a similar response format to the five questionnaires used in this study. It has also been validated for adolescent use [41]. The participants then completed the think aloud process for the remaining questionnaires.
Power relations, informed consent and confidentiality are the main ethical considerations when interviewing minors [42]. Informed consent was provided by a parent/guardian, assent was provided by the participant, and confidentiality, including anonymity, was assured. To minimise the power relation and the researcher’s influence, the researcher sat outside the participant’s line of sight when the interview was conducted in person. If the participant stopped talking for more than 10 seconds, the researcher told them to “please keep talking”. On average, interviews took 8 minutes 18 seconds (SD = 1 minute 38 seconds). Each interview was transcribed verbatim.
Materials and measures
Self-report measures were selected to align with Bracey’s [19] definitions of psychological well- and ill-being. To ensure consistency throughout, participants were asked to answer the questionnaires based on their feelings in their general life over the past week.
Positive and Negative Affect.
The 10-item Positive and Negative Affect Scale-Child [25] has five positive affect items (e.g., “happy” and “cheerful”) and five negative affect items (e.g., “sad” and “mad”) which participants rated from 1 (never) to 5 (always).
Vitality.
A 5-item version of the Subjective Vitality Scale [26] was used to measure feelings of psychological energy and vitality. Items such as “I felt I had a lot of energy” and “I looked forward to each day” were rated from 1 (strongly disagree) to 5 (strongly agree).
Serenity.
All nine items from the inner haven subscale of the Brief Serenity Scale [29] were used to measure participants’ serenity. The items (e.g., “I am aware of inner peace” and “when I get upset, I become peaceful by getting in touch with my inner self”) were rated from 1 (never) to 5 (always).
Lethargy.
Where possible, the lethargy scale (LS) items were developed based on responses from participants in Bracey’s [19] study, addressing a key critique that previous measures were created without young people’s involvement. For example, a 15-year-old boy said “sluggish…not want to do anything” [19], which became the items “I was tired a lot” and “I had no energy for things”. The four items were rated from 1 (never) to 5 (always).
Data analysis
Descriptive statistics (means and standard deviations) were calculated using IBM SPSS Statistics (Version 29) [35] to examine the similarity of responses to previous research findings and so ensure the think aloud process did not impact how participants answered the questionnaire. The audio recordings were transcribed by the first author. The think aloud transcripts were then thematically analysed using Braun and Clarke’s [43] six-phase guide. The six phases are: (1) familiarizing yourself with your data, (2) generating initial codes, (3) searching for themes, (4) reviewing themes, (5) defining and naming themes, and (6) producing the report. Each item was assigned to one of two themes: no problem, or response problem. The response problem theme consisted of four sub-themes: stumbled or reread item, clarified understanding of item, misinterpreted item, and did not understand item. These themes were adapted from French et al. [40]. An item was reviewed if it fell into any of the response problem sub-themes across five or more participants. Finally, the reading age of the reviewed questionnaires was calculated using the Flesch-Kincaid Reading Grade Level (FKRGL) [44] in Microsoft Word. The FKRGL is determined by the total number of words, total number of sentences, and total number of syllables [44]. A lower score denotes a lower reading age.
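To illustrate how the three counts combine into a grade level, the following is a minimal sketch of the standard published Flesch-Kincaid formula. It is not the Microsoft Word implementation used in the study; Word's syllable counting may differ, and the function name and the example counts are assumptions made for illustration only.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts (standard published formula)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # Longer sentences and more syllables per word both raise the grade level.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical hand count for one short item: 4 words, 1 sentence, 6 syllables.
grade = flesch_kincaid_grade(words=4, sentences=1, syllables=6)
print(round(grade, 2))
```

Because the formula depends only on average sentence length and average syllables per word, short items built from one- and two-syllable words will score low regardless of how abstract their content is.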
Results
Descriptive statistics
The ethnicity of participants is shown in Table 1. The most common ethnicity was White – British (37.5%) followed by Black – African and Mixed – White and Black Caribbean (12.5% each). Table 2 shows the descriptive statistics for each questionnaire. Any participants who answered “prefer not to answer” were excluded from the analysis of that item. All well-being scores were above the mid-point indicating moderately high levels, and all ill-being scores were below the midpoint indicating moderately low levels.
Thematic analysis
Table 3 shows the number of problems experienced by participants for each questionnaire. A ‘problem’ was an item coded under any of the response problem sub-themes in the thematic analysis. An item was deemed a ‘problematic item’ if this happened across five or more participants. On average, participants experienced 4.4 problems when completing the questionnaires. Only two participants experienced no problems. PANAS-C had a total of five problems across five items, LS had a total of 60 problems across four items, SVS had a total of 13 problems across five items, and the BSS had the most problems with a total of 122 across nine items.
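The distinction between counting problems and counting participants can be sketched as follows. This is an illustrative sketch only: the thematic coding in the study was done by hand, and the record layout, function name, and item labels here are hypothetical. The only detail taken from the text is the threshold of five or more participants.

```python
from collections import defaultdict

THRESHOLD = 5  # "five or more participants", as stated in the text


def problematic_items(records, threshold=THRESHOLD):
    """Return items with coded problems from at least `threshold` distinct participants.

    records: iterable of (participant_id, item, sub_theme) tuples, one per coded problem.
    """
    participants_by_item = defaultdict(set)
    for participant, item, _sub_theme in records:
        # A set counts each participant once, however many problems they had.
        participants_by_item[item].add(participant)
    return sorted(
        item for item, people in participants_by_item.items() if len(people) >= threshold
    )


# Hypothetical coded records (not study data).
records = [
    ("P1", "item_a", "did not understand item"),
    ("P1", "item_a", "stumbled or reread item"),  # second problem, same participant
    ("P2", "item_a", "did not understand item"),
    ("P3", "item_a", "misinterpreted item"),
    ("P4", "item_a", "stumbled or reread item"),
    ("P5", "item_a", "clarified understanding of item"),
    ("P1", "item_b", "stumbled or reread item"),
    ("P1", "item_b", "did not understand item"),
    ("P2", "item_b", "stumbled or reread item"),
    ("P2", "item_b", "clarified understanding of item"),
    ("P3", "item_b", "stumbled or reread item"),
]

print(problematic_items(records))
```

In this hypothetical data, item_b has five problems but only three participants, so it is not flagged; item_a has six problems across five participants and is.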
Table 4 shows the number of participants experiencing problems per item. Problematic items are shown in bold. Nine out of 28 items had no problems. The item with the most problems was “I felt lethargic” from the LS with 58 problems experienced across 33 participants.
Items for review
Items deemed problematic following thematic analysis were reviewed. No items from the PANAS-C needed to be reviewed. One item from the LS, one item from the SVS, and eight items from the BSS were reviewed.
Lethargy Scale.
For the item “I felt lethargic”, 58 problems were experienced across 33 participants. Thirty-one participants explicitly stated that they did not understand what “lethargic” meant. Of these 31 participants, 14 struggled to, or incorrectly pronounced lethargic, and nine participants did not attempt to pronounce “lethargic” before stating they did not know what it meant. These participants opted to point to the item and ask for help. For example, [points to lethargic] “what does that mean?”. The researcher responded by saying “say it out loud, and if you still don’t understand it, tell me”. The participant attempted but still could not pronounce lethargic: “lef..ar… I don’t know” (P7). Participant 37 misunderstood the meaning of lethargic saying “yeah, like sick” after being asked to clarify their understanding of the item by the researcher. Finally, one participant clarified their understanding before answering the item: “what is that, tired?” (P24).
Subjective Vitality Scale.
Six participants had a problem with the item “I nearly always felt alert and awake”. All six participants stumbled or reread the item. For example, “I nearly fell…wait I nearly always felt alert and awake” (P38). Only one participant did not understand the item, “I nearly always felt awake…alert and awake. I don’t understand” (P12).
Brief Serenity Scale.
The most problematic item was “I feel serene”, as 36 participants experienced at least one problem with it. Of these 36 participants, 33 did not know what the word “serene” meant, and three participants had to clarify their understanding of serene. Responses included “I dunno what serene is” (P3) and “what’s serene? Calm?” (P21). Moreover, 18 participants stumbled or had to reread the item. The difficulty in pronouncing the word “serene” is demonstrated by participant 1 “I feel sincere. Is that sincere?” and participant 31 “I feel sincere. What does that mean?”.
Seven participants had problems with the item “I’m aware of an inner source of comfort, strength and security”. Five participants did not understand the item, such as participant 7 who said “I’m aware of an inner…inner source of comfort, strength and security. Uh I dunno”. Participant 4 also stumbled and reread the item as they said: “I’m aware of inner source…I am aware of an inner source of comfort, strength and security”. Lastly, one participant misunderstood the item, as they focused only on the “comfort” aspect of the item, neglecting “strength and security” from their thought process for their answer. They asked “what does that mean? … does that mean like I have my comforts and that sort of stuff?” (P18).
The item “during troubled times I experience an inner source of strength” was problematic for seven participants, with a total of nine problems. Three participants did not understand the item, for example “what? I don’t understand” (P12). Six participants stumbled or reread the question, such as “during trouble…troubled times I…I experience an inner source of strength” (P7) and “during troubled timed. Wait, during troubled times I experience an inner source of strength” (P9).
There was a total of six problems for the item “I experience peace of mind”. Two participants had to clarify their understanding, two participants stumbled or had to reread the item, and two participants did not understand the item. Example quotes include “umm. I experience peace of mind. What’s that? Like peace in your mind?” (P1), “I experience peace of mind…hmm… wait… not sure” (P5), and “I experience peace of mind. Does that mean like peace?” (P12).
Eight participants had a problem with the item “I am aware of inner peace”. Six of these participants did not understand the item, including confusion regarding what “inner peace” meant such as “what’s inner peace?” (P30). Two participants had to clarify their understanding of the item. For example, participant 22 said “what does it mean by aware, because like I know it, but I don’t have it… well I am aware that I have it because if someone says sorry then I’m like peace”. Participant 10 clarified their understanding by asking “does that mean like peaceful on the inside?”.
Another problematic item was “I experience an inner quiet that does not depend on events”. A total of 17 participants experienced a problem, with two participants rereading the item, one participant misinterpreting the item, and 14 participants not understanding the item. Problems occurred due to both the “inner quiet” aspect and the “does not depend on events” aspect of the item. For example, “does that mean your body is like quiet?” (P18) and “the end bit. Like the inner quiet, that’s what’s happening in your mind. But the end bit” (P31).
Three participants did not understand the item “when I get upset, I become peaceful by getting in touch with my inner self”, and five participants stumbled or had to reread the item. For example, participant 1 stumbled by saying “and when I’m…when I get upset, I become peaceful and get in touch with my inner self.” Participant 12 read the item correctly and then said “no, don’t understand”. Similarly, participant 28 had difficulty as they said, “what does that mean?”.
Finally, seven problems occurred across six participants with the item “I can feel angry and observe my feelings of anger and separate myself from it and still feel an inner peace”. Three problems were due to not understanding the item. For example, “These questions! I don’t know” (P12). Four problems were due to stumbling or rereading the item, as shown by participant 4 who said, “I feel angry and observe…I feel…I can feel angry and observe my feelings of anger and separate myself from it and still feel...” and answered “never” without reading the item in full. Participant 9 stumbled, reread the question, and still did not understand it. They said “wait I need to read that again… I can feel angry and observe my feelings of anger and sep-ar-at myself, separate myself from it and feel and still feel an inner peace.” This was followed by a 3-second pause, after which they looked at the researcher for help.
Changes to items
The items that required changes are shown in Table 5. Initially participants’ suggestions during the think aloud were used to reword problematic items so they could be more easily understood by young people. Some items however were deemed unsuitable and so were removed.
Lethargy Scale.
It was decided to remove the item “I felt lethargic”. Although participant 24 provided a suitable rewording, “what is that, tired?”, this was too similar to the item “I felt tired a lot”.
Subjective Vitality Scale.
Participants seemed to stumble on the “nearly always” part of the item “I nearly always felt alert and awake”. When the six participants who stumbled on “nearly always” were asked to reword the item, the most frequent (N = 4) suggestion was “mostly”. The item was therefore reworded, based on this participant feedback, to “I mostly felt alert and awake”.
Brief Serenity Scale.
The suggestions from the participants on rewording the BSS items were discussed and considered by the study’s authors. However, after this process, the original and reworded items were deemed by the researchers as too complex for adolescents. The decision was therefore made to search for an alternative serenity scale whose items fit the definition of serenity used for this study, and were less challenging for young people to read and understand. This is described in the section below.
Revised measure of serenity
Materials
The search for an alternative serenity scale revealed the Child Serenity Scale (CSS) [45], which has six items measured on a 3-point Likert scale of “yes”, “more or less”, and “no”. The CSS was originally written in Spanish and developed in Argentina with Argentine children. Although items were available in English [45], it was decided to translate the original Spanish items into English, and then back-translate them into Spanish, with the assistance of two bilingual researchers. The aim of the CSS is to determine a child’s ability to “regulate emotion and their responses to stressful events” [45]. Although this differs from Bracey’s [19] definition of serenity (“a positive state of deactivation marked by composed tranquillity and inner peace”), the items (e.g., “I am often a calm person”) reflect Bracey’s definition, making it, according to COSMIN guidelines [23], a relevant and comprehensive measure of adolescents’ serenity.
Procedure
The same procedures and method described above were used for the CSS [45] with 57 new participants (M = 13.2, SD = 0.38) who completed a think aloud interview in person. Participants were females aged 13 or 14 years who lived in the UK. Data collection started on 12/06/2024 and ended on 05/07/2024.
Data analysis
Descriptive statistics were calculated using SPSS [35]. Thematic analysis was conducted using the same protocol as the first think aloud. On average interviews took 4 minutes 23 seconds (SD = 42 seconds).
Results
Descriptive statistics
Table 6 shows the ethnicity of the participants. The most common ethnicity was Asian or Asian British – Indian (36.7%) followed by White – British (19.3%). Table 7 shows the descriptive statistics for the CSS. Any participants who answered “prefer not to answer” were excluded from the analysis of that item. Participants averaged a score of 2.26 out of 3, reflecting high levels of serenity.
Thematic analysis
Table 8 shows the total number of problems experienced by participants completing the CSS. On average, participants experienced 0.28 problems when completing the questionnaire. A total of 16 problems were experienced. Forty-four participants experienced no problems.
Table 9 shows the number of problems per item. “Most days there are times when I feel peaceful” and “I stay calm even if I can’t do what I like” had the most problems, both with four participants rereading the item. One participant reread the item “I am often a calm person” before answering. Two participants stumbled or reread “I solve my problems very calmly”. For example “I solve…I sol..I solve my problems very calmly” (P6). Two participants stumbled or reread “I often feel relaxed” as shown by participant 9, “I feel… no I often feel relaxed”. Four participants stumbled or misread “most days there are times when I feel peaceful”. For example, “most days are...most days there are times when I feel peaceful” (P10). Three participants reread the item “even if I have a problem, I can stay calm”. Four participants stumbled or misread “I stay calm even if I can’t do what I like”. For example “even if I…I stay calm even if I can’t do what I like” (P7). As no item had problems across five or more participants, all items were deemed suitable to use with adolescent females aged 13–14 years old, and no changes needed to be made.
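The per-item counts reported above can be cross-checked against the totals with a short tally. This is a sketch rather than the authors' analysis; it uses the participant counts stated in the text and assumes that each participant contributed exactly one problem per item, which is what the reported total of 16 implies. Variable names are illustrative.

```python
# Participants with a problem on each CSS item, as reported in the text.
css_problem_counts = {
    "I am often a calm person": 1,
    "I solve my problems very calmly": 2,
    "I often feel relaxed": 2,
    "Most days there are times when I feel peaceful": 4,
    "Even if I have a problem, I can stay calm": 3,
    "I stay calm even if I can't do what I like": 4,
}
N_PARTICIPANTS = 57  # second sample size
THRESHOLD = 5        # five or more participants makes an item problematic

total_problems = sum(css_problem_counts.values())
mean_problems = total_problems / N_PARTICIPANTS
flagged = [item for item, n in css_problem_counts.items() if n >= THRESHOLD]

print(total_problems, round(mean_problems, 2), flagged)
```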
Overall retained items
Table 10 shows the number of items retained, modified or removed from each sample. Out of the 28 items across the questionnaires in Sample 1, 18 items were retained overall (of which one item was modified) and 9 items were removed. In Sample 2, all 6 items were retained without modifications. The resultant 24 items are shown in Table 11.
Reading age
The reading age for the resultant 24 items was 1.1 on the Flesch-Kincaid Reading Grade Level [44]; this is equivalent to five years old.
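For readers unfamiliar with the metric, the commonly cited Flesch-Kincaid grade level formula is 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The sketch below illustrates the calculation only; it is not the tool used in this study. The syllable counter is a crude vowel-group heuristic (an assumption made for illustration), so its output will differ from published readability software, and the function names are hypothetical.

```python
import re


def fk_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a likely silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text):
    """Return (words, sentences, syllables) for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z'’]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


# Two CSS items quoted in the text, each treated as one sentence.
items = "I often feel relaxed. I solve my problems very calmly."
grade = fk_grade(*text_counts(items))
print(round(grade, 1))
```

Short sentences built from short words pull both ratios down, which is why simply worded items can score at or below grade 1.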
Response scale
The questionnaires used in this study have different response scales, which caused some confusion amongst participants: not all participants understood the meaning of “neutral”, a response option used by the SVS. No participant experienced problems with the ratings “never”, “rarely”, “sometimes”, “very often”, and “always”, as used by the BSS and PANAS-C.
Discussion
Adolescents have previously defined psychological well-being as having life satisfaction, happiness, and optimal functioning characterised by positive affect, vitality, and serenity, and psychological ill-being as negative emotional states and undesirable psychological conditions determined by negative affect and lethargy [19]. These definitions reveal the multidimensional nature of psychological well- and ill-being as experienced by young people. Building upon Bracey’s [19] initial work, the purpose of this study was to examine the content validity of questionnaire measures of the aforementioned indicators of psychological well- and ill-being using think aloud methods with a sample of 13–14-year-old girls. When testing the content validity of a self-report measure, COSMIN guidelines [23] state that items need to be relevant, comprehensive, and comprehensible to the researchers and participants. Selecting items based on Bracey’s [19] indicators of psychological well- and ill-being ensured that the content was relevant and comprehensive. The think aloud interviews were used to reveal whether the items were comprehensible to the participants.
Descriptive statistics were initially calculated to assess the similarity of the participants’ psychological well- and ill-being scores to those reported in previous research. In this study, on average, participants scored 3.7 out of five for positive affect and 1.93 out of five for negative affect which is similar to mean scores (M = 3.38 and M = 1.80) reported in the original work of Ebesutani et al. [25]. Likewise, participants’ mean score for vitality in this study was 3.66 out of five. This aligns with previous use of SVS [26] where adolescent football players aged 10–14 years old in the UK had a mean score of 3.98 [27]. In this study, participants scored an average of 2.26 out of three for serenity using the CSS, which is similar to participants in Argentina aged 9–13 years old who had a mean score of 2.18 [46]. Finally, participants reported a mean score of 2.50 out of five for lethargy in this study, but due to the lack of adolescent lethargy measures, no comparison can be made. The similarity of the scores in this study compared to previous research suggests the think aloud interview did not impact the participants’ scoring.
Thematic analysis was used to code transcripts and create themes based on the French et al. [40] protocol. There were two main themes: no problem and response problem. The response problem theme was split into four sub-themes: stumbled or reread item, clarified understanding of item, misinterpreted item, and did not understand item. An item was deemed problematic and reviewed if it fell into the response problem theme across five or more participants. No items from the PANAS-C [25] needed to be reviewed. One item of the SVS [26], one item from the LS, and eight items from the BSS inner haven subscale [29] were reviewed.
For the SVS [26] the item “I nearly always felt alert and awake” was changed to “I mostly felt alert and awake” based on feedback from the participants. For the LS, the item “I felt lethargic” was removed because participants’ suggestions to reword the item to “I felt tired”, “I didn’t want to do anything” or “I had low energy” were too similar to other items on the lethargy scale.
The analysis revealed considerably more issues with the BSS items. Even after considering suggestions from the participants, and discussions among the four researchers about how to reword the items, eight items were deemed too complex for adolescents. These problems may have occurred due to the linguistic features of the items, such as multi-clause sentences (e.g., “I can feel angry and observe my feeling of anger and separate myself from it and still feel an inner peace”), and complex abstract language (e.g., “inner self”, “inner calm”, and “inner peace”). When developing a questionnaire for adolescents, items should be unambiguous and simple, avoiding double-barrelled questions [47]. This is in line with COSMIN guidelines, where comprehensibility is a key element of item development [23].
Subsequently, the Child Serenity Scale (CSS) [45] was identified and back-translated from Spanish to English, then tested using the same think aloud protocol with a second sample of adolescent girls. Data analysis revealed no problems with the six items of the CSS. Items were most likely understood by the participants due to their simplicity (e.g., “I was a calm person”).
The removal of problematic items across the questionnaires resulted in 24 items, 16 measuring well-being and eight measuring ill-being (Table 11), which have a suitable reading age (1.1 on the FKRGL [44]) for the target population of 13–14-year-olds. Even though the BSS had a suitable reading age for 13–14-year-olds (5.2, equivalent to 11 years old), participants still identified problems with the items. Therefore, it was important that the final items had both a suitable reading age and simple linguistic features, avoiding complex abstract terms (e.g., “inner”) that young people may be less familiar with.
The results of the thematic analysis confirm the importance of including the target population during the development and validation of a questionnaire. The PANAS-C [25] and the CSS [45] were the only scales where children or adolescents were involved in the original development of the measure, and in this study, were the only scales where no changes were needed. The results also highlight that existing adult measures of psychological well- and ill-being may not produce valid or reliable results when used with adolescents, due to younger people not understanding all items of the measure. Future studies should seek to use measures that were developed with younger people’s input to reflect their current understanding and experience of psychological well- and ill-being, and with their reading age in mind.
Limitations
A limitation of this study was having to create the items to measure lethargy, rather than using an existing questionnaire. However, as mentioned, no measure that fits the definition of lethargy used in this study exists, and the items were worded based on adolescent feedback from Bracey’s [19] work. The results also confirmed the sample understood the three retained items. Another limitation is that the study only focuses on one type of validity (construct). Future research should test other forms of validity (e.g., using factor analysis), invariance, and the reliability of the scales used in this study and/or the identified 24 items from the think aloud interviews. This would further strengthen the confidence in using the scales or items with adolescents. A further limitation is that neither the academic performance nor the reading skill of participants was measured. However, as these participants were recruited from highly deprived areas (which is an indicator of reading skill [48]), it may be expected that other female adolescents from less deprived areas will understand the items. This is something that future research should determine. Moreover, only female participants aged 13–14 years from the UK were recruited. This study focused on teenage girls because this population is particularly vulnerable to lower well-being and higher ill-being levels [34]. However, it means the results cannot be generalised to other populations. Additionally, a small cluster of schools and sports clubs was used to recruit participants, which also limits the generalisability of the results. Future research should aim to use think aloud with additional samples (e.g., adolescent boys; older adolescent girls; samples from other countries) to provide further evidence regarding the construct validity of the targeted questionnaires. Purposive sampling could be used to ensure participants from a variety of socioeconomic backgrounds and reading skill levels are also recruited.
Conclusions
This think aloud study revealed female adolescents aged 13–14 years old did not understand some of the items on the targeted measures of psychological well- and ill-being, despite many of the measures being extensively used in research. By not understanding the items, adolescents may not be answering the questionnaires with content or construct reliability, therefore influencing their scores. This study proposes changes, suggested by female adolescents, that could be made to the wording of the items which are more suitable for this population, thus providing more accurate scores and reflections of their psychological well- and ill-being. If future studies further test and establish the psychometrics of the resultant 24-item questionnaire, it could be an important tool to measure adolescent psychological well- and ill-being both in research and in applied settings such as schools.
Public significance statement
This study suggests that female adolescents aged 13–14 years do not understand all items on the psychological well- and ill-being self-report measures that were tested. It could be argued that previous research conducted among adolescents in this age group using these measures might not be reliable or valid. Rewording problematic items based on participant feedback resulted in a 24-item self-report measure that future research should seek to validate.
Supporting information
S1 File. Questionnaire pack provided to participants for part 1 and part 2 of the study.
These are the original items before amendments were suggested.
https://doi.org/10.1371/journal.pmen.0000551.s001
(DOCX)
References
- 1. World Health Organization. Mental health of adolescents. 2025 [cited 11 Sep 2025]. Available from: https://www.who.int/news-room/fact-sheets/detail/adolescent-mental-health
- 2. Newlove-Delgado T, Marcheselli F, Williams T, Mandalia D, Dennes M, McManus S. Mental Health of Children and Young People in England, 2023. Leeds: NHS England; 2023.
- 3. Chollet D, Turner A, Marquez J, O’Neill J, Moore L. The Good Childhood Report 2024. London: The Children’s Society; 2024.
- 4. Clarke A, Friede T, Putz R, Ashdown J, Martin S, Blake A, et al. Warwick-Edinburgh Mental Well-being Scale (WEMWBS): validated for teenage school students in England and Scotland. A mixed methods assessment. BMC Public Health. 2011;11:487. pmid:21693055
- 5. Tennant R, Hiller L, Fishwick R, Platt S, Joseph S, Weich S, et al. The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): development and UK validation. Health Qual Life Outcomes. 2007;5:63. pmid:18042300
- 6. Ryan RM, Deci EL. On happiness and human potentials: a review of research on hedonic and eudaimonic well-being. Annu Rev Psychol. 2001;52:141–66. pmid:11148302
- 7. Ryff CD. Happiness is everything, or is it? Explorations on the meaning of psychological well-being. J Pers Soc Psychol. 1989;57(6):1069–81.
- 8. Kagan S. An introduction to ill-being. Oxford Studies in Normative Ethics, Volume 4. Oxford University Press; 2014. pp. 261–88. https://doi.org/10.1093/acprof:oso/9780198722144.003.0012
- 9. Hale GE, Colquhoun L, Lancastle D, Lewis N, Tyson PJ. Review: Physical activity interventions for the mental health and well-being of adolescents - a systematic review. Child Adolesc Ment Health. 2021;26(4):357–68. pmid:34105239
- 10. Bai Z, Li Y, Yang Y, Xie C, Zhu Z, Xu Y, et al. The effectiveness of plaza dancing on psychological well-being and ill-being: a systematic review and meta-analysis. Front Psychol. 2022;13:864327. pmid:35496162
- 11. Tape N, Branson V, Dry M, Turnbull D. The impact of psychological well-being and ill-being on academic performance: a longitudinal and cross-sectional study. Educ Dev Psychol. 2021;38(2):206–14.
- 12. Ryff CD, Dienberg Love G, Urry HL, Muller D, Rosenkranz MA, Friedman EM, et al. Psychological well-being and ill-being: do they have distinct or mirrored biological correlates? Psychother Psychosom. 2006;75(2):85–95. pmid:16508343
- 13. Bradburn N. The structure of psychological well-being. Aldine. 1969;6(1):53–70.
- 14. Sixsmith J, Nic Gabhainn S, Fleming C, O’Higgins S. Childrens’, parents’ and teachers’ perceptions of child wellbeing. Health Educ. 2007;107(6):511–23.
- 15. Diener E. Subjective well-being. The science of happiness and a proposal for a national index. Am Psychol. 2000;55(1):34–43. pmid:11392863
- 16. Sirgy M. The psychology of quality of life. Social Indicators Research Series. Dordrecht: Springer Netherlands; 2012.
- 17. Gennings EK, Brown DHJ, Hewlett D. Constructing a definition: Adolescent wellbeing from the perspective of the child and expert. Int J Wellbeing. 2021;11(1).
- 18. Navarro D, Montserrat C, Malo S, González M, Casas F, Crous G. Subjective well-being: what do adolescents say? Child Fam Soc Work. 2015;22(1):175–84.
- 19. Bracey S. The conceptualisation and assessment of child and adolescent well- and ill-being. University of Birmingham; 2019. https://etheses.bham.ac.uk/id/eprint/9351/
- 20. Goodman R. The strengths and difficulties questionnaire: a research note. J Child Psychol Psychiatry. 1997;38(5):581–6. pmid:9255702
- 21. Diener E, Wirtz D, Tov W, Kim-Prieto C, Choi D, Oishi S, et al. New well-being measures: short scales to assess flourishing and positive and negative feelings. Soc Indic Res. 2009;97(2):143–56.
- 22. Rees G, Goswami H, Bradshaw J. Developing an index of children’s subjective well-being in England [Internet]. The Children’s Society. 2010. Available from: https://www.york.ac.uk/inst/spru/pubs/pdf/childswb.pdf
- 23. Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Qual Life Res. 2021;30(8):2197–218. pmid:33818733
- 24. Laurent J, Catanzaro SJ, Joiner TE, Rudolph KD, Potter KI, Lambert S, et al. A measure of positive and negative affect for children: Scale development and preliminary validation. Psychol Assess. 1999;11(3):326–38.
- 25. Ebesutani C, Regan J, Smith A, Reise S, Higa-McMillan C, Chorpita BF. The 10-item positive and negative affect schedule for children, child and parent shortened versions: application of item response theory for more efficient assessment. J Psychopathol Behav Assess. 2012;34(2):191–203.
- 26. Ryan RM, Frederick C. On energy, personality, and health: subjective vitality as a dynamic reflection of well-being. J Pers. 1997;65(3):529–65. pmid:9327588
- 27. Papaioannou AG, Appleton PR, Torregrosa M, Jowett GE, Bosselut G, Gonzalez L, et al. Moderate-to-vigorous physical activity and personal well-being in European youth soccer players: Invariance of physical activity, global self-esteem and vitality across five countries. Int J Sport Exerc Psychol. 2013;11(4):351–64.
- 28. Roberts K, Cunningham G. Serenity: concept analysis and measurement. Educ Gerontol. 1990;16(6):577–89.
- 29. Kreitzer MJ, Gross CR, Waleekhachonloet O, Reilly-Spong M, Byrd M. The brief serenity scale: a psychometric analysis of a measure of spirituality and well-being. J Holist Nurs. 2009;27(1):7–16. pmid:19176898
- 30. Collins JJ, Devine TD, Dick GS, Johnson EA, Kilham HA, Pinkerton CR, et al. The measurement of symptoms in young children with cancer: the validation of the Memorial Symptom Assessment Scale in children aged 7-12. J Pain Symptom Manage. 2002;23(1):10–6. pmid:11779663
- 31. Petersen S, Bergström E, Brulin C. High prevalence of tiredness and pain in young schoolchildren. Scand J Public Health. 2003;31(5):367–74. pmid:14555373
- 32. Thomas M. Persistent fatigue in chronic conditions. Springer eBooks. 2018. pp. 55–66.
- 33. Ericsson KA, Simon HA. Verbal reports as data. Psychol Rev. 1980;87(3):215–51.
- 34. Deighton J, Yoon Y, Garland L. Learning from HeadStart: the mental health and wellbeing of adolescent boys and girls. London: EBPU. 2020 [cited 2023 Oct 8]. Available from: https://www.tnlcommunityfund.org.uk/media/insights/documents/evidencebriefing7_february_2020_eversion_final.pdf?mtime=20200605170535&focal=none
- 35. IBM Corp. IBM SPSS Statistics for Windows [computer software]. Version 29.0. 2025.
- 36. GOV.UK. Deprivation in England [Internet]. Communities.gov.uk. 2025 [cited 2025 Dec 20]. Available from: https://deprivation.communities.gov.uk/
- 37. Darker CD, French DP. What sense do people make of a theory of planned behaviour questionnaire? J Health Psychol. 2009;14(7):861–71.
- 38. van Oort L, Schröder C, French DP. What do people think about when they answer the Brief Illness Perception Questionnaire? A “think-aloud” study. Br J Health Psychol. 2011;16(2):231–45.
- 39. Green C, Gilhooly K. Protocol analysis: Practical implementation. Handbook of qualitative research methods for psychology and the social sciences. 1996. pp. 55–74.
- 40. French DP, Cooke R, McLean N, Williams M, Sutton S. What do people think about when they answer theory of planned behaviour questionnaires? A “think aloud” study. J Health Psychol. 2007;12(4):672–87. pmid:17584818
- 41. Deighton J, Tymms P, Vostanis P, Belsky J, Fonagy P, Brown A, et al. The development of a school-based measure of child mental health. J Psychoeduc Assess. 2013;31(3):247–57. pmid:25076806
- 42. Kirk S. Methodological and ethical issues in conducting qualitative research with children and young people: a literature review. Int J Nurs Stud. 2007;44(7):1250–60. pmid:17027985
- 43. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.
- 44. Thomas G, Hartley RD, Kincaid JP. Test-retest and inter-analyst reliability of the automated readability index, flesch reading ease score, and the fog count. J Reading Behav. 1975;7(2):149–54.
- 45. Oros LB, Cuello MI, Richaud MC. Evaluación de la serenidad infantil. Interdisciplinaria Revista de Psicología y Ciencias Afines. 2021;38(2):41–57.
- 46. Oros LB, Chemisquy SN, Vargas-Rubilar JA. Interpersonal trust: its relevance for developing positive emotions and social skills during childhood. Interpersona. 2024;18(1):124–42.
- 47. Bell A. Designing and testing questionnaires for children. J Res Nurs. 2007;12(5):461–9.
- 48. Romeo RR, Uchida L, Christodoulou JA. Socioeconomic status and reading outcomes: neurobiological and behavioral correlates. New Dir Child Adolesc Dev. 2022;2022(183–184):57–70. pmid:35868867