Factorial validity of the Twi versions of five measures of mental health and well-being in Ghana

Background Mental health is considered an integral part of human health. Reliable and valid measurement instruments are needed to assess various facets of mental health in the native language of the people involved. This paper reports on five studies examining evidence for the factorial validity of the Twi versions of five mental health and well-being measurement instruments: Affectometer-2 (AFM-2); Automatic Thought Questionnaire–Positive (ATQ-P); Generalized Self-Efficacy Scale (GSEs); Patient Health Questionnaire-9 (PHQ-9); and Satisfaction with Life Scale (SWLS) in a rural Ghanaian adult sample. Method Measures were translated and evaluated using a research-committee approach, pilot-tested, and administered to adults (N = 444) randomly selected from four rural poor communities in Ghana. We applied confirmatory factor analysis (CFA), bifactor CFA, exploratory structural equation modeling (ESEM), and bifactor ESEM to the AFM-2, ATQ-P, and the PHQ-9, and CFA to the GSEs and the SWLS. The omega coefficient of composite reliability was computed for each measure. Results A two-factor bifactor ESEM model displayed superior model fit for the AFM-2. The total scale and the Negative Affect subscale, but not the Positive Affect subscale, attained sufficient reliability. Two models (a four-factor 22-item bifactor ESEM model and a 5-factor 22-item ESEM model) fitted the data best for the ATQ-P. The bifactor ESEM model displayed a high reliability value for the total scale and satisfactory reliability values for three of its four subscales. For the GSEs, a one-factor CFA model (residuals of items 4 and 5 correlated) demonstrated superior model fit with a high reliability score for the total scale. A two-factor ESEM model outperformed all other models fitted for the PHQ-9, with moderate and satisfactory reliability scores for the subscales. 
A one-factor CFA model (residuals of items 4 and 5 correlated) demonstrated superior model fit for the SWLS, with a satisfactory reliability value for the total scale. Conclusions Findings established evidence for the factorial validity of the Twi versions of all five measures, with the global scores, but not all subscale scores, demonstrating satisfactory reliability. These validated measurement instruments can be used to assess mental health and well-being in the research and practice contexts of the current sample.


Introduction
There is a growing global interest in strengths-based research that focuses on the promotion of positive human functioning [1,2]. Recent scholarly efforts have expanded from the traditional foci of assessing and treating psychopathology to also evaluating and promoting people's strengths and mental health [3,4]. This has led to a rapid increase in the development of several measurement instruments that assess the strengths, capacities, and mental health of people in different contexts. A comprehensive assessment of positive mental health involves a consideration of both hedonic and eudaimonic aspects of well-being. Generally, hedonia emphasizes pleasure, subjective happiness, and avoidance of negative affect [3,5]. Eudaimonia, on the other hand, focuses on aspects such as purpose in life, self-actualization, and personal growth [6,7]. There is also a growing recognition of the complex interactions between positive aspects of human functioning and negative experiences that influence thinking patterns, feelings, and behavior of people [8,9]. Consequently, a complete assessment of mental health is complemented by the evaluation of negative experiences and emotional states, such as depression and negative affect [9].
Recent developments in the field of positive psychology have led to renewed research interest in exploring evidence for the validity of measurement instruments aimed at measuring constructs such as mental well-being, satisfaction with life, and self-efficacy across populations and contexts [4,10]. The majority of the measurement instruments in use in most African settings were developed from a Western perspective, assume an individualistic cultural orientation and values, and were largely validated with unrepresentative Western samples [11]. Owing to the vast cultural differences that exist between people from different social structures and value orientations, for instance, between Western and African societies [11,12], it is important that evidence for the validity of measurement instruments that were developed and normed in one context (e.g., Western) is established before they are administered in other (e.g., African) contexts. Evidence suggests that the conceptualization, interpretation, and expression of well-being differ between individualistic (e.g., Euro-American) and collectivistic (e.g., African and Asian) cultures [13].
Although several mental health and well-being measurement instruments are used for research and clinical practice in Ghana, very few, such as the General Health Questionnaire-12 [14], Perceived Social Support from Family and Friends Scale [15], and the Multidimensional Scale of Perceived Social Support [16] have been validated in their original English versions in the Ghanaian context. Furthermore, given that the meaning or manifestation of constructs may differ across cultural groups, it is important that measurement instruments are translated and the evidence for their validity established in the languages and cultural contexts of the target population. Empirically validated versions of mental health and well-being measurement instruments in the native languages can strengthen the drive towards ensuring valid, reliable, and culturally-appropriate assessment of various constructs of mental health. This would also provide opportunities for researchers to empirically evaluate mental health intervention programs in the Ghanaian context. Furthermore, in-depth information on the nature and prevalence of mental health and well-being for a particular group can serve as a useful resource for designing context-appropriate and cost-effective interventions for the people involved.

The present study
Although the Affectometer-2 (AFM-2), Automatic Thought Questionnaire-Positive (ATQ-P), Generalized Self-Efficacy scale (GSEs), Patient Health Questionnaire-9 (PHQ-9), and Satisfaction with Life Scale (SWLS) are often used to assess mental health and well-being in research and practice in Ghana, their psychometric properties have not yet been investigated in any Ghanaian population. This paper reports on five studies examining the evidence for the factorial validity of the Twi versions of five mental health and well-being measures in a sample of adults drawn from four rural poor communities in the middle belt of Ghana. We first present a general description of the methodology implemented in all five studies. This is followed by a sequential presentation of each validated measurement instrument, together with its results and a discussion of the findings.

Design and participants
A once-off cross-sectional survey design was implemented to collect data for the five studies. Participants (N = 444) were Twi-speaking adults drawn from four rural poor communities from the Sunyani West District (SWD) of the Brong Ahafo region of Ghana. Measures were interviewer-administered in the native language (Twi) of participants.

Study setting
The SWD is one of 27 districts in the Brong Ahafo region with a population of 85,272 and a total number of 10,715 households [17]. Four communities were randomly selected from a list of 18 communities categorized as extremely poor (with an average of 150 households and an income per head below 50% of the poverty line of US$1.90 a day) or classified as ultra-poor (with US$ 1.25 or less a day) [18][19][20]. Most of these communities have poor road networks or footpaths connecting them and have only basic (primary) schools. Although most communities are connected to the national grid, only about half of the households are connected to electricity. Residents are mainly peasant farmers and traders of farm produce who also share similar socioeconomic characteristics [17,18].

Measures
Five mental health and well-being measures commonly used in Ghana were selected to examine the evidence for their factorial validity in this study: the Affectometer-2 (AFM-2), Automatic Thoughts Questionnaire-Positive (ATQ-P), Generalized Self-Efficacy Scale (GSEs), Patient Health Questionnaire-9 (PHQ-9), and Satisfaction with Life Scale (SWLS). Presently, there is a general lack of research that investigates the evidence for the validity of the Twi versions of mental health and well-being measurement instruments in Ghana, particularly in a rural poor context. The selected measurement instruments have shown promise in previous African studies [21-23, 47, 49], are all relatively short, and together assess facets relevant to the evaluation of mental health and well-being in the African context. The selection of these measures was also supported by the growing research evidence that suggests that the complete evaluation of mental health and well-being should also be complemented by the assessment of negative experiences and emotional states [9], considering the complex interactions between positive aspects of human functioning and negative experiences [8,9] that, together, influence the thinking patterns, feelings, and behavior of people.
Sociodemographic data. We collected sociodemographic information including gender, age, education, marital status, and income.

Procedure
Preparation of scales. We followed the research-committee approach [24,25] to translate the measures from English to Twi. First, the measures were translated into Twi by a translator and back-translated into English by another. The English versions (i.e., the original English version and the Twi-English back-translated version) were then compared and evaluated by a panel of academics including a linguist (forward translator), an English professor also fluent in Twi (back-translator), the first author (who is a clinical psychologist and a native Twi speaker), and an independent member (who is fluent in Twi and English). To ensure that the translated versions were comprehensible and culturally sensitive, we conducted a pilot study with a small group of 38 participants drawn from the target population. Participants rated the instructions, items, and the response format as "clear" or "unclear", and indicated their understanding of the instructions, items, and response formats [26]. Participants who rated instructions, items, or the response format as "unclear" were asked to suggest a revision. Overall, the evaluation of the Twi-English back-translated versions showed two differences in meaning in the wording of the ATQ-P and one difference in meaning in the wording of the SWLS. In all instances, the research committee evaluated the differences, reached consensus, and applied the necessary revisions. See S1 File for the final Twi versions of the scales.
Fieldwork and data gathering. Measures were interviewer-administered by four trained psychology graduates fluent in Twi. Prior to the interviews, copies of ethical approval documents and permission letters were presented to the chiefs and community leaders of the communities involved for permission to conduct the study. With assistance from community leaders, the research team recruited an individual who acted as an independent mediator to introduce the researchers and the study to households a week before data gathering. Written informed consent was obtained from individuals who met the inclusion criteria and agreed to participate. Interviews were conducted in the homes of participants at a time that was convenient to them and at a place that ensured privacy. Data were collected electronically with the SurveyCTO software within a period of one month.

Data analysis
IBM SPSS was used to clean the data and to obtain descriptive statistics. The data were analysed with Mplus version 8.3 [27]. We conducted a literature search for previous studies that examined the evidence for the factorial validity of the selected measurement instruments and identified alternative models that were fitted for some of the measurement instruments. The theorized models, together with the alternative models, were tested for the respective measurement instruments. We applied CFA, bifactor CFA, exploratory structural equation modeling (ESEM), and bifactor ESEM to the AFM-2 (Study 1), ATQ-P (Study 2), and PHQ-9 (Study 4). Only CFA was applied to the GSEs (Study 3) and the SWLS (Study 5), since these measurement instruments consist of a single factor only, so that no cross-loadings and/or general factor can be specified.
CFA is based on the independent clusters model, where it is assumed that each indicator only loads on a single latent (target) factor [28], where the cross-loadings on nontarget factors are also assumed to be zero. CFA is applied to evaluate how well the measured variables represent the number of constructs and to confirm or reject the measurement theory [28,29]. CFA, however, fails to account for two important sources of construct-relevant psychometric multidimensionality, namely, the hierarchical nature of psychological constructs (that explains the associations of items to their specified target factors, as well as a higher-order factor, such as overall positive mental health), and the imprecise nature of items where items typically load on their target factors and have cross-loadings on nontarget but conceptually-related factors [29,30,31].
The first and second sources of construct-relevant multidimensionality are addressed by bifactor modeling [32] and ESEM [33], respectively. Bifactor models (e.g., the bifactor CFA model) permit the simultaneous specification of a general factor as well as separate orthogonal group factors which result from variance not explained by the general factor for different content domains [34]. A bifactor CFA model allows for the separation of the variance explained by the specific and general factors and tests if there is a general underlying factor that directly influences the items in addition to the influence of the specific factors [29,35]. ESEM, on the other hand, integrates the characteristics of exploratory factor analysis (EFA) and CFA by allowing items to load on their intended factor(s), as well as on nonintended factors [28,33]. ESEM models have been shown to produce better fit and more accurate factor correlations compared to CFA [36]. A bifactor ESEM model combines bifactor modeling and ESEM by concurrently allowing items to load on all factors (with nontarget loadings as close to zero as possible) and to load on both general and specific factors [29].
The robust maximum likelihood (MLR) estimator was applied to accommodate any deviations from normality for all five measurement instruments. The weighted least square mean and variance adjusted (WLSMV) estimator was additionally applied to the two measures (GSEs [Study 3] and PHQ-9 [Study 4]) that have only four response options [37]. We parameterized the CFA and bifactor CFA models by fixing the variances of the latent factors to one [38]. Oblique target rotation [33] and orthogonal bifactor target rotation [29] were applied for ESEM and bifactor ESEM, respectively. For the target rotation, all loadings on the nontarget factors were set to approximately, but not constrained to, zero [33]. There were no missing data.
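To illustrate the target specification described above, the following minimal Python sketch builds a target pattern for a two-factor ESEM solution. It is not the Mplus syntax used in the analyses: the function name `build_target_pattern` and the use of `None` to mark freely estimated loadings are assumptions made for this example only, and the item assignment is the one listed for the AFM-2 in Study 1.

```python
# Illustrative sketch only: a target pattern for oblique target rotation.
# None  = target loading, freely estimated.
# 0.0   = nontarget loading, rotated toward (but not constrained to) zero.

# AFM-2 item assignment as listed in Study 1 (1-based item numbers).
PA_ITEMS = [1, 3, 4, 7, 9, 11, 13, 14, 17, 19]
NA_ITEMS = [2, 5, 6, 8, 10, 12, 15, 16, 18, 20]


def build_target_pattern(n_items, factor_items):
    """Return an n_items x n_factors list of lists describing the target."""
    pattern = [[0.0] * len(factor_items) for _ in range(n_items)]
    for col, items in enumerate(factor_items):
        for item in items:
            pattern[item - 1][col] = None  # free (target) loading
    return pattern


pattern = build_target_pattern(20, [PA_ITEMS, NA_ITEMS])
for number, row in enumerate(pattern, start=1):
    print(number, row)
```

In this sketch every item has exactly one free loading and one loading targeted at zero, which mirrors the idea that cross-loadings are permitted but kept as small as the data allow.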
We considered the guidelines postulated by Byrne [39] to determine how well each model fit the data. Comparative fit index (CFI) and Tucker-Lewis Index (TLI) values of .90 and larger indicate reasonable fit and values of .95 and larger indicate good fit. For both the standardized root mean square residual (SRMR, only applicable where the MLR estimator was applied) and the root mean square error of approximation (RMSEA), values below .08 suggest reasonable fit and values below .05 indicate good fit. For the weighted root mean square residual (WRMR), which is applicable only where the WLSMV estimator was applied, a value below 1.0 suggests good fit [40]. Given that the chi-square test is highly sensitive to sample size, the CFI, TLI, RMSEA, and the SRMR or WRMR were considered in decision making. We examined the loadings of all items that made up a particular factor in determining if that factor is well-defined. Factor loadings with absolute values higher than .40, preferably, and .30 at minimum were deemed indicative of a well-defined factor [41]. We inspected the size and significance of the loadings of the items on their respective factors and reported statistically significant cross-loadings on nontarget factors for measurement instruments where ESEM and bifactor ESEM analyses were conducted. We also interpreted small $R^2$ values (percentage of the item variance explained by the model) and small factor loadings on their target factors as possible indications of local misfit. We estimated the model-based omega coefficients of composite reliability [42] using the formula stipulated by Sánchez-Oliva et al. [43], where $\omega = \left(\sum |\lambda_i|\right)^2 / \left(\left[\sum |\lambda_i|\right]^2 + \sum \delta_{ii}\right)$, with $\lambda_i$ representing the factor loadings and $\delta_{ii}$ the error variances. Our interpretation of reliability values for first-order (non-bifactor) models is based on the standard convention that satisfactory reliability estimates should be .70 and above [44]. Perreira et al. [45] suggest that omega reliability coefficients of .50 and above are satisfactory for bifactor models.
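The omega formula and the two reliability thresholds given above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the analyses: the function names are hypothetical, the four loadings are invented for the example, and the error variances are derived as one minus the squared loading, which assumes a standardized single-factor solution without cross-loadings.

```python
def omega(loadings, error_variances):
    """Composite reliability: (sum|l|)^2 / ((sum|l|)^2 + sum of error variances)."""
    if len(loadings) != len(error_variances):
        raise ValueError("one error variance is needed per loading")
    common = sum(abs(l) for l in loadings) ** 2
    return common / (common + sum(error_variances))


def is_satisfactory(value, bifactor):
    """Apply the thresholds stated in the text: .50 (bifactor) or .70 (first-order)."""
    return value >= (0.50 if bifactor else 0.70)


# Hypothetical standardized loadings (not taken from the study).
loadings = [0.7, 0.6, 0.8, 0.5]
# Assumption: standardized one-factor solution, so error variance = 1 - loading^2.
errors = [1 - l ** 2 for l in loadings]
example_omega = omega(loadings, errors)
print(round(example_omega, 3))

# Thresholds applied to the coefficients reported for AFM-Model 5 (a bifactor model).
afm_model_5 = {"total": 0.88, "AFM-PA": 0.43, "AFM-NA": 0.72}
verdicts = {name: is_satisfactory(v, bifactor=True) for name, v in afm_model_5.items()}
print(verdicts)
```

Absolute values are taken before summing, as in the formula, so that negatively keyed loadings do not cancel out the common variance.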

Ethical considerations
The Health Research Ethics Committee (HREC) of the North-West University (NWU-00109-17-S1), South Africa, and the Noguchi Memorial Institute for Medical Research Institutional Review Board (NMIMR-IRB) of the University of Ghana (NMIMR.IRB CPN OO7/17-18), Ghana, approved this study. Permissions were also granted by the Regional and District Health Directorates of the SWD and the chiefs and elders of the communities involved. Participants were assured of the confidentiality of data and about their right to withdraw from the study up to the point of data analysis, without any consequences. Written informed consent was obtained from all participants.

Demographic profile of participants
The demographic profile of the participants is shown in Table 1. The majority of participants were male, married, and had no formal education. All participants were fluent in Twi.

Superior model, factor loadings, and their reliabilities
The fit indices for all models are presented in Table 2. The standardized factor loadings for the superior model fit for all measurement instruments are displayed in Table 3.
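The fit cut-offs described under Data analysis can be expressed as a small helper for reading such a table. The sketch below is an illustration under those stated cut-offs only; the function names are hypothetical, and the example values are the CFI and RMSEA figures reported in the text for three of the Study 1 models.

```python
def label_incremental(value):
    """Label a CFI or TLI value: >= .95 good, >= .90 reasonable, else poor."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "reasonable"
    return "poor"


def label_residual(value):
    """Label an RMSEA or SRMR value: < .05 good, < .08 reasonable, else poor."""
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "reasonable"
    return "poor"


# CFI and RMSEA values reported in the text for Study 1 (AFM-2).
reported = {
    "AFM-Model 1": (0.714, 0.094),
    "AFM-Model 4": (0.926, 0.051),
    "AFM-Model 5": (0.952, 0.044),
}

labels = {
    model: (label_incremental(cfi), label_residual(rmsea))
    for model, (cfi, rmsea) in reported.items()
}
for model, (cfi_label, rmsea_label) in labels.items():
    print(model, "CFI:", cfi_label, "RMSEA:", rmsea_label)
```

Such labels are only a reading aid; model selection in the studies also weighed factor loadings, admissibility of the solution, and reliability.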

Study 1: Affectometer-2
Description of measure: Affectometer-2 (AFM-2). The AFM-2 measures general happiness or a general sense of well-being by assessing the balance of positive and negative feelings in individuals' recent experiences [46]. The short version, which is used in the present study, comprises two 10-item subscales: Positive Affect (AFM-PA) and Negative Affect (AFM-NA). The AFM-2 has five ordinal response levels (1 = not at all; 2 = occasionally; 3 = some of the time, 4 = often; 5 = all of the time). The total score, the positive-negative affect balance (PNB), is indicated by AFM-PA minus AFM-NA, where a higher score of AFM-PA over AFM-NA denotes positive mental health. The developers of the scale reported Cronbach's alpha scores of .88 and .93 for the AFM-PA and AFM-NA, respectively [46]. Cronbach's alpha scores of .64 and .79 for the AFM-PA and AFM-NA, respectively, were found in a Setswana student group in South Africa [47].
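The scoring rule described above can be sketched as follows. This is a minimal illustration: the function name `score_afm2` is hypothetical, subscale scores are assumed to be simple sums of item ratings, and the item-to-subscale assignment is the one listed for the two-factor model in the results of this study.

```python
# Item assignment as listed for the two-factor model (1-based item numbers).
PA_ITEMS = (1, 3, 4, 7, 9, 11, 13, 14, 17, 19)
NA_ITEMS = (2, 5, 6, 8, 10, 12, 15, 16, 18, 20)


def score_afm2(responses):
    """Score the 20-item AFM-2 from a dict {item number: rating 1-5}.

    Assumption: each subscale score is the sum of its item ratings.
    """
    expected = set(PA_ITEMS) | set(NA_ITEMS)
    if set(responses) != expected:
        raise ValueError("responses must cover items 1 to 20 exactly")
    if any(rating not in (1, 2, 3, 4, 5) for rating in responses.values()):
        raise ValueError("ratings must be integers from 1 to 5")
    pa = sum(responses[i] for i in PA_ITEMS)
    na = sum(responses[i] for i in NA_ITEMS)
    return {"AFM-PA": pa, "AFM-NA": na, "PNB": pa - na}


# Hypothetical respondent: "often" (4) on positive items, "occasionally" (2) on negative items.
example = {i: 4 for i in PA_ITEMS}
example.update({i: 2 for i in NA_ITEMS})
scores = score_afm2(example)
print(scores)
```

A positive PNB indicates that positive affect outweighs negative affect in the respondent's recent experience.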
Similar to the original hypothesis [46], a one-factor solution was reported for a Scottish adult sample when principal component analysis (PCA) was conducted [48]. The authors, however, acknowledged the independence of positive and negative affect and cautioned that in some circumstances it might be useful to score the positive and negative subscales separately. On the contrary, a two-factor solution (AFM-PA and AFM-NA) was reported after CFA and EFA were applied to a dataset collated from a sample of Setswana-speaking South Africans [49].
Results and discussion. First, we tested the originally hypothesized [46] one-factor CFA model (AFM-Model 1). This model displayed a poor fit to the data (CFI = .714; RMSEA = 0.094). Next, we tested a two-factor CFA model (AFM-Model 2) with two correlated factors, namely, Positive Affect (AFM-PA; items 1, 3, 4, 7, 9, 11, 13, 14, 17, and 19) and Negative Affect (AFM-NA; items 2, 5, 6, 8, 10, 12, 15, 16, 18, and 20) as suggested by Wissing et al. [49]. The psi matrix was not positive definite for this model. We proceeded to test a two-factor bifactor CFA model (AFM-Model 3) with the two specified factors (AFM-PA and AFM-NA) and a general factor onto which all items loaded, with the specific factors being orthogonal. The theta matrix was not positive definite for this model. Consequently, we tested a two-factor ESEM model (AFM-Model 4). This model demonstrated reasonable fit (CFI = .926; RMSEA = 0.051). Subsequently, a two-factor bifactor ESEM model (AFM-Model 5) was tested. This model displayed superior fit (CFI = .952; RMSEA = 0.044) compared to the previous models. For AFM-Model 5, ten of 20 items presented with strong and statistically significant loadings on the general factor. However, only one item (item 15: "I wish I could change some part of my life") intended for the AFM-NA specific factor displayed a strong and statistically significant loading. Items 8, 12, 18, and 20 presented weak but statistically significant cross-loadings on the nontarget AFM-PA factor. Item 15 exhibited a strong and statistically significant cross-loading on the nontarget AFM-PA factor. The majority of items intended for the AFM-PA specific factor (except items 4 and 14) displayed strong and statistically significant factor loadings. The omega coefficients for AFM-Model 5 were .88, .43, and .72 for the total scale, AFM-PA, and AFM-NA, respectively. Item 2 had an exceptionally small $R^2$ value (.17), which means that the model explains only 17% of the variance contained within the item. All other items presented with $R^2$ values ranging from .23 to .67.
As far as could be established, this is the first study to apply the bifactor ESEM model to the AFM-2. The application of bifactor ESEM helps to clarify the factor structure of the AFM-2 further and provides a deeper understanding of the nature of the complex interaction and role of negative and positive human experiences in mental health functioning. Previous studies [30,35] report that bifactor models outperform and produce better fits when compared to traditional CFA and PCA analysis.
The findings of the present study are consistent with a previous report by Wissing et al. [49], who applied CFA, PCA, and EFA to data from a sample of Setswana-speaking adults. The results are, however, inconsistent with previous research [46,48] that reported a one-factor solution for the AFM-2 when CFA was conducted. The findings, generally, suggest that the total score of the Twi version and the AFM-NA subscale can be interpreted and used in subsequent research in this population. However, caution should be applied when interpreting the AFM-PA subscale scores for this group, given the low reliability value it presented. Given the new inclusive approach to mental health (i.e., the dual-systems model) that takes into account both the negative and positive attributes of individuals, valid, reliable, and culturally adapted mental health measures, such as the AFM-2, that evaluate both positive and negative dimensions of the human experience are needed. Although the AFM-2 has been used for research across various contexts, there is no evidence supporting the factorial validity of either the original English version or any native-language version in the Ghanaian setting. Considering its potential usefulness for holistic mental health assessment, the validation of the Twi version of the AFM-2 could facilitate the conduct of empirical research and the evaluation of mental health interventions in the current group. The Twi version demonstrated satisfactory reliability for the total scale and the AFM-NA subscale, with adequate model fit, and can therefore be used in this group.

Study 2: Automatic Thoughts Questionnaire-Positive
Description of measure. Automatic thoughts questionnaire-positive (ATQ-P). The ATQ-P is a 30-item self-report scale that assesses the frequency of positive thoughts [50]. Respondents rate each item on a 5-point Likert scale according to how frequently each thought or a similar thought has occurred to them during the past week (1 = never, 3 = sometimes, 5 = all the time). There are four subscales (i) Daily functioning (ATQ-P-D), (ii) Self-evaluation (ATQ-P-S), (iii) Others Evaluation of Self (ATQ-P-O), and (iv) Future Expectations (ATQ-P-F). A total positive automatic thoughts score (ATQ-P-T) is calculated by adding the scores of all the items together. The ATQ-P-T demonstrated a high level of internal consistency with a Cronbach's alpha value of .96 among a sample of American adolescents [51]. A Cronbach's alpha value of .94 was also reported for the Dutch version in an adult Dutch population [52].
Previous studies provided support for three different models of the ATQ-P. Jolly et al. [51] reported a four orthogonal factor solution similar to the original factor structure hypothesized by Ingram et al. [50]. Bryant and Baxter [53] also reported a four-factor solution with factors comparable to the original hypothesis. However, contrary to the developers' interpretation, Bryant and Baxter postulated that the four factors are negatively correlated, rather than orthogonal. A decade later, Boelen [52] reported a five oblique factor model among a bereaved American adult sample. Boelen's model was a slight modification of the original four orthogonal factor model, but with an added factor, the Positive Social Functioning (ATQ-P-SF), drawn from items of the ATQ-P-D factor.
Overall, the four-factor 22-item bifactor ESEM (ATQ-P-Model 9) and the five-factor 22-item ESEM (ATQ-P-Model 12) models displayed superior model fit, with similar fit indices. Almost all items presented with strong and statistically significant loadings on their general target factors for both models. The ATQ-P-Model 9 presented with a very high reliability value for the total scale (.98) and satisfactory reliability values for the ATQ-P-D (.67), ATQ-P-S (.54), and the ATQ-P-O (.66) subscales, but not for the ATQ-P-F (.35) subscale. Similarly, the ATQ-P-Model 12 presented satisfactory reliability values for four subscales: .88 (ATQ-P-D), .87 (ATQ-P-S), .80 (ATQ-P-O), .86 (ATQ-P-F), but a very low value of .05 for the ATQ-P-SF. Since the low reliability value of the ATQ-P-F subscale in the four-factor 22-item bifactor ESEM model (.35 against the .50 threshold for bifactor models) was less severe than the very low value of the ATQ-P-SF subscale in the five-factor 22-item ESEM model (.05 against the .70 threshold for non-bifactor models), the factor loadings and $R^2$ statistics are presented for the four-factor 22-item bifactor ESEM model. For the four-factor 22-item bifactor ESEM model, four (items 11, 14, 20, and 29) of ten items of the ATQ-P-D factor, three (items 22, 23, and 28) of six items of the ATQ-P-S factor, and both items (items 3 and 4) of the ATQ-P-F factor displayed strong and statistically significant factor loadings on their target factors. None of the items intended to assess ATQ-P-O loaded significantly on their hypothesized target factor or demonstrated any significant cross-loadings on another specific factor. Only item 21 ("I'm happy with the way I look") displayed a strong and statistically significant cross-loading on the nontarget ATQ-P-D factor. Notably, all items displayed high $R^2$ values ranging from .39 to .83.
The results from the current study suggest that the items on the ATQ-P cluster either into four dimensions of positive thinking (daily functioning, self-evaluation, others' evaluation of self, and future expectations) or into five dimensions, with an additional dimension (positive social functioning), in the present sample. The reliability values in the current sample (e.g., omega coefficients between .80 and .88 for four of the five subscales of the ATQ-P-Model 12) are comparable to Boelen's [52] reported reliability values (.78 to .88). However, the ATQ-P-SF subscale for ATQ-P-Model 12 presented with a very low reliability value (.05) and should therefore not be interpreted or used to draw conclusions in the current sample.
With an increasing recognition that the human experience comprises negative and positive dimensions, the availability of a locally validated version of the ATQ-P scale could facilitate the study of the roles that positive thoughts play in people's psychological functioning [51][52][53]. For instance, it is well established that languishing (i.e., low or absent positive mental health) is a predisposition to psychopathology [9,10]. A culturally adapted version of the ATQ-P may also be useful in epidemiological surveillance to assess positive human experiences and to evaluate positive thinking patterns of patients under treatment. We found evidence for the factorial validity of the Twi version of the ATQ-P in the current Ghanaian sample. However, on the basis of the low reliability scores of the ATQ-P-F and the ATQ-P-SF subscales, it is recommended that scores of these subscales should not be interpreted for the current group.

Study 3: Generalized Self-Efficacy Scale
Description of measure. Generalized self-efficacy scale (GSEs). The GSEs is a 10-item self-report measure that assesses optimistic self-beliefs to cope with a variety of difficult demands in life [54]. Responses to items are measured on a 4-point scale ranging from 1 (not true at all) to 4 (exactly true). The total score is calculated as the sum of all items and ranges between 10 and 40, with higher scores indicating more self-efficacy. Urban [55] reported a high reliability coefficient (α = .90) among a sample of multicultural university students comprising black Africans, Indians, and white South Africans. Löve et al. [56] also established evidence for the validity of GSEs, with a high Cronbach's alpha value of .90 in an adult Swedish sample.
Similar to the original hypothesis [54], a large body of research provides strong evidence that the GSEs represents a unidimensional construct [55][56][57], including a study that involved a large adult sample from 25 countries [58]. On the contrary, Zhou [59] reported a model with two distinct factors distinguished as coping self-efficacy and action self-efficacy among a sample of Chinese university students when EFA was applied.
Results and discussion. We first tested a one-factor CFA model (GSEs-Model 1A) as originally hypothesized [54], using the MLR estimator. The result displayed poor fit (CFI = .942; RMSEA = 0.093). Given that the scale uses four response options, we also applied the WLSMV estimator (GSEs-Model 1B). Results showed improved fit when the CFI and TLI indices were considered (CFI = .993; TLI = .991). However, the RMSEA displayed a large value of 0.137, indicating poor fit. The analysis revealed a high residual correlation (modification index [MI] = 41.8) between item 4 ("I am confident that I could deal efficiently with unexpected events") and 5 ("Thanks to
A possible explanation for the high correlation between items 4 and 5 is that, theoretically, both items appear to refer to an individual's appraisal of his or her ability to handle future, unforeseen circumstances and events. Our findings are consistent with previous research [55][56][57] that provided strong evidence that the GSEs is underpinned by a unidimensional measurement model. Findings from a multinational study involving 25 countries also demonstrated that the one-factor structure is adequate at both the individual and national levels [58]. On the contrary, Zhou [59] reported a two-factor model (distinguished as coping self-efficacy and action self-efficacy) for a sample of Chinese university students.
The present study is one of the few to provide evidence for the factorial validity of the Twi-translated version of the GSEs in a nonclinical sample, and specifically in an adult sample from a rural, resource-limited, collectivistic community setting. The findings show that the factor structure and reliability value of the Twi version of the GSEs are generally consistent with previous studies, including those conducted among clinical populations [56,57]. Given its favourable reliability and acceptable model fit, the Twi version of the GSEs is recommended for use in the current sample to evaluate individuals' optimistic self-beliefs.

Study 4: Patient Health Questionnaire-9
Description of measure. Patient health questionnaire-9 (PHQ-9). The PHQ-9 consists of nine items designed to assess the nine DSM-IV diagnostic criteria for depression [60]. Items are rated on a 4-point Likert scale specified as 0 (never), 1 (several days), 2 (more than half of the days), and 3 (most days). Total scores of 5, 10, 15, and 20 respectively represent mild, moderate, moderately severe, and severe depression [60]. The PHQ-9 was found to be valid with a Cronbach's alpha value of .89 in a group of primary care patients previously diagnosed with depression [60]. Adewuya et al. [61] also reported a Cronbach's alpha value of .85 in a Nigerian university student population. […] the PHQ-9-S subscale, and a modest reliability value of .69 for the PHQ-9-N subscale. Both models (PHQ-Model 5A and PHQ-Model 10A) presented similar fit indices.
Given that the two-factor ESEM model (PHQ-Model 10A) had larger reliability values for the subscales compared to PHQ-Model 5A (.76 and .69 vs .78 and .48), its factor loadings and related statistics are presented. Four of six items (items 6, 7, 8, and 9) and two of three items (items 3 and 4) displayed strong and statistically significant loadings on their intended cognitive-affective and somatic factors, respectively. Notably, item 5 ("Poor appetite or overeating") presented a strong and statistically significant cross-loading on the nontarget cognitive-affective factor. Similarly, items 1 ("Little interest/pleasure in doing things") and 2 ("Feeling down/depressed/hopeless") displayed strong and statistically significant cross-loadings on the nontarget somatic factor. Items 1 and 2 presented exceptionally small R² values of .16 and .29, respectively, suggesting that the model explained only 16% and 29% of the variance in these items. All other items presented adequate R² values (.47-.74).
The findings of the present study are consistent with previous research [63,65,66], although some other studies [67,68] reported a unidimensional structure. For the current sample, the PHQ-9 can be subdivided into somatic (PHQ-9-S) and nonsomatic/affective (PHQ-9-N) factors, which is comparable to previous reports in clinical [63][64][65] and nonclinical [62,66] samples. We noted that items 7 and 8 loaded on the somatic factor in the model suggested by Peterson et al. [65] but on the nonsomatic/affective factor in the models recommended by Krause et al. [64] and Subotić et al. [66]. A possible explanation for the behavior of these items is that, although both items (Item 7: "Trouble concentrating"; and Item 8: "Being so fidgety or restless") are behavioral in nature, they could be driven and reinforced by emotions [69]. The reliability values found in the current sample (.76 and .69) are comparable to previous reports among samples of university students in Nigeria [70] and in the United States [71].
Although the PHQ-9 is a widely validated tool, the current study is among the first to apply a bifactor analysis to test its structure in the African context. In spite of the recent increase in the use of the PHQ-9 in epidemiological surveillance in Ghana, only one study [72] has examined the psychometric properties of the original English version of the PHQ-9, in a sample of Senior High School students in Ghana. The recent advocacy for the integration of mental health into primary health care to allow for comprehensive assessment and treatment of mental disorders, particularly in LAMIC [73], necessitates valid and reliable measures that are also culturally sensitive and have the ability to detect the presence of mental illness in nonclinical populations. Overall, our findings suggest that depression should be understood, for the current sample, as characterized by a somatic and a nonsomatic or affective structure, rather than as an expression of a unidimensional structure. In the present study, the Twi version of the PHQ-9 showed adequate reliability and model fit and can be administered to evaluate self-reported levels of depression in the current sample. Caution should, however, be exercised when interpreting the nonsomatic factor scores for this population, considering the low reliability score this factor presented.

Study 5: Satisfaction with Life Scale
Description of measure. Satisfaction with life scale (SWLS). Designed to assess a person's global judgment of life satisfaction, the 5-item SWLS theoretically evaluates an individual's life circumstances in comparison to his or her standards [74]. Responses are on a 7-point Likert scale, where 1 = strongly disagree and 7 = strongly agree. Total scores range from 5 to 35, where higher scores (31-35) indicate greater life satisfaction. Wissing and van Eeden [75] provided evidence for the validity of the SWLS with modest reliability (α = .67) among a multicultural sample of South African adults. Diener et al. [74] reported a one-factor solution for the SWLS in the original validation study. This has since been replicated across several populations and contexts [49,75] when CFA, EFA, and PCA were applied.
Results and discussion. First, we evaluated the one-factor CFA model (SWLS-Model 1) as originally hypothesized [74]. This model resulted in a poor fit (CFI = .941; RMSEA = 0.138). An MI value of 33.9 suggested a high residual correlation between item 4 ("So far I have gotten the important things I want in life") and item 5 ("If I could live my life over, I would change almost nothing"). Consequently, a one-factor CFA model with the residual correlation between items 4 and 5 specified (SWLS-Model 2) was examined. SWLS-Model 2 (CFI = .988; RMSEA = 0.069) presented superior fit and outperformed SWLS-Model 1. An omega coefficient of .87 was found for SWLS-Model 2. All five items presented strong and statistically significant factor loadings. Of note, item 5 presented a comparatively low, but statistically significant, factor loading (.51). In this model, item 5 displayed a low R² value (.26), indicating that only 26% of the variance in the item is explained by the model.
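The link between the reported loading of item 5 (.51) and its R² value (.26) can be checked with a short sketch: in a standardized one-factor solution, an item's R² is the square of its loading. The sketch also shows one common way to compute an omega coefficient for a one-factor model with a correlated residual pair. Only the .51 loading comes from the text; the other loadings and the residual covariance below are hypothetical placeholders, and the omega formula is a standard textbook form that may differ in detail from the computation used in the study.

```python
from typing import Sequence


def r_squared_from_loading(loading: float) -> float:
    """R-squared of an item in a standardized one-factor model (loading squared)."""
    return loading ** 2


def omega_one_factor(loadings: Sequence[float], residual_cov_sum: float = 0.0) -> float:
    """Omega for a one-factor model with standardized loadings.

    omega = (sum of loadings)^2 /
            ((sum of loadings)^2 + sum of residual variances + 2 * residual covariances)

    residual_cov_sum is the sum of residual covariances between item pairs
    (for example, a single correlated pair such as items 4 and 5).
    """
    common = sum(loadings) ** 2
    residual_var = sum(1.0 - l ** 2 for l in loadings)
    return common / (common + residual_var + 2.0 * residual_cov_sum)


if __name__ == "__main__":
    # Reported in the text: item 5 loading of .51 and R-squared of .26.
    print(round(r_squared_from_loading(0.51), 2))

    # Hypothetical loadings for items 1-4; only the last value is from the text.
    loadings = [0.80, 0.85, 0.80, 0.70, 0.51]
    print(omega_one_factor(loadings, residual_cov_sum=0.10))
```

Including the residual covariance in the denominator means that a positively correlated residual pair lowers omega relative to a model that ignores it, which is one reason to report which residuals were freed.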
In line with previous studies, a one-factor CFA model (residuals of items 4 and 5 correlated) demonstrated superior model fit. This finding is consistent with previous validation studies that also reported a one-factor CFA model for adult [49,75] and adolescent [76] samples. We found a high modification index between items 4 and 5, similar to previous reports [77]. A possible explanation for this correlation is that both items refer to an individual's present state of satisfaction with life with respect to the past. In contrast, items 1-3 pertain to satisfaction with present life. The analysis revealed, similar to findings from previous studies [49,76], that item 5 presented a relatively low, but statistically significant, factor loading. In the present study, the reliability value for the SWLS was .87.
The findings from the current study provide additional evidence in support of previous studies confirming the single-factor structure of the SWLS [49][74][75][76]. For the current sample, the findings suggest that the scale has satisfactory reliability and sufficient model fit, similar to the results from the original scale development study [74], and can be administered to evaluate the level of satisfaction with life in this group.

General discussion: strengths, limitations, and future directions
This paper set out to explore the evidence for the factorial validity of the Twi-translated versions of five mental health and well-being measurement instruments, namely the AFM-2, ATQ-P, GSEs, PHQ-9, and the SWLS, with data from a sample of adult residents of four rural poor communities in Ghana. The findings provide evidence that the Twi versions of all five mental health and well-being scales, particularly the total scales, are valid and reliable with sufficient model fit and can be used in basic research and interventional studies in the current sample. The major strengths of the present study include the application of both traditional and recently developed models and techniques in the analyses of the data. An additional strength of this study is the target population. Unlike previous studies, the sample in the current study is composed of rural adults who were also non-English speakers. Whereas this study was a first attempt to establish evidence for the validity of the Twi versions of these five measures of mental health and well-being in the Ghanaian context, more such validation studies are needed to provide researchers and practitioners with valid and reliable tools for research and for evaluating the effectiveness of mental health interventions in the target population. Such efforts would also broaden our understanding of how these constructs manifest in the current, understudied context.
Although this study is, to the best of our knowledge, the first to provide information on the reliability and factor structure of the Twi versions of these selected mental health and well-being measures in the Ghanaian setting, we acknowledge the following limitations. Firstly, our participants, on average, were middle-aged, had little or no formal education, and were economically disadvantaged. These unique characteristics of our sample may limit the generalisability of the findings.
Secondly, the current sample consisted mostly of non-English speakers and comprised community members who were largely of Akan heritage, a group that can also be described as more collectivistic. Future studies should include English-speaking samples from urban settings for a comparative analysis of the results. Furthermore, given the cultural diversity in Ghana, it would be important for future studies to include samples from different regions with different cultural heritages and orientations for better representation. Lastly, while our sample size was adequate to perform the intended analyses, it was too small to perform measurement invariance analyses. We recommend that future studies include larger samples to permit measurement invariance analysis and thereby provide additional psychometric information on the measurement instruments involved. We wish to emphasize, however, that the main purpose of this study was to provide evidence for the factorial validity of the selected measurement instruments. We recommend that future studies investigate other forms of validity, such as criterion-related validity, apply other analyses, such as test-retest reliability, to further examine the psychometric properties of the measurement instruments, and perform in-depth item analysis for the individual scales.

Conclusions
The availability of reliable and valid measures, such as the present Twi-translated versions of the AFM-2, ATQ-P, GSEs, PHQ-9, and the SWLS, can be a valuable resource for mental health researchers and practitioners working to promote well-being and positive mental health in one of the most widely spoken Ghanaian languages in the context of the current sample. The results of this study showed that the total scales of all five measures and some subscales attained satisfactory reliability values with acceptable model fit, but that further research is necessary to explore some problematic items, especially in the case of the AFM-PA and ATQ-P-F subscales. It is our hope that the availability of these Twi-translated versions of the five mental health and well-being measures will afford researchers the opportunity to conceptualize and evaluate a diverse array of mental health outcomes, that is, positive mental health and mental ill-health, in this Ghanaian context. Additionally, these translated measuring instruments could serve as a valuable resource for researchers to facilitate the evaluation of the effects of mental health intervention programs in the current sample.