Validation of the Refugee Health Screener-15 for the assessment of perinatal depression among Karen and Burmese women on the Thai-Myanmar border

Perinatal depression is common and, left untreated, can have significant and long-lasting consequences for women, their children and their families. Migrant women are at particular risk of perinatal depression as a result of a multitude of stressors experienced before, during and after migration. Identification of perinatal depression among migrant women, particularly those living in low- and middle-income regions, remains challenging, partly due to the lack of locally validated and culturally appropriate screening tools. This study formally validates Burmese and Sgaw Karen versions of the Refugee Health Screener-15 (RHS-15) as a screening tool for perinatal depression among migrant women living on the Thai-Myanmar border. The Structured Clinical Interview for the Diagnosis of DSM-IV Disorders (SCID) was used as the gold-standard comparator. Complete results were obtained for 235 Burmese-speaking and 275 Sgaw Karen-speaking women. Although the RHS-15 displayed reasonable psychometric properties, a number of shortcomings limited its utility in this setting. In particular, its Likert-type response categories proved problematic in this low-literacy population. Given these shortcomings, and the relative superiority and greater ease of administration of the SCID, the RHS-15 is not recommended as the tool of choice for detecting perinatal depression in this setting.


Introduction
Maternal mental health is an important global health challenge and a cornerstone of reducing global inequalities. Perinatal mental disorders are common, and the global burden falls disproportionately upon low- and middle-income countries (LMIC), where an estimated 16% of pregnant and post-partum women experience depression [1][2][3]. Untreated perinatal depression can have long-lasting consequences for women, and children of depressed mothers are at risk of physical, behavioral and emotional impairments not only in infancy but throughout early childhood, potentially persisting into adolescence [3][4][5].
Migrant women face a multitude of stressors that place them at increased risk of mental disorders [6][7][8]. Risk factors operate at the individual, family, community and wider societal levels and include marginalized status; exposure to traumatic events, including sexual and domestic violence; adverse socio-economic circumstances; language barriers and lack of social support in destination countries; and poor access to health services. Migrant women who are low-skilled and have resettled within LMIC may constitute a particularly vulnerable subgroup, facing additional challenges such as limited literacy and more severe socio-economic deprivation for themselves and their children [9].
Better identification of affected individuals is key to promoting mental health. Culturally valid tools are essential for detecting illness, quantifying the disease burden, targeting care and monitoring treatment response [10]. For many women, pregnancy is a time of increased contact with health providers and thus offers a valuable screening opportunity. Screening tools allow rapid assessments to be made by non-specialists, an important consideration in LMIC settings where mental health services are commonly lacking or over-stretched. As psychometric properties vary across settings, screening tools must be validated locally prior to use [10]. In addition, screening tools must be acceptable to, and easy to use for, the local population if they are to be implemented [11].
A previous study of over 600 migrant women conducted on the Thailand-Myanmar border between February 2014 and April 2015 found that the Edinburgh Postnatal Depression Scale (EPDS), one of the most widely-used screening tools for perinatal depression, had poor acceptability among local staff and patients [12]. Women in this deprived and low-literacy setting found the EPDS language difficult, were unfamiliar with Likert-type response categories and found the subtleties between response categories challenging [12]. A number of women were unable to complete the questionnaire and several became distressed. It was necessary, therefore, to find an alternative tool to identify women at risk of depression in this population. The current validation study formed part of a larger study of perinatal mental health on the Thai-Myanmar border [13] and aimed to determine the validity and acceptability of Sgaw Karen and Burmese language versions of the Refugee Health Screener (RHS-15) in this setting. In this paper, the term 'migrant' is used to describe any person who has moved from their habitual place of residence, regardless of the circumstances [14]. In this setting, migrant populations include both labour migrants and refugees.

Setting
The Thailand-Myanmar border is home to an estimated 200,000 labour migrants and 145,000 refugees fleeing decades of conflict, poverty and lack of opportunity in Myanmar [15,16]. Labour migrants live in villages on both sides of the border, working in agriculture, manufacturing and the service industry in an area of widespread socioeconomic deprivation. Access to healthcare and education for the majority of labour migrants and their families is limited by their undocumented status. Those who have been granted refugee status live in established camps on the Thai side of the border, the largest of which is Maela, with a population of 38,000. Refugees have greater access to healthcare, education and housing than other migrant groups because non-governmental organisations (NGOs) recognized by the United Nations High Commissioner for Refugees (UNHCR) and the Thailand Ministry of Interior work within the camps to provide these services. Migrants in this region constitute a heterogeneous group of Karen, Burman and Burman Muslim ethnicities, each with their own languages, traditions and religious backgrounds. Sgaw Karen, the language of one of Myanmar's largest ethnic groups, is the most commonly spoken language; a smaller proportion speaks Burmese, the official language of Myanmar. The Shoklo Malaria Research Unit (SMRU) has provided maternity care to this border population since 1986. This care, provided in collaboration with Thailand Public Health, was initially motivated by very high rates of maternal mortality due to malaria, which the antenatal clinic (ANC) services subsequently reduced [17]. Antenatal clinics for rural labour migrant women are located 30-60 km north and south of the border town of Mae Sot, Tak Province, at Wang Pha (WPA) and Mawker Tai (MKT), and for refugee women in Maela (MLA) (Fig 1).

Participants
Participants were first-trimester pregnant migrant women attending SMRU antenatal clinics (ANC) at MLA, MKT and WPA. Women were eligible if they were aged 18 years or over, their estimated gestational age (EGA) as determined by ultrasound dating scan was less than 14 weeks, they had a viable pregnancy, they planned to deliver at SMRU and they were willing and able to participate.

Ethics
Ethics approval was granted by the University of Oxford Tropical Research Ethics Committee (OxTREC 28-15), Mahidol University Faculty of Tropical Medicine Ethics Committee (TMEC 15-045) and the Tak Border Community Advisory Board (T-CAB 6/2/2015), a committee of local community representatives who assess the acceptability of proposed research [18].

Instruments
The Refugee Health Screener-15 (RHS-15) is a fifteen-item screen for symptoms of depression, anxiety and post-traumatic stress disorder (PTSD) developed in conjunction with refugees from Myanmar, Bhutan and Iraq recently resettled in the United States [19,20]. Items 1-14 ask respondents to rate the frequency of psychological and somatic symptoms on a 5-point Likert scale scored 0 ('not at all') to 4 ('extremely') and diagrammatically annotated with a beaker filled to varying degrees. Item fifteen is a distress thermometer (DT) which asks respondents to rate their level of distress from 0 ('no distress') to 10 ('extreme distress'). A total score ≥12 on items 1-14 and/or a score of ≥5 on the DT are considered to be a positive screen requiring further assessment. Burmese and Sgaw Karen translations were obtained from the RHS-15 authors [20]. The RHS-15 has not previously been used in Burmese or Karen communities outside of the United States, and there was therefore a need to validate the questionnaire for our study population. The RHS-15 is freely available to researchers and clinicians upon request from Pathways to Wellness, rendering it an affordable and sustainable tool to use in low-resource settings [20].
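The RHS-15 screening rule stated above can be expressed as a short function. This is a minimal sketch under stated assumptions, not the RHS-15 authors' scoring software: the function name `rhs15_positive` and its parameters are hypothetical, and the cut-offs are exposed as parameters (defaulting to the recommended ≥12 and ≥5) so that locally derived thresholds can be substituted.

```python
from typing import Sequence


def rhs15_positive(items: Sequence[int], distress: int,
                   items_cutoff: int = 12, dt_cutoff: int = 5) -> bool:
    """Return True if an RHS-15 response set screens positive (hypothetical helper).

    items: responses to items 1-14, each scored 0 ('not at all') to 4 ('extremely').
    distress: distress thermometer (item 15), scored 0 to 10.
    Positive if sum(items) >= items_cutoff and/or distress >= dt_cutoff.
    """
    if len(items) != 14:
        raise ValueError("expected 14 item responses")
    if any(not 0 <= x <= 4 for x in items):
        raise ValueError("item responses must lie between 0 and 4")
    if not 0 <= distress <= 10:
        raise ValueError("distress thermometer must lie between 0 and 10")
    # "and/or": either criterion on its own is enough for a positive screen.
    return sum(items) >= items_cutoff or distress >= dt_cutoff


# A woman answering 'a little bit' (1) to every item totals 14 and screens positive.
print(rhs15_positive([1] * 14, distress=2))
```

Keeping the thresholds as arguments makes it straightforward to re-score the same responses at alternative cut-offs, for example `rhs15_positive(items, dt, items_cutoff=15, dt_cutoff=4)`.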
We used the Structured Clinical Interview for the Diagnosis of DSM-IV Disorders (SCID) as the diagnostic tool against which to validate the RHS-15 [21]. The ten items relating to depression, including low mood, anhedonia, changes in appetite and sleeping, restlessness, energy levels, worthlessness, recurrent thoughts of death, and coping with everyday tasks, were selected [21]. Items were translated from English into Burmese and Sgaw Karen by two senior midwives experienced in conducting clinical work and research in the local population and fluent in English, Burmese and Sgaw Karen. Back-translations were conducted by a senior midwife and a physician who had not seen the original English version. Original and back-translated versions were assessed by an English-speaking physician who confirmed semantic equivalence between the two versions. Due to the long-term scarcity of mental health infrastructure in this resource-constrained region, no psychiatrist was available to conduct the translations [22][23][24]. We used the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) criteria to diagnose major and minor depressive disorder [25]. To account for pregnancy, sleep and appetite items were scored positive only if the symptoms were unrelated to pregnancy: for example, poor appetite was coded positive if caused by feelings of sadness, and negative if caused by morning sickness.

Staff and training
The study team consisted of an English-speaking physician (GF), four counsellors and two senior midwives. The physician underwent training by the American Psychiatric Association in conducting the SCID prior to the study. Counsellors and midwives were fluent in Burmese, Sgaw Karen and English. SMRU counsellors and midwives are experienced in working with the local population and are themselves members of the migrant community. Study staff received training in counselling methods and in administering the RHS-15 and SCID. The study was rolled out stepwise across the three sites, enabling the physician to be present for all interviews during the first weeks of enrolment at each site. The physician continued to be available to all sites throughout the study. Study staff explained to eligible women that participation was voluntary, that non-participation would not affect their care and that consent could be withdrawn at any time. Women were able to ask questions before deciding to participate. Those who agreed provided written informed consent by signature or thumbprint. Recruitment ran from October 2015 to April 2016.

Procedure
Questionnaires were administered by study staff in a private room in Sgaw Karen or Burmese according to women's preference. Participants first completed a brief demographic questionnaire. The RHS-15 was administered verbally by staff reading the items to participants and then recording responses. This method of verbal administration was used due to low literacy rates within this population and limited comprehension of health-related written information, even among those able to read [26]. Verbal administration is considered acceptable by RHS-15 authors and has been used previously in this setting and elsewhere [19,27]. After the RHS-15, women completed the SCID. At MKT, the RHS-15 and SCID were conducted by different members of the study team, each of whom was blinded to the results of the other. At MLA and WPA, the RHS-15 and SCID were administered consecutively by the same study team member due to staffing constraints.
All SCID responses were scored independently by the study physician (GF) and a second physician. Scoring involved applying the DSM-IV criteria to determine whether each participant had a diagnosis of depression. We included minor as well as major depressive disorder. This decision was based on clinical judgment, as we found at an early stage in the study that the majority of women with minor depressive disorder had symptoms severe enough to warrant being offered treatment. Clinically, these individuals were managed in the same way as those with major depressive disorder. We therefore felt that the combined categories of major and minor depressive disorder provided a more accurate reflection of depression in this setting. Discrepancies in diagnoses were resolved by discussion with a psychiatrist (MF). Women with depression were offered counselling and, when appropriate, anti-depressant medication, and were followed up by a physician. Women with severe symptoms or active suicidal ideation were admitted for observation.
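The diagnostic logic described above (DSM-IV symptom counts, the pregnancy adjustment for sleep and appetite, and the merging of major and minor depressive disorder) can be sketched as follows. This is a deliberately simplified illustration, not the study's scoring procedure: it assumes the DSM-IV thresholds of five or more symptoms for a major depressive episode and two to four for the research criteria for minor depressive disorder, each requiring low mood or anhedonia, and it ignores duration, functional impairment and exclusion criteria. The symptom labels and function names are hypothetical.

```python
from enum import Enum
from typing import AbstractSet

# Simplified, hypothetical labels for the nine DSM-IV symptom criteria.
CORE = frozenset({"depressed_mood", "anhedonia"})
SYMPTOMS = CORE | {
    "appetite", "sleep", "psychomotor", "fatigue",
    "worthlessness", "concentration", "thoughts_of_death",
}
# Symptoms counted only if judged unrelated to pregnancy (e.g. not morning sickness).
PREGNANCY_SENSITIVE = frozenset({"appetite", "sleep"})


class Diagnosis(Enum):
    NONE = "none"
    MINOR = "minor"
    MAJOR = "major"


def classify(endorsed: AbstractSet[str],
             pregnancy_related: AbstractSet[str] = frozenset()) -> Diagnosis:
    """Symptom-count rule only; ignores duration, impairment and exclusions."""
    unknown = set(endorsed) - SYMPTOMS
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    # Drop sleep/appetite symptoms that the interviewer attributed to pregnancy.
    counted = set(endorsed) - (set(pregnancy_related) & PREGNANCY_SENSITIVE)
    if not counted & CORE:          # low mood or anhedonia is required
        return Diagnosis.NONE
    if len(counted) >= 5:
        return Diagnosis.MAJOR
    if len(counted) >= 2:
        return Diagnosis.MINOR
    return Diagnosis.NONE


def any_depression(d: Diagnosis) -> bool:
    # Major and minor depressive disorder combined into a single outcome.
    return d is not Diagnosis.NONE


five = {"depressed_mood", "anhedonia", "fatigue", "worthlessness", "sleep"}
print(classify(five), classify(five, pregnancy_related={"sleep"}))
```

The example shows why the pregnancy adjustment matters: the same five endorsed symptoms meet the simplified major-depression threshold, but drop to minor depression once poor sleep is attributed to pregnancy, and both outcomes count as depression under the combined definition.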

Sample size
We aimed to recruit a total sample of 200 Sgaw Karen-speaking and 200 Burmese-speaking women. This target was based on validation study guidelines suggesting that a sample of 200 participants is considered fair [28].

Statistical analysis
We estimated the proportion (95% confidence interval) of women who met SCID criteria for major or minor depression. Descriptive statistics of the demographic characteristics of Burmese-speaking and Sgaw Karen-speaking women, and of RHS-15 outcomes for women with and without depression, were presented and compared using χ² tests for categorical variables and Mann-Whitney U tests for continuous, non-normally distributed variables. We expected mean RHS-15 scores to be significantly higher in depressed than in non-depressed women: this "known groups" method supports the construct validity of the RHS-15 [29]. We planned to explore possible assessment bias by comparing the proportion of women with depression according to whether SCID interviewers were blinded to RHS-15 scores.
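As an illustration of the descriptive statistics above, the sketch below computes a prevalence with a 95% confidence interval and a 2×2 Pearson χ² test using only the Python standard library. It is not the authors' Stata code. The paper does not state which interval method was used, so the Wilson score interval here is an assumption (an exact Clopper-Pearson interval would differ slightly), as is the absence of a continuity correction in the χ² test. The function names are hypothetical; the 39/510 example is taken from the Results.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


p, lo, hi = wilson_ci(39, 510)   # depression of any severity, from the Results
print(f"Prevalence {p:.2%} (95% CI {lo:.1%} to {hi:.1%})")
```

For 39 of 510 women the Wilson interval runs from roughly 5.6% to 10.3%. `chi2_2x2` would be applied to a contingency table such as depression status by interview language; no counts for that table are given here, so none are assumed.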
To assess the validity of the RHS-15, we calculated the sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR-) at each cut-off value of the RHS-15. The LR+ represents the probability of participants with depression scoring positive, divided by the probability of participants without depression scoring positive; the higher the value, the more convincingly a positive RHS-15 score indicates depression. Conversely, the LR- represents the probability of participants with depression scoring negative, divided by the probability of participants without depression scoring negative; the lower the value, the more convincingly a negative score rules depression out. Analyses were conducted separately for each language. We determined the reliability of the RHS-15 using Cronbach's alpha and used Youden's Index, [(sensitivity + specificity) - 1], taking the cut-off at which it is maximised as the point of optimal balance between sensitivity and specificity. We conducted a Receiver Operating Characteristic (ROC) analysis to calculate the area under the curve (AUC) for the RHS-15 and thereby assess criterion accuracy (the proportion of results correctly identified) and validity [30]. We conducted separate analyses for items 1-14 and the distress thermometer (item 15). Statistical analyses were conducted using STATA/IC version 14.1 [31].
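The validity and reliability measures listed above can be sketched in plain Python. This is an illustrative re-implementation, not the study's Stata analysis, and all function names are hypothetical. It assumes that a cut-off c means "score ≥ c is screen-positive", matching the ≥ thresholds used in the text; computes the AUC through its rank (Mann-Whitney) interpretation, which gives the empirical ROC area but no confidence interval; and includes Cronbach's alpha with an "alpha if item deleted" helper of the kind used to check whether omitting individual items improves reliability.

```python
import math
from typing import Dict, List, Sequence


def _ratio(num: float, den: float) -> float:
    # Likelihood ratios: infinite for x/0 with x > 0, undefined (nan) for 0/0.
    if den > 0:
        return num / den
    return math.inf if num > 0 else math.nan


def _split(scores, depressed):
    pos = [s for s, d in zip(scores, depressed) if d]
    neg = [s for s, d in zip(scores, depressed) if not d]
    if not pos or not neg:
        raise ValueError("need both depressed and non-depressed participants")
    return pos, neg


def cutoff_table(scores: Sequence[float], depressed: Sequence[bool]) -> List[Dict[str, float]]:
    """Sensitivity, specificity, LR+, LR- and Youden's J for each 'score >= c' cut-off."""
    pos, neg = _split(scores, depressed)
    rows = []
    for c in sorted(set(scores)):
        sens = sum(s >= c for s in pos) / len(pos)   # depressed and screen-positive
        spec = sum(s < c for s in neg) / len(neg)    # not depressed and screen-negative
        rows.append({
            "cutoff": c, "sens": sens, "spec": spec,
            "lr_pos": _ratio(sens, 1 - spec),
            "lr_neg": _ratio(1 - sens, spec),
            "youden": sens + spec - 1,
        })
    return rows


def optimal_cutoff(rows):
    # Youden's index: choose the cut-off where (sensitivity + specificity) - 1 is largest.
    return max(rows, key=lambda r: r["youden"])


def auc(scores, depressed) -> float:
    """Empirical ROC AUC: P(depressed participant outscores a non-depressed one), ties = 1/2."""
    pos, neg = _split(scores, depressed)
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows holds one list of item scores per participant."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(var([r[j] for r in rows]) for j in range(k))
    total_var = var([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


def alpha_if_item_deleted(rows) -> List[float]:
    # Recompute alpha with each item omitted in turn.
    k = len(rows[0])
    return [cronbach_alpha([list(r[:j]) + list(r[j + 1:]) for r in rows]) for j in range(k)]
```

A typical use would be `best = optimal_cutoff(cutoff_table(item_totals, scid_depressed))`, run separately for each language and separately for items 1-14 and the distress thermometer, mirroring the analysis plan; `item_totals` and `scid_depressed` are placeholder names for per-participant data.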

Baseline characteristics
Of 630 eligible women who attended ANC during the study period, 569 (90.2%) participated (Fig 2). Women who were eligible but did not participate did not differ significantly from those who participated in terms of age, ethnicity or migrant status. Complete SCID and RHS-15 results were obtained for 235 Burmese-speaking and 275 Sgaw Karen-speaking women. Baseline characteristics of these 510 women included in the current analysis are summarized in Table 1. The prevalence of depression of any severity as diagnosed by the SCID was 7.7% (39/510). Overall, 45.1% (230/510) of participants screened positive on the RHS-15 using the recommended cut-offs. There were no significant differences between Burmese- and Sgaw Karen-speaking women in depression status or RHS-15 scores. RHS-15 results for women with and without depression are shown in Table 2. Median scores and the proportion of participants scoring positive on the RHS-15 were significantly higher among women with depression than among women without depression. The prevalence of depression at MKT, where SCID interviewers were blinded to the outcomes of the RHS-15, was lower than at WPA and MLA, where interviewers were not blinded (3.3% at MKT vs. 9.5% at WPA and MLA). Because only one site had blinding, and this site also had a lower depression prevalence than the other sites, it was not possible to statistically separate the effects of site and blinding.

Psychometric properties of the Burmese RHS-15
The reliability of the Burmese RHS-15 as determined using Cronbach's alpha was 0.63. Omitting individual items of the RHS-15 did not improve reliability significantly, producing alpha values between 0.56 and 0.66. The sensitivity, specificity and likelihood ratios for each cut-off of items 1-14 of the Burmese RHS-15 are shown in S1 Table. The ROC curve for items 1-14 is shown in Fig 3. The area under the curve was 0.84 (95% CI 0.76-0.93). Youden's index for items 1-14 was maximized (0.58) at a cut-off of ≥14, suggesting that this is

Psychometric properties of the Sgaw Karen RHS-15
The reliability of the Sgaw Karen RHS-15 as determined using Cronbach's alpha was 0.56, with values ranging between 0.48 and 0.52 when individual items were omitted. The sensitivity,

Discussion
This study examines the validity of the Burmese and Sgaw Karen RHS-15 as a screening tool for perinatal depression in women on the Thai-Myanmar border. The combined prevalence of major and minor depression during the first trimester of pregnancy as assessed using the SCID was 7.7%. This is in line with estimates reported in a systematic review of refugees resettled in Western countries, which found a pooled prevalence of major depression of 5% [32]. Overall, the RHS-15 performed adequately, displaying reasonable sensitivity and specificity in both Burmese and Sgaw Karen. However, a number of shortcomings limited the utility of the RHS-15 in this setting. On the Burmese RHS-15, the optimal cut-off for items 1-14 was ≥14, yielding good sensitivity and specificity of 81.8% and 76.1%, respectively. The Burmese distress thermometer performed slightly less well, with a sensitivity of 77.3% and specificity of 67.6% at the optimal cut-off of ≥4. On the Sgaw Karen RHS-15, the optimal cut-off for items 1-14 was ≥15, yielding good sensitivity and specificity of 88.2% and 81.0%, respectively. The Sgaw Karen distress thermometer performed considerably less well: at the optimal cut-off of ≥4, sensitivity and specificity were low at 52.9% and 69.2%, respectively. Reliability as determined by Cronbach's alpha was low in both languages (0.63 in Burmese; 0.56 in Sgaw Karen). However, as the RHS-15 assesses anxiety and post-traumatic stress disorder as well as depression, the low alpha values may reflect the multiple dimensions of the test rather than poor internal consistency of the scale [33].
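The reported sensitivity and specificity pairs can be turned into Youden's index and likelihood ratios with simple arithmetic, which makes the relative weakness of the Sgaw Karen distress thermometer concrete. The sketch below is a derived check, not a result reported by the authors: because the published percentages are rounded, the derived values are approximate, and the helper name `summarise` is hypothetical.

```python
def summarise(sens: float, spec: float) -> dict:
    """Youden's J, LR+ and LR- from one sensitivity/specificity pair."""
    return {
        "youden": sens + spec - 1,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# Sensitivity and specificity at the optimal cut-offs, as reported in the text.
REPORTED = {
    "Burmese items 1-14 (>=14)": (0.818, 0.761),
    "Burmese DT (>=4)": (0.773, 0.676),
    "Sgaw Karen items 1-14 (>=15)": (0.882, 0.810),
    "Sgaw Karen DT (>=4)": (0.529, 0.692),
}

for label, (sens, spec) in REPORTED.items():
    s = summarise(sens, spec)
    print(f"{label}: J={s['youden']:.2f}  LR+={s['lr_pos']:.2f}  LR-={s['lr_neg']:.2f}")
```

For Burmese items 1-14 this gives J of about 0.58, consistent with the Youden value reported in the Results. The Sgaw Karen distress thermometer has an LR+ of only about 1.7, compared with about 4.6 for Sgaw Karen items 1-14, so a positive thermometer score shifts the odds of depression relatively little.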
In the original validation of the RHS-15 among refugees in Washington State, USA, Hollifield et al. found substantially higher sensitivity (100%), specificity (91%) and reliability (Cronbach's alpha 0.92) among Burmese participants [19]. Several explanations for these differences are possible. Firstly, the current sample of pregnant migrant women in a LMIC differs considerably from the original study's sample of male and female refugees in the USA. Given that only a small proportion of migrant populations are selected for formal resettlement programmes, the US sample is likely to differ significantly and systematically from general migrant populations prior to resettlement. Secondly, Hollifield et al.'s sample of only 50 Burmese participants, of whom only six had a diagnosis of depression, means that their results must be interpreted with caution. Furthermore, Hollifield et al.'s group of 'Burmese' participants in fact included four distinct ethnicities (Burmese, Karen, Chin and Karenni) and two interview languages (Burmese and Sgaw Karen). Not disaggregating results by language or ethnicity may have masked differences between groups.
A number of important contextual factors need to be considered when interpreting these findings. Firstly, factors relating to language and literacy are likely to have influenced results. Participants found it difficult to select the appropriate Likert-type response category on the RHS-15, and it was often necessary to discuss at length the distinctions between 'a little bit', 'moderately' and 'quite a bit', none of which have a direct translation into Burmese or Sgaw Karen. Previous studies have described the challenges of using Likert scales in low-literacy populations [34][35][36]. The literacy rate in our sample was 69%, but this was based on participants' self-reported answers to the question, "Are you able to read and write fluently?". A previous study in this population assessed literacy objectively using a short reading test and found a lower rate of 47% [26]. That study also found that not all women who reported being able to read well were in fact able to complete the reading test fluently, so our self-reported figure of 69% is likely to be an over-estimation of current literacy rates [26]. An inability to read has been associated with a loss of discriminant power of five-category response scales such as that of the RHS-15, and it has been suggested that simpler response scales, such as a three-point scale, are more reliable in low-literacy populations [34].
Even among those able to read fluently, health literacy (the ability to process and understand health information and make appropriate health decisions) has generally been low in the Thai-Myanmar border setting [37]. It is unclear why the distress thermometer, despite being one of the most readily understood items of the RHS-15, was a poorer predictor of depression status than the more difficult-to-answer items 1-14.
A further constraint may have been the linguistic characteristics of Burmese and Sgaw Karen. As an ethnic language with a strong oral tradition, Sgaw Karen in particular has a limited scope for nuanced questioning and linguistic precision, especially around Western constructs of mental distress and its symptoms. The absence of a Burmese or Sgaw Karen word for depression means that alternative words with similar meanings are needed to convey its meaning. Fig 7 illustrates graphically how a relatively short sentence in English becomes lengthened following translation into Burmese and Sgaw Karen, highlighting the complexity involved in conveying some of the SCID and RHS-15 questions.
A further important consideration arose from the relative superiority of the comparator instrument. It became apparent during the study that, in this particular setting, the SCID had a number of strengths over the RHS-15. The SCID proved not only more straightforward to administer, but also elicited a greater level of detail and depth than the RHS-15. The open-question format of the SCID allowed women to describe their feelings in their own words, and valuable additional information about their personal and social circumstances often emerged, which helped to contextualise their psychological state. The SCID therefore added a breadth and depth to the clinical history that far exceeded what was elicited by the RHS-15. Importantly, on average, the SCID saved time. For women with no symptoms of depression, the SCID took less time to administer than the RHS-15. For women with symptoms, the SCID took longer to administer than the RHS-15, but remained faster than the alternative of administering the RHS-15 followed by the SCID, as a positive RHS-15 screen would require.
In addition to these two contextual factors, a number of the RHS-15's properties were sub-optimal, despite the good sensitivity and specificity of certain components. Overall, 44% (225/510) of women screened positive on the RHS-15 using the optimal thresholds for our population. This is problematic because each screen-positive woman requires further assessment in the form of a diagnostic interview, and follow-up of almost one in every two women attending ANC is unfeasible in a resource-constrained setting such as ours with numerous competing health priorities. This limitation of the RHS-15, along with women's difficulty with the Likert-type scale and the relative superiority of the SCID, led to the decision not to continue using the RHS-15 in our setting.
The greatest strength of our study is that, to our knowledge, it is the first to use the RHS-15 in a LMIC setting. The majority of today's migration flows occur within LMIC, yet research on migrants in LMIC is severely lacking [9]. Understanding and addressing the needs of this group is paramount, particularly as their experiences and circumstances are likely to differ significantly from those of migrants who resettle in higher-income settings, who represent a minority of migrants globally. A further strength is our sample size of 510, which is large given the high rates of mobility in this population. Our sample represented 81% of all eligible women. This high response rate suggests that our results are representative of the wider migrant community served by SMRU.
There were a number of limitations to our study. Firstly, gold-standard assessments of depression were made by non-specialists. The SCID was designed to be administered by a clinician or trained mental health professional, rather than by minimally trained healthcare workers. The lack of mental health expertise in our setting meant that specialist involvement was not an option available to us, and this is likely to be the case in other LMIC settings [23,24]. However, we maximised accuracy by using trained clinicians with extensive experience of working within this community. Arguably, local staff might in fact be better placed to understand and respond to culturally specific presentations of mental disorders than mental health experts with less local experience [11]. In addition, it would be neither desirable nor sustainable in our setting for common mental disorders to be diagnosed and managed by specialists [11]. Engaging frontline local staff to conduct this study provided experience and expertise within the community and ensured that screening procedures and management patterns would be sustainable in the long run. It is important to note that although the SCID served as a useful tool in our setting, the high level of training ideally required to achieve optimal SCID results is likely to be unfeasible in many other resource-poor settings. The applicability and appropriateness of different tools for the assessment of depression across cultures and settings is highly context-specific, and what worked in our particular setting may not necessarily be applicable elsewhere.
A second limitation was that, due to staffing constraints, SCID assessors were not blinded to RHS-15 scores at two of the three study sites. Because blinding was only possible at one site, and because this site also had a lower prevalence of depression, it was not possible to explore whether there was an association between blinding status and depression. We are therefore unable to say whether or not blinding had a significant effect on the diagnosis of depression using the SCID. Thirdly, the fact that the RHS-15 was administered verbally by interviewers rather than self-completed may have affected results. Face-to-face, interviewer-administered questions may be prone to social desirability bias and a lower willingness to disclose sensitive information [38]. However, this method, which places the least possible burden on participants, was necessary in our low-literacy setting [38]. There may in fact have been advantages to our approach: a friendly and sensitive interviewer can encourage the disclosure of information as well as provide or seek clarification when necessary [39], and the fact that interviewers were local staff who are themselves part of the migrant community may have helped to establish trust. During the consent process, we ensured women understood that any information they disclosed would be confidential.
Finally, the gold-standard SCID has not itself been validated in this setting. The SCID remains one of the most widely used diagnostic instruments globally and was selected because no alternative, locally developed tools exist. The possibility that culturally specific symptoms of depression were missed cannot be ruled out. Somatic symptoms, for example, are common in non-Western cultures but may not have been picked up by the SCID, which focuses on psychological symptoms [1,40]. In a previous study exploring pregnant women's perceptions of mental illness on the Thai-Myanmar border, women included a 'heavy head', tingling and numbness as characteristics of depression [41]. Women who presented with only these symptoms may not have met the DSM-IV criteria, and this could have led to an underestimation of perinatal depression prevalence. However, the open-question format of the SCID allowed an array of symptoms to be volunteered by participants, and these were followed up by staff accordingly.

Conclusion
As is typical of other resource-constrained settings, mental health services are severely lacking on the Thai-Myanmar border. Mental disorders during pregnancy and the post-partum period have immediate and long-term consequences for mothers, children and wider society. The lack of appropriate, locally validated screening tools limits the ability to identify and monitor disorders such as perinatal depression, perpetuating its low priority on the global agenda for marginalised and vulnerable populations. Although the RHS-15 demonstrated good sensitivity and specificity on the Thai-Myanmar border, the SCID was able to elicit more detailed and culturally relevant information in the same amount of time, leading us to choose the SCID as the tool of choice in our setting. Further research is required on the role of health literacy in mental health screening. With the majority of global migration flows occurring within LMIC regions, research from these regions is an urgent priority. As long as the needs of the most vulnerable communities remain under-researched, these needs will also remain insufficiently addressed.