The long-term impact of community mobilisation through participatory women's groups on women's agency in the household: A follow-up study to the Makwanpur trial

Women’s groups practising participatory learning and action (PLA) in rural areas have been shown to improve maternal and newborn survival in low-income countries, but the pathways from intervention to impact remain unclear. We assessed the long-term impact of a PLA intervention in rural Nepal on women’s agency in the household. In 2014, we conducted a follow-up study to a cluster randomised controlled trial on the impact of PLA women’s groups from 2001–2003. Agency was measured using the Relative Autonomy Index (RAI) and its subdomains. Multi-level regression analyses were performed adjusting for baseline socio-demographic characteristics. We additionally adjusted for potential exposure to subsequent PLA groups based on women’s pregnancy status and conduct of PLA groups in areas of residence. Sensitivity analyses were performed using two alternative measures of agency. We analysed outcomes for 4030 mothers (66% of the cohort) who survived and were recruited to follow-up at mean age 39.6 years. Across a wide range of model specifications, we found no association between exposure to the original PLA intervention and women’s agency in the household approximately 11.5 years later. Subsequent exposure to PLA groups was not associated with greater agency in the household at follow-up, but some specifications found evidence for reduced agency. Household agency may be a prerequisite for actualising the benefits of PLA groups rather than a consequence.


Introduction
Women's groups practising participatory learning and action (PLA) in rural areas have been shown to improve maternal and newborn survival in low-income countries [1]. Inspired by the philosophy of "popular education" by Paulo Freire [2], PLA interventions employ a trained facilitator to hold regular community meetings in which groups of local women are led through a cycle of problem identification, prioritisation, action planning, strategy implementation and outcome evaluation. The implementation and scale-up of PLA strategies to improve newborn mortality and health is explicitly endorsed in a recommendation by the World Health Organisation [3]. Despite strong evidence to support efficacy, the pathways from intervention to impact remain unclear and the evidence on how community-level exposure to PLA women's groups achieves its impact is evolving [4,5]. Victora [5] commented on the long causal chain from 'critical consciousness to mortality' and compared the relative lack of theory on how PLA women's groups currently work to the situation of 19th century epidemiologists before the discovery of germ theory. These had clear evidence for an association between poor sanitation and incidence of infectious disease, but a poor understanding of the biological mechanism.
Intermediary effects have been explored through qualitative studies, which have reported that women and men participating in the groups engaged in several collective action strategies from setting up mother and child health funds to producing and selling their own clean delivery kits. Women's group members attributed increased self-confidence and self-esteem and improved social support from the community to their attendance [6][7][8][9][10][11]. However, almost no qualitative or quantitative evidence exists on the impact of PLA groups on women's agency in the household.
Yet, agency in the household may be a key barrier to improved maternal, child and reproductive health. The United Nations Global Strategy for Women's, Children's and Adolescents' Health 2016-2030 recognises that "women, children and adolescents are potentially the most powerful agents for improving their own health and achieving prosperous and sustainable societies" [12]. At the same time, the 5th Sustainable Development Goal explicitly sets out to achieve gender equality and empower women and girls by 2030 [13].
Although no consensus currently exists regarding a single definition of empowerment and agency [14,15], a considerable body of literature discusses the range of possible meanings [14,16-21]. We follow Ibrahim and Alkire [19] in defining empowerment as "the expansion of agency" and Sen [22] in defining a person's agency as "what the person is free to do and achieve in pursuit of whatever goals or values he or she regards as important" (p.203). Agency relates here to the extent to which individuals are able to provide reasons for their actions that accord with their own goals or values [22,23]. "Household agency" refers to individuals' ability to pursue and achieve valued goals within the context of their personal relationships with members of their own household.
PLA groups might enable women to tackle structural obstacles to health such as low status in their own family, or household restrictions on the freedom of women to move and speak as they please [24,25]. In turn, lack of household agency has been linked to poorer fertility [26], child nutrition [27], health-seeking [28], and maternal and child health [28] outcomes in multiple contexts. One before-and-after analysis of a PLA intervention in rural Bangladesh found evidence for increased participation in household decision-making on women's own healthcare [29]. Another evaluation of a cluster-randomised controlled trial of a PLA intervention to reduce low birth weight in Nepal found no evidence for an impact on household agency across a wide range of different indicators [30].
We followed up a closed cohort of surviving mother-infant dyads from the first trial of PLA women's groups in Makwanpur District, Nepal from 2001-2003 [31]. Using data collected from this cohort, on average 11.5 years after the original trial, we sought to answer the following research question:

National context
Nepal remains one of the poorest countries in the world with 25.2% of the population living below the national poverty line [32]. However, the country has recently seen rapid economic growth with GNI per capita increasing from USD 230 in 2000 to USD 730 in 2016 [32]. In particular, remittance income from foreign labour migration has increased dramatically from 2% in 2000 to 32% of GDP in 2017 [33]. Locally, agriculture constitutes 73% of total national employment [34]. Access to technology has improved markedly. In 2006, 51% of married households accessed electricity, 13% had a television, 6% a mobile phone, 4% a refrigerator and 4% a motorcycle [35]. In 2016, 91% accessed electricity, 52% had a television, 93% a mobile phone, 16% a refrigerator and 19% a motorcycle [36].
Between 1997-2007, Nepal experienced a violent civil war between Maoist rebels and government forces that ended in a peace agreement where it was decided that Nepal would transition from a parliamentary monarchy to a federal republic. After the election of a Constituent Assembly in 2008, a protracted period of negotiation followed ending in the adoption of a new constitution for Nepal in 2015. National and local elections were held under the new constitution in 2017. As local elections had been suspended in Nepal since 1997, this constituted the first set of local elections in Nepal in 20 years.
A major ideological platform for the Maoist party was the elimination of class, caste, ethnic, and gender inequality. Substantial numbers of women were recruited into its army during the civil war, while its women's wing campaigned on issues of domestic violence and unequal inheritance rights [37,38]. In 2007, the Maoist party successfully pushed for a legal provision reserving 33% of Constituent Assembly seats for women, which was carried over into the electoral rules under the new constitution in 2015 [37]. However, the Maoist party has also been heavily criticised for pursuing elite politics to the neglect of women's interests at the grassroots level and leaving a considerable gap between its rhetoric and its own practices [37][38][39].
In terms of quantitative indicators of women's position, female literacy rates among ages 15-24 moderately increased from 2000 (60%) to 2015 (80%) [40], while female labour force participation remained approximately constant (81% in 2000, 83% in 2017) [41]. At the same time, the number of married women reporting making household decisions jointly or alone increased between 2001 (26% own health care, 30% large household purchases, 36% visits to family and friends) and 2006 (47% own health care, 53% major purchases, 57% visits to family and friends) [35,42]. However, between 2006 and 2016, little change was observed in these indicators (58% participated in decisions on own health care, 53% major purchases, 56% visits to family and friends in 2016) [36].

Trial setting
Makwanpur District is a rural hill area in the central region of Nepal with a population that was approximately 400,000 in 2011 [43]. The district is divided into Village Development Committees (VDCs), which are further sub-divided into wards. Each VDC has an average population of 9500 and most households depend on subsistence agriculture. The majority ethnic group are Tibeto-Burman Tamangs, one of a wider group of traditionally non-Hindu ethnicities in Nepal called "Janajatis". A distinguishing feature of Tamang gender norms is the relative lack of emphasis on control over female sexual purity and physical mobility as compared to Hindu gender norms [44].
The original trial intervention consisted of community-based PLA women's groups facilitated by a local lay facilitator conducting monthly group meetings, in which members explored issues around pregnancy, childbirth and newborn health. Inclusion criteria for enrolment in the original trial evaluation required women to be pregnant, married and aged between 15 and 49, while women who were unmarried, permanently separated or widowed were excluded. The unit of randomisation was the VDC. 24 of the 43 VDCs within Makwanpur were selected randomly and pair-matched on topographic stratification, ethnic group composition and population density. One cluster of each pair was allocated randomly to the intervention arm. At the end of the trial, a 30% reduction in neonatal mortality and a 78% reduction in maternal mortality were observed in deliveries occurring in intervention compared to control clusters [31]. In the immediate post-trial period, due to the large observed impact on mortality, UCL and MIRA implemented the original intervention in control areas. Meanwhile, a revised PLA group intervention focusing on care-seeking for childhood illness and additionally involving men in maternal and newborn health was rolled out in the intervention arm. These groups continued until January 2009 when all group activities were suspended in preparation for a new trial, the "Skilled Birth Attendant Trial" (SBA trial). The SBA trial combined PLA groups with strengthening of Health Management Committees to increase skilled birth attendance [45]. All of the 43 VDCs in Makwanpur district were randomised to intervention or control (independent of previous randomisation in the original trial), with 21 in the intervention arm and 22 in the control arm. No PLA groups were run in control clusters of the SBA trial by UCL or MIRA. The trial ran from 2010 to 2012, after which all activities closed. The curricula of the three different models of PLA group meetings are outlined in S1 Table.
Throughout the period from 2001 until 2012, newborn surveillance continued to identify newly delivered women.

Follow-up study data collection
Our follow-up study cohort consisted of women who were enrolled in the original Makwanpur trial and had therefore given birth at least once between Nov 1, 2001 and Oct 31, 2003. In January 2014, field workers established contact with maternal-infant dyads from the closed cohort. Face-to-face interviews were conducted with contactable and willing participants by pairs of trained field interviewers and assistants [31]. Data collection was conducted in two waves in order to reduce the impact of the length of the interviews. Wave 1 was conducted between Jan and July 2014. Wave 2 was conducted between Sept and Dec 2014. All data were collected on Android tablets using CommCare HQ [46,47]. To train field workers before administering the main survey and revise the final questionnaires based on interviewer feedback, pilot data were collected in both waves before field workers collected data for analysis. The first wave of data collection was piloted with 531 randomly sampled mother-child pairs, while the second was piloted with 141 mother-child pairs. Pilot data were excluded from this analysis.

Analytic strategy
Choice of outcome measure. Our primary measure of agency in the household was the Relative Autonomy Index (RAI) [19,48]. Alkire [49,50] has repeatedly emphasised that the autonomy construct measured by the RAI closely approximates Sen's [22] notion of agency. The RAI measures the extent to which individual behaviour is guided by internal relative to external drivers of action [51]. The RAI has been locally adapted and validated for use in Nepal [36], Bangladesh [52] and Chad [53] and suggested for use as an internationally comparable measure of empowerment [19]. The RAI has been used in impact evaluations of community-based health interventions in India [54] and Nepal [30] and regularly features in impact evaluations of health promotion interventions in high-income settings [55]. Elements of the RAI also form part of the Women's Empowerment in Agriculture Index (WEAI) [56], which has been used in studies of maternal and child nutrition in Bangladesh [57], Ghana [58] and Nepal [59,60].
Decision-making questions are widely used in impact evaluations of women's household agency [17,61,62], but do not map exactly onto our notion of 'agency' as conceptualised in this article [48,63]. This is because decision-making questions measure 'the powers that they have even if they do not value these' (p. 5) rather than 'the empowerment that people value' (p. 5) [64]. For the same reason, we also chose not to use the WEAI to directly measure agency, as it relies heavily on decision-making questions. However, we did choose to use a question on women's 'financial power' or their perceived ability to participate in decisions on major expenditures in the household as a sensitivity measure due to its widespread use in impact evaluation studies (e.g. [62]).
Global questions on perceived overall influence in one's life, such as the Power Ladder [65], do implicitly measure the powers that people value, as they do not pre-specify the types of powers that individuals should be able to affect. Nevertheless, such questions also differ from our conceptualisation of agency by focusing on opportunities for choice rather than experienced reasons for enacted actions [66]. While the availability of choice and the experience of internal motivation are likely positively correlated, they are not necessarily so, and may even conflict with one another [67]. As such, we chose to use the Power Ladder question as a sensitivity measure due to its open-ended, domain-independent nature.
Scoring the Relative Autonomy Index. The RAI measures household agency in four domains: (1) work outside the household, (2) domestic work, (3) health-seeking and (4) group participation. After a framing question on the activities individuals carry out in each domain, respondents are asked if they perform these activities for internal reasons (e.g. because they want to or feel the activities are personally important) or for external reasons (e.g. because they fear angering family members otherwise). S2 Table shows the full list of items along with their scoring. The overall agency freedom score has a maximum of +12 and a minimum of -12. The 4 sub-domains (work outside the household, domestic work, health-seeking and group participation) are similarly scored and have a maximum of +3 and a minimum of -3.
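The relation between the sub-domain and overall score ranges can be illustrated with a minimal sketch. It assumes that the overall agency freedom score is the simple sum of the four sub-domain scores, which is consistent with the stated ranges (4 × 3 = 12) but is not spelled out here; the item-level scoring in S2 Table is not reproduced, and the names `DOMAINS` and `overall_agency` are hypothetical.

```python
# Sketch only: assumes the overall score is the sum of four sub-domain
# scores, each bounded between -3 and +3. Item-level scoring (S2 Table)
# is not reproduced; all names are illustrative.
DOMAINS = (
    "work_outside_household",
    "domestic_work",
    "health_seeking",
    "group_participation",
)


def overall_agency(domain_scores):
    """Sum the four sub-domain scores after checking their range."""
    total = 0
    for domain in DOMAINS:
        score = domain_scores[domain]
        if not -3 <= score <= 3:
            raise ValueError(f"{domain} score {score} is outside [-3, 3]")
        total += score
    return total


# A respondent at the maximum in every domain reaches the overall maximum.
best_case = {domain: 3 for domain in DOMAINS}
worst_case = {domain: -3 for domain in DOMAINS}
```

Under this assumption, `overall_agency(best_case)` gives +12 and `overall_agency(worst_case)` gives -12, matching the stated bounds.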
Exposure. Our primary exposure variable was residence in an intervention (Areas 1 and 2 in Fig 1) versus control (Areas 3 to 6) area in the original trial.
Measures of subsequent potential exposure to PLA groups. Subsequent potential exposure to PLA groups in the years following trial completion was measured in three ways to describe and test model assumptions around potential community- and individual-level exposure. First, we calculated a woman's total years of residence in an area exposed to PLA interventions across Periods B to F (denoted by variable RESEXP, recorded in years). Second, using ongoing newborn surveillance data from 2003 to 2012, we calculated years of residence in an area running PLA groups while pregnant (denoted by PREGEXP, recorded in years). Since women could be pregnant multiple times from 2003 to 2012, this duration could in theory be arbitrarily large. Our assumption was that all pregnancies were detected by the surveillance system. Third, to model our assumptions around our measure of pregnancy-associated exposure time, we created a third variable IMPEXP(θ) (in units of years of exposure). IMPEXP(θ) imputes an exposure level for women based on a linear combination of the PREGEXP and the RESEXP variables. When θ is close to 0, we rely exclusively on pregnancy-associated exposure time. As θ approaches 1, we increasingly give weight to non-pregnancy-associated exposure as well.
Potential confounders. Maternal education and age (years), household asset score index, caste/ethnicity, household occupation and sex of household head are both potentially important determinants of women's agency in the household and potential predictors of loss to follow-up. The effect of IMPEXP(θ) on women's agency is confounded by unobserved characteristics associated with women's ability to make household decisions on their own fertility. To proxy for baseline ability to control one's own fertility, we additionally controlled for the estimated total number of deliveries women had had upon enrolment into the original trial (called DELIVERIES). We estimated this number by subtracting the number of deliveries detected by pregnancy surveillance after the original trial had ended (Periods B-F in Fig 1) from the total number of deliveries reported at endline.
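The role of θ can be made concrete with a short sketch. The exact linear combination is not given in the text, so the form below, IMPEXP(θ) = PREGEXP + θ × (RESEXP − PREGEXP), is an assumption chosen only because it reproduces the described behaviour: θ = 0 returns pregnancy-associated exposure alone, and θ = 1 returns total residence-based exposure. The function name `impexp` is hypothetical.

```python
# Sketch only: ASSUMED functional form, not taken from the source.
# IMPEXP(theta) = PREGEXP + theta * (RESEXP - PREGEXP)
def impexp(pregexp, resexp, theta):
    """Impute years of exposure from pregnancy-associated exposure
    (pregexp) and total residence-based exposure (resexp)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie between 0 and 1")
    non_pregnancy_exposure = resexp - pregexp
    return pregexp + theta * non_pregnancy_exposure


# Hypothetical woman: 1.5 years exposed while pregnant, 6 years resident
# in an area running PLA groups.
for theta in (0.0, 0.5, 1.0):
    print(theta, impexp(1.5, 6.0, theta))
```

For this hypothetical woman, the imputed exposure rises from 1.5 years at θ = 0 to 3.75 years at θ = 0.5 and 6 years at θ = 1.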
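The DELIVERIES estimate is a simple difference, sketched below. The function and argument names are hypothetical, and the consistency check (rejecting records where surveillance detected more deliveries than were reported at endline) is an added assumption rather than a described step.

```python
# Sketch only: names are illustrative; the consistency check is an
# assumption, not a step described in the text.
def baseline_deliveries(total_reported_at_endline, detected_post_trial):
    """Estimate deliveries up to enrolment in the original trial:
    total reported at endline minus deliveries detected by pregnancy
    surveillance after the trial ended (Periods B-F)."""
    if detected_post_trial > total_reported_at_endline:
        raise ValueError("surveillance count exceeds reported total")
    return total_reported_at_endline - detected_post_trial


# Hypothetical woman: 5 deliveries reported at endline, 2 detected
# by surveillance after the original trial.
estimate = baseline_deliveries(5, 2)
```

Here `estimate` is 3 deliveries at baseline.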
Statistical methods. First, we described differences in baseline characteristics during the original trial between intervention and control arms among women with complete data at follow-up and women with incomplete data. A complete-case analysis was conducted in case of missing data. Second, we compared crude averages of agency scores between women who had been resident in intervention and control clusters. Third, three-level hierarchical linear regression analyses were conducted with random intercepts for clusters based on VDCs and cluster pairings from the pair-matching process in the original trial with overall agency freedom as the main outcome. The four sub-domains of agency freedom (agency in work outside the household, domestic work, health-seeking, and group participation) were similarly analysed as secondary outcomes. For all 5 outcomes, effect sizes in standard deviations were reported.
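One way to write down a three-level random-intercept model of this kind is sketched below. The notation is an assumption introduced for illustration and does not appear in the source: $i$ indexes women, $j$ VDC clusters and $k$ cluster pairs; $T_{jk}$ is an indicator for residence in an intervention cluster and $\mathbf{x}_{ijk}$ collects the baseline covariates (omitted in the unadjusted model).

```latex
% Illustrative notation only (assumed, not from the source).
\begin{align}
  y_{ijk} &= \beta_0 + \beta_1 T_{jk} + \mathbf{x}_{ijk}^{\top}\boldsymbol{\gamma}
             + u_k + v_{jk} + \varepsilon_{ijk}, \\
  u_k &\sim \mathcal{N}(0, \sigma_u^2), \quad
  v_{jk} \sim \mathcal{N}(0, \sigma_v^2), \quad
  \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align}
```

With $y_{ijk}$ standardised to unit variance, $\beta_1$ is the effect of the original intervention in standard deviations, which is the scale on which effect sizes are reported.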
We compared the results of the following six models with each of these 5 outcome measures. Models 1 to 5 use the entire study cohort. Model 6 is restricted to women who were not mothers-in-law at follow-up and who never delivered in periods subsequent to the original trial (Periods B-F in Fig 1).
• Model 1: unadjusted association of residence in intervention or control arm of the original trial with household agency.
• Model 2: Model 1, additionally adjusted for maternal education and age, household asset score index, caste/ethnicity, household occupation and sex of household head, all measured at baseline.
Model 1 is our primary model. Model 2 considers the impact of trial arm imbalances or missing data. Models 3-6 correct for exposure after the end of the original trial. Model 6 restricts the sample to the women who were least likely to have taken an interest in the women's groups after the end of the original trial. In Models 3 to 6 we additionally checked for multicollinearity in the use of IMPEXP(θ) as a control variable by calculating Variance Inflation Factors and assessing the effect of adjusting for IMPEXP(θ) on the width of confidence intervals around our main impact estimate. A Variance Inflation Factor below 10 is traditionally considered low [68]. All models from 2 to 6 adjust for baseline covariates. We found no difference between entering the covariates into our models directly or using a propensity score created from those covariates. In all reported analyses in the results section, the covariates were directly entered into the regression model. Impacts on financial power and the Power Ladder question were analysed using the same model specifications 1-6. For financial power we used a logistic regression model with fixed cluster pairing effects and Huber robust adjustment for clustering (effect sizes in logits). For the Power Ladder question, we used a hierarchical linear model with clustering at the level of the VDC and VDC pairing (effect sizes in raw steps on the ladder).
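For readers unfamiliar with the Variance Inflation Factor, the standard definition is VIF = 1 / (1 − R²), where R² comes from regressing the covariate of interest on all other covariates in the model. The sketch below applies that textbook formula; the function names are hypothetical and the snippet is not the Stata routine used in the analysis.

```python
# Sketch only: textbook VIF definition, VIF = 1 / (1 - R^2), where R^2 is
# from regressing one covariate on the remaining covariates.
def vif_from_r_squared(r_squared):
    """Variance Inflation Factor implied by an auxiliary R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Auxiliary R^2 implied by a given Variance Inflation Factor."""
    if vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# The conventional cut-off of 10 corresponds to an auxiliary R^2 of 0.9.
cutoff_r_squared = r_squared_from_vif(10.0)
```

Read this way, a cut-off of 10 means that 90% of the variance in the covariate is explained by the other covariates, so values close to 1 indicate very little shared variance.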
All statistical analyses were carried out in Stata 13 MP [69]. The main reasons for loss to follow-up were temporary absence or women having moved out of the study area (n = 1624) and death (n = 151). The mean age of women at follow-up was 39.6 years. No evidence for a systematic difference in follow-up rates was found (p = 0.31). Table 1 compares baseline characteristics of women between intervention and control groups. The population of women were primarily of Janajati ethnicity (77%) with a household economy that revolved around agricultural labour (91%). Most women had no education (81%) and lived with male household heads (93%). We did not find any meaningful imbalances between intervention and control on baseline characteristics among women who were available for interview at follow-up, although the proportion of Brahmin/Chhetri caste women in the intervention group (17%) was slightly higher than control (12%). Of the original 6135 women who delivered during Period A, 12%, 31%, 2% and 2% delivered again and were detected by newborn surveillance in Periods B, C, D and E respectively (Fig 1).

Trial profile and descriptive statistics
To check for bias due to missing data, we compared women who were interviewed with women who were lost to follow-up. The distribution of caste, household occupation, assets, education, maternal age and gender of the household head among respondents with missing data closely tracked the distribution of corresponding variables among women who were interviewed. Tests of interaction between availability for interview at endline and intervention/control status revealed no evidence for differential loss to follow-up with respect to caste group, asset score, maternal age, education or baseline number of deliveries by trial allocation (p-values all >0.05). Table 2 displays characteristics of women collected at follow-up, which provide key context for our findings. Nearly all women worked outside the home (99%), while about half were the main person responsible for cooking, cleaning and laundry in the house (54%), and about a third were responsible for money management and shopping for the household (36%). Most women sought health care in a facility for a moderately severe illness (64%), while the rest primarily reported going to the local pharmacy or shop (33%). 66% of women participated in groups, but the overwhelming majority of these considered financial groups the main type of group that they currently participated in. These groups are primarily Rotating Savings and Credit Associations (ROSCAs) or Accumulating Savings and Credit Associations (ASCAs). Only 2% of women considered a health group as their main group. 6% of women participated in any health group (not shown). Table 2 also displays descriptive statistics for our main outcome measure in raw, unstandardised units. On measures of agency freedom, most women scored in the high range. The mean overall agency freedom score for all women, irrespective of trial arm, was 9.6 (standard deviation 2.2).
75% of women achieved the maximum score in the employment domain and 82% achieved the maximum score in the domestic work domain, while 37% and 47% achieved the maximum in health-seeking and group participation domains, respectively (not shown in table). 19% of women achieved the maximum score on overall agency freedom. Similarly, 89% of women reported being involved in decisions on major expenditures in the household, with 70% of women reporting making decisions jointly with their husband. However, in the Power Ladder question, mean scores were much lower (4.4 with SD 1.6), as 82% of women placed themselves on Step 5 or lower out of 10. Table 3 shows impact estimates for the original Makwanpur intervention under different model specifications on overall household agency. There was a non-statistically significant positive relation between being resident in an intervention cluster at the time of the original cRCT and overall agency at follow-up (p-values all >0.21).

Intervention impacts on household agency as measured by the RAI
Similarly, Table 4 shows no evidence for an association between exposure to the original intervention and any of the 4 sub-domains of agency (work outside the household, household chores, health-seeking or group participation) under any model specification (Models 1-6; p-values all >0.05). However, we found positive, non-statistically significant relations between exposure to the original PLA group intervention and health-seeking and group participation behaviour. Table 3 also shows the coefficients for associations between subsequent exposure to PLA groups in the same 6 model specifications with our main outcome, overall agency freedom. Potential subsequent exposure to PLA groups as modelled by IMPEXP(0) and IMPEXP(0.5), and therefore driven more by pregnancy-related time after the trial, was negatively associated with overall agency (Models 3 and 4, p-value = 0.02 and p-value = 0.04 respectively), but there was no evidence for an association with IMPEXP(1) as a measure of subsequent exposure (Model 5, p-value = 0.64), even after restricting our sample to women who never became mothers-in-law and who never delivered again after the original trial (Model 6, p-value = 0.91). We found strong evidence that the baseline number of deliveries was negatively correlated with overall agency at endline (p-value <0.001 in Models 3-6). Table 4 shows that we also found no evidence of an association between subsequent exposure to PLA groups (as modelled by IMPEXP(0), IMPEXP(0.5) and IMPEXP(1)) and three of the sub-domains of agency, namely: work outside the household, domestic work and health-seeking.
However, we found evidence that higher modelled levels of subsequent exposure to PLA groups were associated with reduced agency in the sub-domain of group participation. This was evident when modelling of subsequent exposure to PLA groups was driven more by pregnancy-associated time, in Model 3 (IMPEXP(0): p-value = 0.002) and in Model 4 (IMPEXP(0.5): p-value = 0.015). The association between subsequent exposure to PLA groups and the sub-domain of group participation disappeared when modelling of exposure was determined more by non-pregnancy-associated time (IMPEXP(1)): the sign of the association became positive and statistically non-significant (Models 5 and 6, p-values >0.28).

Association between potential exposure to PLA groups post-trial and agency
Calculation of Variance Inflation Factors showed no evidence for multicollinearity. For the exposure length measure, we obtained Variance Inflation Factors of 1.02 for IMPEXP(0), 1.68 for IMPEXP(0.5) and 1.71 for IMPEXP(1), which are all well below the standard cut-off of 10. Furthermore, our findings showed only minor impacts of controlling for IMPEXP(θ) on the size of the standard errors around our main effect estimate. Fig 3 displays the full range of effect sizes for the original intervention as well as subsequent exposure that was obtained by varying our estimate of later exposure to women's groups after the end of the trial. We found that varying the weight given to women's residence in areas running PLA groups outside of pregnancy, θ, across all possible values from 0 to 1 resulted in minimal change to estimates of the effect size of the original intervention with the vast majority of estimates between 0.2 and 0.3 SD. All the 95% CIs overlapped with zero. The effect estimates for later exposure were similarly close to zero with most estimates between -0.07 and -0.14 SD. Table 5 shows the estimated impacts on two alternative measures of agency: financial power and the Power Ladder question. For impacts on financial power, we found neither evidence for an impact from the original intervention, nor from modelling of subsequent exposure, under any model specification (Models 1-6). For the Power Ladder question, there was a trend for residence in an intervention area during the trial being negatively associated with perceived power. However, this was statistically significant only in 3 of the 6 models, namely when additionally adjusted for socio-economic factors (Model 2), additionally adjusted for subsequent exposure to PLA groups when driven by pregnancy-related time (Model 3) and when restricted to women who did not have any further pregnancies, nor were likely to have become a mother-in-law over that time period (Model 6). Similarly, models of subsequent exposure to PLA groups that were driven by further pregnancy-related time (IMPEXP(0) and IMPEXP(0.5)), but not by non-pregnancy-related time (IMPEXP(1)), were negatively associated with perceived power. The number of deliveries at baseline was again negatively associated with agency as measured by the Power Ladder question (p-value <0.001 in Models 3-6).
Table 3. Impact estimates of the effect of exposure to the original Makwanpur intervention on overall agency measured with the RAI. Effect sizes in standard deviations. 95% confidence intervals in brackets.
Table 4. Impact estimates of the effect of exposure to the original Makwanpur intervention on sub-domains of the overall agency score. Effect sizes in standard deviations. 95% confidence intervals in brackets.

Discussion
To our knowledge, this is one of the first published studies investigating the impact of PLA women's groups in maternal and child health on women's agency in the household with only two other studies investigating this outcome [29,30]. Long-term evaluation of participatory women's groups is important, because "[depending] on the dimension of empowerment, the context, and the type of social, economic, or policy catalyst, women may become empowered in some aspects of their lives in a relatively short period of time (say 1-3 years) while other changes may evolve over decades" (p.80) [17]. The ability of our study to exploit a rare opportunity for long-term follow-up of women potentially exposed to participatory women's groups is an important strength.
We found little convincing evidence for a positive relation between exposure to the original trial intervention and women's household agency, either overall or disaggregated by sub-domain. Similarly, we did not find consistent evidence of an association between subsequent exposure to PLA groups and overall household agency. There was some evidence that measures of subsequent exposure to PLA groups driven largely by subsequent time spent being pregnant were associated with reduced agency in the sub-domain of group participation. Further, in some sensitivity analyses, exposure to the original trial intervention and measures of subsequent potential exposure driven by pregnancy-related time were associated with reduced household agency in the Power Ladder question. However, these latter findings should be interpreted with caution as they emerged from sensitivity testing and robustness checking rather than from the primary analysis.
Adjusting for trial allocation, socio-economic factors and subsequent exposure to PLA groups, the baseline number of deliveries was negatively correlated with agency freedom on average 11.5 years after enrolment into the original cRCT (irrespective of intervention allocation), but the effect size was small. The report of one additional previous pregnancy at baseline was associated with a reduction of overall agency freedom of 0.03 points (score range -12 to +12). The baseline number of deliveries was also negatively associated with perceived power as measured with the ladder question.
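To put the size of this association in context, the raw coefficient can be expressed in standard deviation units and as a share of the score range. The snippet below is a minimal illustrative sketch: it uses the 0.03-point difference reported above and the overall agency score standard deviation of 2.2 reported under Limitations, and the function name is hypothetical.

```python
def standardised_effect(raw_difference, outcome_sd):
    """Express a raw score difference in standard deviation units."""
    return raw_difference / outcome_sd


# Values reported in this paper: 0.03 points per additional delivery
# and an overall agency score standard deviation of 2.2.
effect_sd = standardised_effect(0.03, 2.2)

# The overall agency score ranges from -12 to +12, a span of 24 points.
share_of_range = 0.03 / (12 - (-12))

print(f"{effect_sd:.3f} SD, {share_of_range:.3%} of the score range")
```

On either scale the difference per additional delivery is well below conventional thresholds for a small effect.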
Our negative findings could reflect an absence of long-term impact of PLA groups on household agency, with several potentially co-existing explanations. First, the original PLA intervention may not have affected household agency at all, and thus the benefits of PLA groups might be mediated through other factors, such as psychological wellbeing and/or agency outside the household. Second, PLA groups could have had a short-term impact on household agency that was not sustained after the original intervention ended. However, a cluster-randomised controlled evaluation of the impact of PLA women's groups on resident pregnant women in Nepal found no evidence for a short-term impact on a wide range of indicators of household agency [30]. Third, the cohort may already have had high levels of household agency at baseline, and thus the PLA model may not have additionally improved household agency in this population. Ethnographic accounts have noted how gender norms in the Hills of Nepal are considerably less conservative than in the Plains [44]. Our descriptive data suggest these women might already have had considerable agency in the family when the original intervention was trialled (Table 2). Qualitative studies of the original Makwanpur intervention also found that community readiness for change was a key contextual element affecting the success of the intervention [70].
Alternatively, our findings could be a false negative. First, it is possible that our measure of household agency, the RAI, was unable to correctly identify household agency, either because the original tool is inadequate overall or specifically in our Nepali Hill population and/or because our adaptation and translation of that tool to this setting was flawed. However, the RAI has been validated in Nepal previously [48] and sensitivity analyses with alternative measures did not reveal a positive impact on agency either. Second, large-scale secular changes in the gendered national context may have confounded intervention effects over time. However, since trial clusters were randomly assigned to intervention and control, we have no reason to believe that such national trends would have affected the local context in the Makwanpur district differently between intervention and control. It is theoretically possible that secular changes in the national context interacted with the long-term effect of the intervention by eliminating short-term gains in women's agency in intervention compared to control after the end of the original trial. However, such an interaction would be the very definition of a lack of sustained impact and so would constitute an explanation for a lack of effect rather than a confounder.
Third, any existing differences might have become less obvious over time due to confounding by subsequent exposure to PLA groups not adequately accounted for in our models. Unlike secular trends in the national context, such exposure cannot be expected to be randomly distributed between intervention and control areas, because UCL/MIRA consciously chose to implement subsequent interventions based on their knowledge of which clusters had already previously received a PLA intervention. However, our modelling of subsequent exposure has been carefully designed, taking into account neonatal surveillance data over almost the entire post-trial period, to adjust for subsequent exposure to PLA groups arising from UCL/MIRA collaboration. In particular, our results for the main agency score did not change materially when we varied our weightings on pregnancy-related exposure, even though the proportion of pregnant women in our closed cohort decreased from 100% in the original trial period to less than a third afterwards. In our analysis of a restricted sample in which none of the women delivered again or became mothers-in-law themselves, we also found no evidence for an impact on the main agency score, even though we effectively compared a sample where 100% of women were pregnant during an intervention period with a sample where 0% of women were pregnant. The women's groups were open to all community members, but group discussion topics revolved exclusively around perinatal health issues (see S1 Table). Even among pregnant women, the participation rate never exceeded 40% [71]. The robustness of our results to different weights on pregnancy-related exposure shows that our null result is not an artefact of how we measure subsequent exposure. Furthermore, to our knowledge there were no other governmental or non-governmental PLA group health interventions running in the area during this time period.
Finally, the finding that a higher number of deliveries at baseline was associated with reduced household agency at end-line is plausible given that higher fertility has been associated with poorer socio-economic status and agency in the past [26]. This has important implications for the future implementation of such interventions, which may need to take the baseline fertility rate into account when targeting beneficiaries or tailoring implementation strategies.

Limitations
The main limitations of this study are (1) a lack of comparative and alternative measures of agency over time, both during the intervention and in subsequent years; (2) a lack of concurrent qualitative data on perceived impact on empowerment at the community and household levels; and (3) low study power. Our measure of agency was limited to the household and did not address other domains, such as agency related to child-care or community agency. For women, such as those in this cohort, with few demonstrable restrictions on social activities outside the household, agency in the community might be more relevant. Measures of social support [72], social capital [73] or community mobilisation [74,75] could be gainfully used to assess the extent to which PLA women's groups empower women at a community level as opposed to a household level. In particular, the ability of PLA women's groups to lower transaction costs to women's collective action and enhance their capacity to manage communal resources is arguably a key social outcome of the participatory approach [76].
Additionally, subjective measures are frequently criticised for being subject to cognitive and social biases, easily influenced by the framing of questions, and unable to reveal causal factors that are not accessible to introspection [77]. Over the 11.5 years from baseline to end-line, life aspirations and preferences may have changed and could lead to women feeling disempowered due to higher aspirations rather than lower 'objective conditions'. The Power Ladder question likely measures relative power, since it requires respondents to anchor the 10-point scale using the perceived agency of other women 'in their community'. Higher levels of agency among community women in general may lead to feelings of low agency when women compare themselves with each other rather than their past selves. However, we should note that more 'objective' measures of social constructs based on observed behaviour in the field or in the lab often imply equally, if not more, challenging interpretive difficulties due to the need to decode the meaning of ambiguous human behaviour. We recommend using both behavioural and subjective measures of empowerment in future research.
Finally, the precision of our effect estimates is limited. This was a post-hoc study, and the original sample size calculation, in particular the number of clusters, was not made with this outcome in mind. A post-hoc power analysis based on an observed ICC of 0.36 for overall agency, 24 clusters in total and 256 individuals per cluster suggested we had 80% power to detect an absolute 0.23 standard deviation difference in mean overall household agency between intervention and control at 5% significance level [78]. As stated previously, the mean overall agency freedom score for all women, irrespective of trial arm, was 9.6 with a standard deviation of 2.2.
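The dependence of the detectable difference on the ICC, the number of clusters and the cluster size can be illustrated with the textbook design-effect approximation. The sketch below is an illustration under stated assumptions, not the method of [78]: it assumes equal cluster sizes, equal allocation of clusters to the two arms and a two-sided normal-approximation test, and the function name is hypothetical. Because of these simplifications it should not be expected to reproduce the figure reported above.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(icc, n_clusters, cluster_size,
                              alpha=0.05, power=0.80):
    """Approximate minimum detectable standardised mean difference for a
    two-arm cluster randomised design (illustrative assumptions only)."""
    # Design effect: variance inflation caused by within-cluster correlation.
    design_effect = 1 + (cluster_size - 1) * icc
    # Effective number of independent observations in each arm,
    # assuming clusters are split equally between the arms.
    n_eff_per_arm = (n_clusters / 2) * cluster_size / design_effect
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = standard_normal.inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_eff_per_arm)
```

Under this approximation, and holding the design fixed, the detectable difference grows with the ICC, which is why precision in a cluster design depends far more on the number of clusters than on the number of women per cluster.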

Conclusion
We investigated the long-term impacts of a perinatal PLA women's group intervention on women's household agency approximately 11.5 years after individuals' original exposure to the intervention. There was no robust evidence to support a long-term impact of the original intervention on women's agency in the family according to a wide range of model specifications. Without in-depth qualitative evidence and/or interim measures of household agency and other measures of agency, it is difficult to draw firm conclusions about the reasons for our lack of observed impact. Our study does, however, highlight important methodological considerations that should be taken into account when unpicking the potential pathways to impact, at the individual, family and community levels, of community-based mobilisation interventions to improve maternal and child health. Future work should collect qualitative and quantitative process and implementation data over time in order to better understand the mechanisms through which women's groups and similar participatory and community-based interventions improve health outcomes.
Supporting information S1