Development and validation of the Multidimensional Internally Regulated Eating Scale (MIRES)

In this paper, we describe the systematic development and validation of the Multidimensional Internally Regulated Eating Scale (MIRES), a new self-report instrument that quantifies the individual-difference characteristics that together shape the inclination towards eating in response to internal bodily sensations of hunger and satiation (i.e., internally regulated eating style). MIRES is a 21-item scale consisting of seven subscales, which have high internal consistency and adequate to high two-week temporal stability. The MIRES model, as tested in community samples from the UK and US, had a very good fit to the data both at the level of individual subscales and as a higher-order formative model. High and significant correlations with measures of intuitive eating and eating competence lent support to the convergent validity of MIRES, while its incremental validity in relation to these measures was also upheld. MIRES as a formative construct, as well as all individual subscales, correlated negatively with eating disorder symptomatology and weight-related measures (e.g., BMI, weight cycling) and positively with adaptive behavioral and psychological outcomes (e.g., proactive coping, body appreciation, life satisfaction), supporting the criterion validity of the scale. This endeavor has resulted in a reliable and valid instrument for the thorough assessment of the features that make up the profile of those who tend to regulate their eating internally.


Introduction
Internally regulated eating (IRE), which can be broadly defined as eating in response to internal, bodily sensations of hunger and satiation, is considered an adaptive way of eating with positive effects on physical, psychological, behavioral, and dietary outcomes [1][2][3][4][5][6]. IRE has been addressed from various specific theoretical perspectives including, but not limited to, those of intuitive eating [7], eating competence [8], and mindful eating [9]. Palascha et al. [10] recently reviewed these various conceptualizations of IRE and concluded that none of them captures IRE style (i.e., the general inclination towards eating in response to internal/physiological signals of hunger and satiation) in full. Existing measures, such as the Intuitive Eating Scale-2 (IES-2) [11], the Eating Competence Satter Inventory 2 (ecSI-2) [12], the Mindful Eating Questionnaire (MEQ) [13] and the Mindful Eating Scale (MES) [14], have made impactful contributions, but have failed to capture the full complexity of IRE and the inter-connectedness between the characteristics that define the IRE style. Therefore, there is a need for new measures that assess IRE in its full complexity. The Multidimensional Internally Regulated Eating Scale (MIRES) is proposed to quantify the five individual-difference characteristics that collectively form the IRE style. The present paper describes the systematic development and validation of the MIRES, a short and easily administered 21-item scale.
In this research we followed a stepwise, theory-based and empirically driven process to develop and validate the MIRES (Fig 1). Next to testing the scale's structure, internal consistency, measurement invariance, and temporal stability, we also examined its content, construct, discriminant, convergent, criterion, and incremental validity. In the next section, we present briefly the conceptual model of the key characteristics of the IRE style, followed by a description of the operationalization of constructs into subscales. For a more complete overview of the conceptual model, including evidence on why each characteristic of IRE style is considered adaptive, see Palascha et al. [10].

Conceptual definitions and operationalization
Collectively the concept of IRE implies that individuals are sensitive to bodily signals of hunger and satiation, have self-efficacy in using those signals to determine when and how much to eat, trust these bodily signals to guide eating, and have a relaxed and enjoyable relationship with food and eating. Sensitivity to physiological signals of hunger and satiation (SH and SS, respectively) is defined as the ability to sense/perceive and interpret the physiological signals that the body generates in response to hunger and satiation. Self-efficacy in using physiological signals of hunger and satiation (SEH and SES, respectively) is defined as the perception of ease or difficulty in using physiological signals of hunger and satiation to decide when and how much to eat. Internal Trust (IT) refers to the tendency to trust the body's physiological processes for the regulation of eating. Food Legalizing (FL) is defined as the tendency to have a relaxed relationship with food and particularly a relaxed attitude towards indulgent food. Finally, Food Enjoyment (FE) concerns the tendency to derive pleasure from eating by attending to and appreciating the sensory qualities of the food that is consumed.
IT, FL, and FE are operationalized as uni-dimensional constructs in our model (S1 Fig). Since hunger and satiation are different processes, Sensitivity to hunger signals (SH) and Sensitivity to satiation signals (SS) are operationalized as distinct constructs. The same holds for Self-efficacy in using hunger signals (SEH) and Self-efficacy in using satiation signals (SES). Furthermore, sensitivity and self-efficacy may vary across challenging situations such as when emotional or external cues are salient [15][16][17]. Therefore, we operationalized each of these four constructs along three dimensions: 1. under neutral conditions, i.e., when individuals are calm, relaxed, and without much distraction (SH: Neutral, SS: Neutral, SEH: Neutral, SES: Neutral), 2. under emotional prompts, i.e., when negative emotions are salient (SH: Emotional, SS: Emotional, SEH: Emotional, SES: Emotional), and 3. under external prompts, i.e., when external influences, such as a distracting environment, are salient (SH: External, SS: External, SEH: External, SES: External). Since individuals may respond differently to positive and negative emotions, we restricted the emotional context to negative emotions. Additionally, high-arousal emotions are assumed to have a universal effect by suppressing eating, while there is more variability in how individuals respond to emotions of moderate arousal [16]. Therefore, only moderate-arousal emotional states were selected for the emotional context (i.e., sadness, loneliness, boredom). Regarding the external prompts context, a variety of external factors influence eating in different ways (e.g., portion sizes, mealtime schedules, eating with others, availability of tasty food, eating in a busy or distracting environment). Given this heterogeneity, we decided to select a single external cue, eating under distraction, because it is a generic cue that is representative of the process by which several external cues influence eating behavior (i.e., when "noise" from the external environment is salient) and is relevant for both hunger and satiation.

Model specification
Since the characteristics of the IRE style are not interchangeable (all of them are necessary for the IRE style to manifest), we treated the IRE style as a formative construct. Formative constructs are formed by the combination of their indicators and causality is assumed to flow from the indicators to the construct [18]. Conversely, a reflective construct exists independently of the indicators that are used to measure it and causality flows from the construct to the indicators. Thus, the IRE style is formed by the totality of its seven defining constructs, while each of these constructs is a reflective one (uni-dimensional or decomposed to measurable sub-dimensions).

PLOS ONE

Methods
Through interactive discussions within the author team, we generated a pool of 103 items, which were purported to measure the individual-difference characteristics of the IRE style. Existing measures of intuitive eating [11,19], eating competence [12], mindful eating [13,14], and interoceptive awareness [20] were used for inspiration during item generation. Researchers in the field of nutrition and experts evaluated and enriched the content of the initial item pool, which then underwent two rounds of pretesting with college samples. This preliminary work helped us to identify the most appropriate and relevant items for the constructs under study, to sort out the internal structure of the scale, to optimize its length, and to identify the most appropriate method for its administration. Starting from the structure obtained from this preliminary work, we examined the scale's internal consistency, confirmed its internal structure with Confirmatory Factor Analysis (CFA), and tested its two-week temporal stability and several types of validity (i.e., construct, discriminant, convergent, criterion, and incremental) in broad samples of consumers from the UK and US (Table 1). This research was conducted according to the guidelines laid down in the Declaration of Helsinki and complied with the Netherlands Code of Conduct for Research Integrity. Written consent was obtained for all survey participants. Participants who were recruited via market research agencies had previously consented to participate in the panel of the agency. This research was approved by the Social Sciences Ethics Committee of Wageningen University and Research. The data of this project can be found here [21].

Internally regulated eating
MIRES was administered with 7-point Likert-type response scales (1 = "Completely untrue for me" to 7 = "Completely true for me") (see S1 Appendix for information on administration of the MIRES). The MIRES items were developed and tested in the English language. An overview of the initial item pool and the adjustments it was subjected to during the scale development and validation process can be found in S2 Appendix. A necessary condition for identification of formative models is the addition of at least two reflective measures that are caused directly or indirectly by the formative construct [22]. Thus, to achieve identification when testing the complete formative model we also developed six items that were reflective of the higher-order factor IRE style. We use the abbreviation RI (Reflective items) to refer to these items in the rest of the paper. Cronbach's alpha for the RI was 0.90 and AVE was 0.61. Uni-dimensionality of the RI factor was supported by the good model fit (χ²(9) = 110.68, p < 0.001, CFI = 0.98, TLI = 0.96, RMSEA = 0.10, SRMR = 0.03) and the high factor loadings (0.68-0.85).

Eating disorder symptomatology
The Binge Eating Scale (BES) and the Restrictive Eating Scale (RES) of the Multifactorial Assessment of Eating Disorder Symptoms (MAEDS) [24] were used to assess the frequency of manifesting binge eating and restrictive eating behaviors. Items were administered on a 7-point frequency scale (1 = "Never" to 7 = "Always"). Two items from each subscale were dropped before data collection ("I crave sweets and carbohydrates" because it concerns a behavior that is non-specific for binge eating and had a low item-total correlation in the original study; "I am too fat" because it reflects a belief rather than a behavior; "I eat 3 meals a day" because it is the only item with a negative item-total correlation and because it may seem a stringent behavior to some people and an adaptive one to others; "I hate to eat" because it was deemed extreme and had a low item-total correlation in the original study). Cronbach's alphas for the adapted scales were 0.91 (BES) and 0.87 (RES). The fit of the RES model was initially unacceptable. Thus, we allowed for correlated error terms between the two items on fasting, which have similar wording. BES and RES were measured to assess the criterion and incremental validity of MIRES.

Proactive coping
The 8-item Proactive Coping Scale (PCS) of the Proactive Coping Inventory, as adapted by Gan et al. [25], was used to measure cognitions and behaviors related to self-regulatory goal attainment. Items were administered on a 4-point scale (1 = "Not at all true" to 4 = "Completely true"). The PCS model fit was improved by allowing for correlated error terms between the items that refer to dealing with challenges as there is word congruence among them. We further removed the two reverse-scored items after data collection because of low item-total correlations (0.184 and 0.165, respectively). The adapted PCS had a Cronbach's alpha of 0.88. PCS was measured to assess the criterion and incremental validity of MIRES.

Adaptive eating behaviors
Two adaptive eating behaviors from the Adult Eating Behavior Questionnaire (AEBQ) were assessed [26]. Satiety responsiveness (SR) assesses with four items the tendency to respond to internal satiety signals. Slowness in eating (SE) measures with four items the tendency to consume meals at a slow pace. Items were administered on a 5-point scale (1 = "Strongly disagree" to 5 = "Strongly agree"). Cronbach's alphas were 0.81 (SR) and 0.72 (SE). SR and SE were measured to assess the criterion and incremental validity of MIRES.

Body appreciation
Body appreciation was measured with the 10-item Body Appreciation Scale-2 (BAS-2) [27]. The scale assesses the tendency of individuals to accept, respect, and have favorable opinions towards their bodies. Responses were measured on a 5-point scale (1 = "Never" to 5 = "Always"). Its Cronbach's alpha was 0.96. BAS-2 was measured to assess the criterion and incremental validity of MIRES.

Self-esteem
To assess self-esteem, we used the Single-Item Self-Esteem scale (SISE) [28], which consists of a single item "I have high self-esteem" administered on a 5-point scale (1 = "Not very true of me" to 5 = "Very true of me"). Using test-retest data over three points in time and following the procedure suggested by Heise [29], developers have obtained a reliability score of 0.75 for SISE. The scale's reliability was not estimated in this study due to the lack of repeated measurements. SISE was measured to assess the criterion and incremental validity of MIRES.

Life satisfaction
The 5-item Satisfaction With Life Scale (SWLS) [30] was used to measure global cognitive judgments of one's life satisfaction. Items were administered on a 7-point scale (1 = "Strongly disagree" to 7 = "Strongly agree"). Cronbach's alpha was 0.92. SWLS was measured to assess the criterion and incremental validity of MIRES.

Weight-related measures
Current weight and height were reported in pounds and feet/inches, respectively. Values were transformed to kilograms and meters and were used to calculate Body Mass Index (BMI). Highest and lowest weight during the last four years, excluding periods of pregnancy or sickness, were also reported. By subtracting the lowest from the highest weight, a variable called Maximal Weight Change (MWC) was calculated. Individuals whose MWC was <4 kg were classified as having stable weight. Individuals whose MWC was ≥4 kg were asked additional questions on their weight trajectory and were categorized into 1. those who gained weight (≥4 kg increase in weight without significant fluctuations; fluctuations of ≥4 kg were considered significant), 2. those who lost weight (≥4 kg decrease in weight without significant fluctuations; fluctuations of ≥4 kg were considered significant), or 3. those whose weight cycled (weight had fluctuated with gains and losses of ≥4 kg). Weight cyclers also reported the number of intentional weight losses and unintentional weight gains of ≥4 kg during the last four years. Responses were used to calculate a measure of Weight Cycling Severity (WCS). These measures were also used to assess the criterion and incremental validity of MIRES.
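The unit conversion and the first classification step described above can be illustrated with a minimal sketch. The conversion factors are the standard definitions of the pound and the inch; the function names and the example respondent are hypothetical and are not taken from the study materials. The sketch only separates stable from non-stable weight, because the further split into gainers, losers, and cyclers relied on follow-up questions.

```python
LB_TO_KG = 0.45359237    # exact definition of the avoirdupois pound
IN_TO_M = 0.0254         # exact definition of the inch
MWC_THRESHOLD_KG = 4.0   # cut-off stated in the text


def bmi_from_imperial(weight_lb, height_ft, height_in):
    """Body Mass Index (kg/m^2) from weight in pounds and height in feet/inches."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = (height_ft * 12 + height_in) * IN_TO_M
    return weight_kg / height_m ** 2


def maximal_weight_change(highest_lb, lowest_lb):
    """Maximal Weight Change (kg): highest minus lowest reported weight."""
    return (highest_lb - lowest_lb) * LB_TO_KG


def is_weight_stable(mwc_kg):
    """Stable weight means an MWC below 4 kg; otherwise follow-up questions apply."""
    return mwc_kg < MWC_THRESHOLD_KG


# Hypothetical respondent: 160 lb, 5 ft 6 in, highest 165 lb, lowest 158 lb.
bmi = bmi_from_imperial(160, 5, 6)
mwc = maximal_weight_change(165, 158)
print(round(bmi, 1), round(mwc, 2), is_weight_stable(mwc))
```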

Analysis and results
To confirm the scale's internal structure with CFA and to test several properties of its subscales (i.e., internal consistency, discriminant validity, measurement invariance, construct validity) we administered MIRES to a nearly representative sample (in terms of gender and age) of UK adults (N = 1380) that was recruited via a market research agency (exclusion criteria were pregnancy and lactation, history of eating disorders, diabetes, or bariatric surgery, and current use of appetite-enhancing or -suppressing medication). Data were checked for violations of normality (acceptable skewness values were below 2 in absolute value and acceptable excess kurtosis values below 3 in absolute value) and presence of multivariate outliers (i.e., values outside the boxplots of the Mahalanobis distances for raw scores and residuals). No violations of normality were observed for the variables. After exclusion of multivariate outliers (N = 20) and those who failed an attention check question (N = 386) the sample was skewed towards females and older individuals (Table 1). Given that 195 parameters were to be estimated in the CFA model, the sample size (N = 974) was adequate to get reliable estimates based on the 5:1 participants-to-parameter ratio [31].
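The univariate normality screen described above can be sketched as follows. This is only an illustration: it uses simple moment-based (population) estimators, whereas the estimator actually used in the study is not stated and statistical packages often apply small-sample corrections. The function names and the example responses are hypothetical.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def passes_normality_screen(values, max_abs_skew=2.0, max_abs_kurt=3.0):
    """Apply the cut-offs from the text: |skewness| < 2 and |excess kurtosis| < 3."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) < max_abs_skew and abs(kurt) < max_abs_kurt


# Hypothetical responses on a 7-point scale.
symmetric_item = [1, 2, 3, 4, 4, 4, 5, 6, 7]
heavy_tailed_item = [1] * 30 + [7]

print(passes_normality_screen(symmetric_item))
print(passes_normality_screen(heavy_tailed_item))
```

Passing `max_abs_skew=3.0` and `max_abs_kurt=10.0` gives a more lenient screen of the same form.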

Internal structure and consistency
The lavaan package [32] in R (version 3.4.1) [33] was used to conduct CFA with Maximum Likelihood estimation. Adequacy of fit was determined by four indices (CFI > 0.95, TLI > 0.95, RMSEA < 0.06, SRMR < 0.08) [34]. The structure of MIRES was examined in a sequential process in which individual first-order factor models were tested before subscales were combined into higher-order constructs. The multi-factor model including all MIRES subscales provided a very good fit to the data (χ²(1040) = 2567.43, p < 0.001, CFI = 0.97, TLI = 0.97, RMSEA = 0.04, SRMR = 0.04) and all standardized factor loadings were high (above 0.70) and significant (S1 Table). A number of measurement-model modifications were made when testing this model. First, because the items in the sensitivity and self-efficacy subscales were asked in triple (across three contexts), method effects were accounted for by allowing error terms between identical items to be correlated. Second, because the conceptual distinction between contexts re-appeared in the sensitivity and self-efficacy subscales, we also accounted for context effects by allowing the disturbance terms of the first-order factors referring to the same context to correlate with each other (e.g., SH: Neutral, SS: Neutral, SEH: Neutral, SES: Neutral). Composite reliabilities and Average Variance Extracted (AVE) were calculated according to Fornell and Larcker [35]. Reliabilities of the MIRES first- and second-order factors ranged between 0.84 and 0.96, and AVE ranged between 0.64 and 0.88 (Table 2).
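For readers unfamiliar with the Fornell and Larcker indices, the following minimal sketch shows how composite reliability and AVE are commonly computed from standardized factor loadings, taking each item's error variance as one minus its squared loading. The loadings in the example are hypothetical and are not values from S1 Table.

```python
def composite_reliability(loadings):
    """Composite reliability: (sum of loadings)^2 over itself plus summed error variance."""
    sum_loadings = sum(loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE: summed squared loadings over summed squared loadings plus error variance."""
    squared = sum(l ** 2 for l in loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared / (squared + error_variance)


# Hypothetical standardized loadings for a three-item subscale.
example_loadings = [0.80, 0.85, 0.90]
cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)
print(round(cr, 3), round(ave, 3))
```

With standardized loadings, AVE reduces to the mean of the squared loadings, so it cannot exceed the largest squared loading in the subscale.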

Discriminant validity of constructs
Several alternative models were fitted and compared to show the discriminant validity of the sensitivity and self-efficacy constructs (Table 3). First, to test whether sensitivity and self-efficacy are truly distinct from each other we compared two pairs of alternative models: one for hunger and one for satiation. Starting with hunger, in one model the three SH subscales (SH: Neutral, SH: Emotional, SH: External) loaded on a second-order factor SH and the three SEH subscales (SEH: Neutral, SEH: Emotional, SEH: External) loaded on another second-order factor SEH. In the alternative model, the two second-order factors were collapsed into one factor. The alternative model fit significantly worse. The same was the case for the distinction between SS and SES.
Note to Table 3: SH: Sensitivity to physiological signals of hunger; SS: Sensitivity to physiological signals of satiation; SEH: Self-efficacy in using physiological signals of hunger; SES: Self-efficacy in using physiological signals of satiation. In the initial model, factors were distinct; in the alternative model, factors were collapsed into a single factor. Differences are computed as alternative model minus initial model.
In a similar way, we tested the discriminant validity of hunger and satiation constructs by comparing two pairs of alternative models: one for sensitivity and one for self-efficacy. The alternative model, in which SH and SS were collapsed into one factor, was significantly worse compared to the model where the two factors were distinct. The same was the case for SEH and SES.
Finally, the conceptual distinction between different contexts of sensitivity and self-efficacy was tested. For each second-order construct (SH, SS, SEH, and SES), we compared the fit of a three-factor model in which each item loaded to its respective context versus an alternative model in which the three factors were collapsed into one factor. In all cases, the fit of the alternative model was significantly worse.

Measurement invariance
Measurement invariance was examined for the items that were asked in triple (across contexts) to test the assumption that each item should perform consistently irrespective of the context in which it is asked. To do this, we constrained the loadings of these items to be equal across the three contexts. The decrease in fit in the constrained model was significant (Δχ²(24) = 102.502, p < 0.001); however, the changes in fit indices were within the acceptable criteria (ΔCFI = -0.002, ΔTLI = -0.001, ΔRMSEA = 0, ΔSRMR = 0.001) according to Chen's [36] recommendations for factor loading invariance (ΔCFI ≤ 0.010, ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030).
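The two-part decision described above (a chi-square difference test followed by a check of the changes in fit indices) can be sketched as follows. This is an illustration under stated assumptions: the p-value function uses a closed-form expression that is valid only for even degrees of freedom, the cut-offs are read as limits on how much fit may worsen, and all function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def loading_invariance_supported(delta_cfi, delta_rmsea, delta_srmr):
    """Check worsening of fit against the cut-offs cited in the text.

    A drop in CFI is a negative change, so its sign is flipped before comparison.
    """
    return (-delta_cfi <= 0.010
            and delta_rmsea <= 0.015
            and delta_srmr <= 0.030)


# Values reported in the text for the constrained versus unconstrained model.
p_value = chi2_sf_even_df(102.502, 24)
supported = loading_invariance_supported(-0.002, 0.0, 0.001)
print(p_value < 0.001, supported)
```

The example shows why the two criteria can disagree in large samples: the chi-square difference is highly significant while the changes in approximate fit indices remain small.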

Construct validity
Since the IRE style is by nature a non-diet eating style, we used independent samples t-tests to compare scores on the MIRES subscales between individuals who said they were currently dieting for weight loss purposes (n1 = 131) and those who said they were not (n2 = 843), as a means of testing the scale for construct validity in a broad sense. Non-dieters scored significantly higher than dieters on all but one of the MIRES subscales, in line with our expectations (S2 Table). For FE, the mean difference between groups did not reach significance.

Temporal stability
A sub-sample of 679 participants from the UK sample was invited to fill in the MIRES a second time after two weeks. The response rate was 43.2%, but the entire survey was completed by 261 participants. Those who failed the attention check (N = 46) and two multivariate outliers were excluded, leaving a sample of 213 responses for analysis (Table 1). The sample size was adequate to get reliable estimates in models testing the stability of first-order factors, while in models testing the stability of second-order factors the sample was slightly small (4:1 participants-to-parameter ratio).
No violations of normality were observed for the variables. We used an elaborated procedure of temporal stability assessment as suggested by Steenkamp and van Trijp [37]. Pearson's correlation coefficients, intra-class coefficients with confidence intervals, and means for the summed scores of factors were also calculated. Stability coefficients of the MIRES first- and second-order factors ranged between .63 and .90 (Table 4). Imposition of constraints on factor loadings did not result in significant decreases in model fit; thus, the meaning of all subscales was stable. Some subscales were further found to be stable in terms of item reliabilities (SS: Neutral and SEH: External) and construct reliability (FL, SH: External, SS: Emotional, SS: External, SEH: Emotional, and SES: Neutral). Finally, SH: Neutral, SEH: Neutral, and SEH manifested perfect stability as their stability coefficient was not significantly different from unity. Paired samples t-tests indicated that most factor means were stable over time; however, the means of IT, FL, SH: Emotional, and SS: External changed significantly.

Length optimization
In order to further optimize the scale's length and to have the same number of items per subscale (i.e., three), we decided to drop seven items: four items from the IT subscale, one item from the FL subscale, and two items from the FE subscale. The decision on which items to drop was based on the meaning of the items, so as to retain the scale's content validity [38]; items whose meaning was very similar to that of other items in their respective subscales were dropped.

Confirmation of the internal structure of MIRES as a multidimensional, formative model
The 45-item MIRES was further administered to a representative sample of 1251 adults from the US [39], recruited via a market research agency (Table 1; see also S3 Table for some additional characteristics), in order to confirm the internal structure of MIRES as a multidimensional formative model and to test the scale's convergent, criterion, and incremental validity. Exclusion criteria were pregnancy and lactation, because these conditions relate to temporal irregularities in the eating patterns of women. Fifty-one multivariate outliers were excluded, leaving 1200 responses for analysis. Based on the recommended 5:1 participants-to-parameter ratio, a sample of 1200 participants would be adequate to give reliable estimates for a model with a maximum of 240 parameters. All models that we tested had fewer than 240 parameters to be estimated; thus, the sample size was adequate for our analyses. No significant violations of normality were observed for most variables. BMI and MWC had kurtosis values above 3 and the latter also had a skewness value above 2. However, according to Kline's [40] more relaxed criteria for skewness and kurtosis (<3 and <10, respectively) none of these variables were considered problematic; thus, no transformations were conducted. The MIRES model was subjected to CFA (S2 Fig) with the following additional specifications. The three first-order factors (IT, FL, FE) and the four second-order factors (SH, SS, SEH, SES) loaded on the higher-order IRE style construct as formative indicators (arrows pointing to the higher-order construct). Covariances between all first- and second-order factors and the higher-order formative factor were fixed to zero, as otherwise lavaan estimates both these covariances and the formative regression coefficients, which seem to be confounded, leading to identification problems. To warrant identification, the six RI also loaded on the IRE style construct as reflective indicators (arrows pointing to the six RI).
Note to Table 4: SEH: Self-efficacy in using physiological signals of hunger; SES: Self-efficacy in using physiological signals of satiation. Significance is marked at p < 0.001. Intra-class correlation coefficients use an absolute agreement definition.
The  Table). High and significant loadings were obtained for the six RI (0.66-0.86) and a large amount of variance in these items was accounted for by the IRE style factor (AVE = 0.82).

Convergent validity
Bivariate correlations of the MIRES total score, RI, and MIRES subscales with the IES-2 and ecSI-2 total scores were substantial and significant (0.32-0.70) (S5 Table). High correlations were particularly observed between certain MIRES subscales and conceptually related constructs of IES-2 and ecSI-2. For example, FL and FE correlated most strongly with the EatAtt (0.56) and ContSkills (0.46) subscales of ecSI-2, respectively. Similarly, SEH and SES correlated most strongly with the RHSC subscale of IES-2 (0.66 and 0.68, respectively).

Criterion validity
The criterion validity of MIRES, IES-2, and ecSI-2 was examined with Structural Equation Modelling (SEM) (for outcomes measured with multiple items) and with linear regression (for the single-item outcomes SISE, BMI, MWC, and WCS). Analyses with MIRES were conducted at the level of a total score (summed score of all items), at the level of the seven MIRES subscales as separate latent constructs (IT, FL, FE, SH, SS, SEH, SES), and at the level of the RI as an independent scale. Analyses for IES-2 and ecSI-2 were conducted only at the level of total scores. MIRES, as well as its individual subscales, displayed negative associations with binge eating, restrictive eating, BMI, maximal weight change, and weight cycling severity, and positive associations with all adaptive outcomes assessed in this study (Table 5). In general, MIRES, IES-2, and ecSI-2 displayed comparable predictive abilities (S6 Table) and all were better at predicting behavioral and psychological outcomes, compared to physical outcomes. MIRES accounted for a slightly larger amount of variance in RES, SR, and SE compared to the other scales, IES-2 was better at predicting BES, BMI, MWC, and WCS, and finally ecSI-2 was better at predicting PCS, BAS-2, SWLS, and SISE. The RI manifested comparable criterion validity to MIRES. Finally, certain MIRES subscales (FL, SH, SS, SES) achieved higher predictive power compared to the MIRES summed score for certain outcomes (e.g., RES, BES, SR, SE, BMI).

Incremental validity
The incremental validity of MIRES in relation to IES-2 and ecSI-2 was examined with SEM (for multi-item outcomes) and hierarchical regression analysis (for single-item outcomes). Specifically, we examined whether MIRES accounted for variance in each outcome measure above and beyond the variance accounted for by IES-2 and ecSI-2, respectively. At Step 1, IES-2 was entered as a single predictor of each respective outcome and at Step 2, MIRES was added as a second predictor (in SEM analyses, MIRES was also entered as a predictor in the model at Step 1, but its regression coefficient was fixed at zero). The same procedure was followed with ecSI-2. Changes in beta coefficients were not interpreted because multi-collinearity between these conceptually similar measures was expected to interfere with these estimates. For most outcomes, a significant increase in R² was observed when MIRES was added to the model (Table 6). Specifically, MIRES accounted for 0.7%-16% additional variance in outcome measures above and beyond IES-2 and ecSI-2. MIRES did not account for a significant increase in explained variance of physical outcomes (BMI [ΔR² = 0], MWC [ΔR² = 0], and WCS [ΔR² = 0.002]) above and beyond IES-2, nor of satisfaction with life (ΔR² = 0) and self-esteem (ΔR² = 0.005) above and beyond the variance explained by ecSI-2.
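For the single-item outcomes, the two-step logic above amounts to comparing the R² of a one-predictor regression with the R² of a two-predictor regression. The minimal sketch below computes both from Pearson correlations using the standard closed form for two predictors. It covers only the observed-variable regression case, not the SEM analyses, and the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_one_predictor(outcome, baseline):
    """Step 1: R^2 with the baseline measure as the only predictor."""
    return pearson(outcome, baseline) ** 2


def r2_two_predictors(outcome, baseline, added):
    """Step 2: R^2 with both predictors (requires |r| < 1 between predictors)."""
    r_y1 = pearson(outcome, baseline)
    r_y2 = pearson(outcome, added)
    r_12 = pearson(baseline, added)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_r2(outcome, baseline, added):
    """Variance explained by the added predictor beyond the baseline predictor."""
    return r2_two_predictors(outcome, baseline, added) - r2_one_predictor(outcome, baseline)


# Hypothetical scores: the outcome is built as the exact sum of both predictors.
baseline_scores = [1, 2, 3, 4, 5, 6]
added_scores = [2, 1, 4, 3, 6, 5]
outcome_scores = [b + a for b, a in zip(baseline_scores, added_scores)]

delta_r2 = incremental_r2(outcome_scores, baseline_scores, added_scores)
print(round(delta_r2, 4))
```

The denominator `1 - r_12 ** 2` shrinks as the two predictors become more strongly correlated, which is one way to see why coefficient estimates become unstable under multi-collinearity while the change in R² remains interpretable.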

Testing the properties of the simplified 21-item version of MIRES
Since the 45-item MIRES manifested good psychometric properties, we wanted to examine whether the inclusion of the three contexts (neutral, emotional, external) in the sensitivity and self-efficacy subscales offers predictive advantages compared to just the neutral context. In this way we could ascertain whether a simplified version of the scale (21 items) could still be applicable. To test this empirically we performed SEM and regression analysis (depending on the outcome variable) using either the full subscales (SH, SS, SEH, and SES), each including all three contexts, or the neutral counterpart of each subscale to predict each outcome measured in the US sample. The full subscales accounted for 0-8% additional variance, depending on the outcome, compared to their neutral counterparts (S7 Table). In addition, the fit of the 21-item MIRES model was still excellent (χ²(296) = 1258.161, p < 0.001, CFI = 0.97, TLI = 0.96, RMSEA = 0.05, SRMR = 0.04) (Fig 2), correlations among the MIRES subscales and with IES-2 and ecSI-2 decreased only slightly (S8 and S9 Tables), and the incremental validity of MIRES was still upheld (S10 Table). Thus, although the 45-item full version offers some predictive advantages, the simplified version with only 21 items generally upholds the psychometric properties of the full scale.
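The relation between the 45-item and 21-item versions follows from the scale's structure, as the short sketch below illustrates. It assumes three items per subscale in each context, which is inferred from the length-optimization step and the reported item totals; the constant and function names are hypothetical.

```python
CONTEXTS = ("Neutral", "Emotional", "External")
CONTEXT_SUBSCALES = ("SH", "SS", "SEH", "SES")   # asked once per context
SINGLE_SUBSCALES = ("IT", "FL", "FE")            # not split by context
ITEMS_PER_SUBSCALE = 3                           # assumption inferred from the text


def item_count(contexts):
    """Total number of items when the given contexts are administered."""
    context_items = len(CONTEXT_SUBSCALES) * len(contexts) * ITEMS_PER_SUBSCALE
    single_items = len(SINGLE_SUBSCALES) * ITEMS_PER_SUBSCALE
    return context_items + single_items


full_version = item_count(CONTEXTS)            # all three contexts
short_version = item_count(("Neutral",))       # neutral context only
print(full_version, short_version)
```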

Discussion
Internally regulated eating is an adaptive way of eating that leads to positive physical, psychological, behavioral, and dietary outcomes as shown by the current and previous research [1][2][3][4][5][6]. While several attempts have been made to conceptualize and quantify this eating style, none seems to capture the full complexity of this construct. In this paper, we describe the rigorous development and validation of the MIRES, an instrument to assess the individual-difference characteristics that are necessary and jointly sufficient conditions for the manifestation of the IRE style.

Using a bottom-up approach, we showed that all first- and second-order factors of MIRES are measured reliably and a significant amount of variance in the items is accounted for by the corresponding latent factors. All first-order models and the multi-factor model that we tested had very good fit to the data. We confirmed that sensitivity to hunger, sensitivity to satiation, self-efficacy with hunger, and self-efficacy with satiation are distinct constructs, and that the three contexts within each of these subscales are also distinct from each other. Results supported the metric measurement invariance of the items asked across contexts and initial evidence on the construct validity of MIRES was obtained, as non-dieters scored higher than dieters on all but one of the MIRES subscales. Scores on FE did not differ significantly between groups, suggesting that this is perhaps the least determinative characteristic among the ones that form the IRE style. We further showed that all MIRES subscales are stable over a period of two weeks in terms of factor loadings, while even higher levels of stability (in terms of item reliabilities, construct reliabilities, or correlation of the same factor over time) were evidenced for certain subscales. Pearson's correlations underestimated the true stability of these constructs, while intra-class correlation coefficients overestimated it. Factor means remained stable for most factors except for IT, FL, SH: Emotional, and SS: External. As regards the latter two factors, however, the means of their respective second-order factors (SH and SS) were stable. The change in means in IT and FL suggests that these subscales show variation over time across the whole sample, which could be systematic (i.e., these subscales measure less stable characteristics) or random (i.e., due to chance). Further studies are required to confirm which of the two plausible explanations is true. Evidence on the multidimensional nature of the MIRES model was also obtained in this study. The convergent validity of MIRES was supported by the moderate to strong correlations with measures of intuitive eating and eating competence. Measures of IRE were generally better at predicting behavioral and psychological outcomes compared to physical outcomes, which is in line with existing evidence [1][2][3][6].
MIRES was negatively associated with binge eating, restrictive eating, BMI, maximal weight change, and weight cycling severity, and positively associated with all adaptive outcomes assessed in this study. This confirms the adaptive nature of the constructs it assesses. The six RI had comparable predictive power to the 45-item MIRES. Furthermore, certain MIRES subscales (FL, SH, SS, and SES) accounted for a larger amount of variance in certain outcomes compared to the MIRES summed score. This further justifies their applicability as independent measures. The incremental validity of MIRES, above and beyond IES-2 and ecSI-2, was supported for most outcome variables measured in this study. Finally, we showed that the simplified 21-item version of MIRES upholds the psychometric properties of the full 45-item scale.
MIRES can be used by researchers and practitioners for a complete assessment of the IRE style as well as of its distinct components. MIRES can be used as an independent variable, moderator, or mediator in future scientific research investigating the role of IRE style in various processes in the eating domain. It can also be used as an outcome variable when assessing the impact of interventions that aim to strengthen IRE. Finally, MIRES can be used as a screening instrument by health practitioners who try to promote IRE among their clients or patients.
While MIRES manifested good psychometric properties, there are limitations that should be addressed. First, we should note that all data presented in this paper are solely based on self-reports. Although self-reports are practical tools for the assessment of personality constructs, they are subject to several types of response bias such as socially desirable responding, acquiescent responding, or extreme responding [41]. Individual responses may also be limited by the lack of sufficient self-awareness or by self-deception effects. Second, identification restrictions are inherent to formative models [42], such as the one presented in this paper. Thus, researchers who are interested in conducting CFA or SEM using the complete formative MIRES model should also measure the six RI that we specifically developed to facilitate model identification. Third, the preliminary work was conducted with college students (18-35 years old) while in later steps we used community samples (18-65 years old); thus, it could be argued that it is not safe to assume the invariance of the model's internal structure across the scale development and validation process. To test the model for measurement invariance across age groups, subgroups should have at least 980 participants each to allow for reliable estimates to emerge based on the 5:1 participant-to-parameter ratio.
The sample sizes in our study did not allow us to conduct this analysis in the typical stepwise process [43]; however, when we fitted the model in subgroups with all but seven parameters fixed to the values obtained from the full sample (only the regression coefficients of the seven formative indicators were left free to be estimated), the model fit was still acceptable (18-34 years: χ²(1319) = 2467.93, p < 0.001, CFI = 0.95, TLI = 0.95, RMSEA = 0.05, SRMR = 0.03; 35-65 years: χ²(1319) = 2969.25, p < 0.001, CFI = 0.96, TLI = 0.96, RMSEA = 0.04, SRMR = 0.05), thus providing preliminary evidence for the invariance of the model across age groups. Finally, we acknowledge that administration of the full version of MIRES may be more complex than other self-reports because twelve of its items are repeated across three different contexts. Thus, we advise potential users to use the simplified version of the scale that consists of only 21 items.
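The sample-size requirement mentioned above follows from simple arithmetic on the 5:1 participant-to-parameter ratio. The sketch below makes that arithmetic explicit. The parameter count of 196 is inferred from the stated figures (980 / 5) rather than reported directly, and the function names are hypothetical.

```python
def min_subgroup_size(n_free_parameters, ratio=5):
    """Smallest subgroup size satisfying a ratio:1 participant-to-parameter rule."""
    return ratio * n_free_parameters

def max_free_parameters(n_participants, ratio=5):
    """Largest number of free parameters a subgroup of this size supports."""
    return n_participants // ratio

# Inferred (assumption): 980 participants at 5:1 implies about 196 free parameters.
implied_parameters = max_free_parameters(980)
print(implied_parameters)                      # 196
print(min_subgroup_size(implied_parameters))   # 980

# With only seven parameters left free, as in the constrained subgroup models,
# the same rule of thumb would call for far fewer participants per subgroup.
print(min_subgroup_size(7))                    # 35
```

This is only a rule-of-thumb calculation; it does not replace a formal power analysis or the stepwise invariance testing that the full model would require.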
Next to these limitations, the strengths of this newly developed measure should also be considered. In contrast to what most scale developers do, in this research we were particularly interested in the precise specification of the measurement model. Those who aim to assess the IRE style need to measure the complete set of seven MIRES subscales and calculate a total score, while those who want to focus on a particular characteristic of the IRE style can choose to measure a subscale in isolation and calculate the summed score of the items of that particular subscale. The bottom-up approach that we took for the scale's development and validation (assessing the properties of lower-order factors before moving to higher levels) can give researchers and practitioners confidence in the reliability and validity of the scale's sub-parts. It should be noted here that using only a subset of subscales would allow conclusions to be drawn only on those particular constructs that are measured and not on the IRE style construct. We further observed strong convergence and comparable criterion validity between MIRES and the six RI. Given that RI is a reliable scale in itself, it could be used as the snap version of MIRES. This adds even more flexibility in the use of the new instrument. Finally, the multidimensional nature of MIRES enables the distinction of several closely related but conceptually distinct features of the IRE style. For example, the distinction between sensitivity to and self-efficacy in using physiological signals of hunger and satiation has received very little attention in the existing literature (see e.g., [44]). Therefore, MIRES can be used for a more differentiated assessment of the essentials of the IRE style.
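The scoring logic described above (summed subscale scores, plus a total score when all seven subscales are measured) can be sketched as follows. The item-to-subscale mapping is purely hypothetical: it assumes three consecutive items per subscale, which is not specified in this section, and it ignores any reverse-scored items. Users should consult the published scale for the actual item assignment and response format.

```python
# Hypothetical mapping (assumption): three consecutive items per subscale.
SUBSCALES = ["IT", "FL", "FE", "SH", "SS", "SEH", "SES"]
HYPOTHETICAL_ITEM_MAP = {
    name: [3 * k, 3 * k + 1, 3 * k + 2] for k, name in enumerate(SUBSCALES)
}

def score_subscale(responses, item_indices):
    """Summed score of the items belonging to one subscale."""
    return sum(responses[i] for i in item_indices)

def score_mires(responses, item_map=HYPOTHETICAL_ITEM_MAP):
    """Return (subscale scores, total score) for one respondent's 21 answers."""
    if len(responses) != 21:
        raise ValueError("Expected responses to all 21 items")
    subscale_scores = {
        name: score_subscale(responses, indices)
        for name, indices in item_map.items()
    }
    # Total score taken as the sum over all seven subscales (assumption).
    total = sum(subscale_scores.values())
    return subscale_scores, total

# Example: a respondent answering 4 on every item.
scores, total = score_mires([4] * 21)
print(scores["SH"], total)   # 12 84
```

A single subscale can be scored in isolation with `score_subscale`, in which case conclusions apply only to that construct and not to the IRE style as a whole.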
Although we followed a rigorous process for the scale's development and validation, replication of the current findings in other populations or population segments is needed. For example, the measurement invariance of the model could be tested across sexes, age groups, and other potentially interesting population groups such as individuals with overweight or obesity. Once measurement invariance of the model is evidenced, norm scores can be developed for the various subgroups. Moreover, it would be interesting to administer the simplified version of the scale without any introductory text in the sensitivity and self-efficacy subscales in order to ascertain whether this influences how individuals interpret the items. Additional studies could also be conducted to assess the temporal stability of the RI scale and to ascertain whether the change in means over time that we observed in two MIRES subscales (IT and FL) was systematic or random. Future research could also test the face validity of the final MIRES because the relevance of items to the construct definitions was assessed only at the very beginning of the scale development process. This would ensure that the retained items still do a good job of reflecting the meaning of the constructs they are purported to measure. Given that a theory-based approach was used in this research, we expect that MIRES will uphold its face validity. Finally, behavioral experiments could provide convincing and invaluable evidence for the construct and predictive validity of MIRES.