On the comparability of frailty scores under the accumulation of deficits approach

  • Curtis Huffman,

    Roles Conceptualization, Methodology, Writing – original draft

    chuffman@unam.mx

    Affiliation Programa Universitario de Estudios del Desarrollo, Coordinación de Humanidades, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico City, Mexico

  • Héctor Nájera,

    Roles Methodology, Validation, Writing – review & editing

    Affiliation Programa Universitario de Estudios del Desarrollo, Coordinación de Humanidades, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico City, Mexico

  • Mario Ulises Pérez Zepeda

    Roles Data curation, Validation, Writing – review & editing

    Affiliation Departamento de Investigación, Instituto Nacional de Geriatría, Secretaría de Salud, Mexico City, Mexico

Abstract

Background

While the cumulative deficit model is arguably the most popular instrument for population-level frailty screening, several questions remain unanswered regarding the comparability of the resulting scores across subpopulations.

Methods

Based on data from the five waves of the Mexican Health and Aging Study (MHAS) we draw upon the alignment method to test for measurement invariance of frailty scores as per the accumulation of deficits approach.

Results

Our results show that adjusting for measurement non-invariance not only improves predictive validity of our frailty measures, but resulting scores are more consistent with what is theoretically expected from them in longitudinal research.

Conclusions

There are clear potential benefits of measurement invariance testing as a general analytical framework from which to tackle issues of comparability in frailty research.

Background

In geriatric assessment frailty is generally accepted as a useful concept in understanding the heterogeneity of functional decline observed with chronological aging. It refers to a condition in which the individual is in a vulnerable state at increased risk of adverse health outcomes and/or dying when exposed to a stressor [1].

Without a doubt, identifying frail people (holding a precarious balance between demands and capacity to cope) is of the utmost importance, in particular when it comes to older adults. Doing so not only allows us to better anticipate the burden on healthcare systems, but may also lead to timely interventions that can often have dramatic effects on people’s well-being [2]. However, it is far from obvious how frailty should be quantified, and the development of measurement tools is still an ongoing process and a research priority.

In population-level screening, the cumulative deficit model elaborated by Rockwood, Mitnitski and colleagues [3] has gained popularity over the years as it is robustly flexible, and the resulting scores strongly associate with mortality and other adverse outcomes with an evident dose-response relationship.

The frailty index relies on the intuition that the more deficits a person has, the more likely that person is to be frail [4]. In accordance, its operationalization involves counting deficits specifically in health (e.g., symptoms, signs, diseases, disabilities or laboratory, radiographic or electrocardiographic abnormalities), and the resulting index is often expressed as a ratio between present deficits divided by the total deficits considered in a given population.
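The ratio described above can be sketched in a few lines. This is only an illustration: the function name `frailty_index` and the example respondent are hypothetical, and deficits are assumed to be coded on a 0 to 1 scale (0 = absent, 1 = fully present).

```python
from typing import Sequence


def frailty_index(deficits: Sequence[float]) -> float:
    """Return the share of deficits present among those considered.

    Each entry is assumed to be coded between 0 (absent) and 1 (present).
    """
    if not deficits:
        raise ValueError("at least one deficit is required")
    return sum(deficits) / len(deficits)


# Hypothetical respondent: 7 of the 35 deficits considered are present.
example = [1] * 7 + [0] * 28
print(frailty_index(example))  # 0.2
```

Because the denominator is the number of deficits considered, two indices built from different deficit lists share the same 0 to 1 range, which is not the same as being comparable in meaning.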

The fact that the cumulative deficit frailty index (CD-FI) can be constructed from data already available in most geriatric assessment surveys, and databases of the sort, is perceived as a definite advantage over other alternatives. The fact that different numbers and types of deficits (that fulfill rather modest criteria) may be used in its construction, while preserving a strong association with mortality and other adverse outcomes, makes it robustly flexible and popular among researchers.

However, the fact that the CD-FI is not defined on a fixed set of indicators leaves some questions unanswered regarding what we are allowed to infer from differences in frailty scores; that is, its comparability, even within the same sample [5, 6]. This much is widely understood: given a CD-FI constructed from a fixed set of deficits, individuals with a frailty score that is relatively high for their age and sex show a significantly increased risk of a range of adverse outcomes; i.e., they are more frail. Hence the importance of studying population norms [7]. But what about, let’s say, sex-matched individuals of different age groups sharing the same frailty score? It is far from obvious what we should make of this case, and the proper framing of such questions requires us to think about the way a given CD-FI (the measurement instrument) interacts with a person’s frailty (that which it sets out to measure). This is the general issue of measurement invariance.

A common goal in measurement is making comparisons across population groups using the same index. Ideally, two people with the same frailty score should in fact be equally frail. This type of inference rests on one key assumption: the underlying model and the resulting scores are invariant across the groups of interest. Measurement invariance, therefore, must hold for fair inferences about frailty scores across populations. Otherwise, if a given index measures frailty differently for individuals in different age groups, then we would not be justified in making group comparisons based on that index because it would raise the question: Is the observed difference across groups due to a group difference in how frail they are, or due to differences attributable to other sources that are not of interest, like measurement error?

It is not easy to overstate the importance of measurement invariance if a researcher wishes to make group comparisons. Meaningful comparisons of statistics, such as means and regression coefficients, can only be made if the measures are comparable across different groups [8]. However, whether a given index interacts with a person’s frailty in a comparable way across groups or not, or to what degree, is something that cannot be evaluated in the absence of information regarding the measurement (metrological) model.

Measurement invariance is a testable assumption and it has to do with assessing whether the overall setup of the measurement model is comparable across groups and over time. By measurement model we mean the abstract and idealized (approximate) representation of the interaction between that which we want to measure, in this case the concept of frailty, and the accumulation of health deficits, as instrumental indications provided by the data source (see [9] for an outline of the model-based approach to the epistemology of measurement). Obviously, the nature of such tests follows that of the measurement model.

Given the operational definition of the CD-FI [4], where the standard procedure for selecting health variables as candidate deficits involves the satisfaction of 4 criteria: 1) being associated with health status, 2) their prevalence must generally increase with age, but 3) must not saturate too early, and 4) cover a range of systems, it stands to reason that the underlying measurement model assumes that health deficits are a manifestation of frailty (i.e., a causal or reflective model) and not the other way around (as a formative model would). This kind of measurement model requires that the deficits being considered correlate, as they should if criteria 1 to 3 are satisfied.

It is important to note that it is precisely measurement invariance that is at risk if criterion 3 is not satisfied and the deficits saturate too early. As deficits max out they stop providing us with useful information to make inferences about a person’s frailty, ultimately rendering deficits useless for comparisons beyond the point of saturation. But deficits do not need to max out to provide us with different information. If a subset of the chosen deficits were to saturate “more quickly” for different population groups or cohorts, comparisons across said groups and time would be hard to interpret. In general terms, if the health deficits under consideration do not show roughly the same modeled relationships between them (correlations in this case) across comparison groups or points in time, chances are that comparisons made on this basis do not have the meaning we intended for them (i.e., are invalid).

To show the potential of these new practices in frailty research, as an empirical application we use the five waves of MHAS to show the importance of evaluating (and adjusting for) measurement non-invariance in longitudinal research.

Methods

In all of our estimates we used data gathered by the Mexican Health and Aging Study (MHAS). The MHAS is a national longitudinal study of adults 50 years and older in Mexico. The baseline, conducted in 2001, is representative of adults born in 1951 or earlier; in 2012 a new sample of adults born between 1952 and 1962 was added to refresh the sample, and once more in 2018 with adults born between 1963 and 1968.

In order to guarantee reasonable group sizes, we have defined 3-year age-groups in the range of 50 to 88 years of age. Also, to mitigate the risk of survival bias in our estimates, the working sample excludes all those individuals whose age was above 79 (roughly the life expectancy at 50 years of age in Mexico in 2008 [10]) by the time they were detected by the survey for the first time.
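The grouping and exclusion rules above can be sketched as follows. The helper names are hypothetical, and the bins are assumed to start at age 50 and to be inclusive on both ends (50-52, 53-55, and so on), which is one reading of the text rather than the authors' documented coding.

```python
from typing import Optional

FIRST_AGE = 50  # lower bound of the youngest age group (assumed bin origin)
LAST_AGE = 88   # upper bound of the oldest age group
WIDTH = 3       # years per age group


def age_group(age: int) -> Optional[str]:
    """Map an age in years to a 3-year group label, or None if out of range."""
    if age < FIRST_AGE or age > LAST_AGE:
        return None
    lower = FIRST_AGE + ((age - FIRST_AGE) // WIDTH) * WIDTH
    return f"{lower}-{lower + WIDTH - 1}"


def in_working_sample(age_at_first_interview: int) -> bool:
    """Keep respondents who were at most 79 when first observed by the survey."""
    return age_at_first_interview <= 79


labels = sorted({age_group(a) for a in range(FIRST_AGE, LAST_AGE + 1)})
print(len(labels))  # 13
```

Under these assumptions the range of 50 to 88 years yields 13 groups.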

In a nutshell, the MHAS (I-V) panel data is shaped into wide form with age-group columns temporarily overlapping the cohort data as illustrated in Table 1.

Table 1. Number of observations by age groups and cohorts in the MHAS-ALD.

https://doi.org/10.1371/journal.pone.0292129.t001

CD-FI

For comparability, and following standard procedures [4], we constructed a 35-deficit CD-FI based on [11, 12]. Deficits included functional status, chronic diseases, self-rated health, cognitive status, and depressive symptoms. All deficits were self-reported. Further details about selection, coding and screening (syntax included) can be found in [11].

Health outcomes

Mortality was recovered from next-of-kin data. MHAS assesses falls by asking “Have you fallen down in the last two years?”, whose positive answer elicits the question “Approximately how many times has this happened?”. Fall syndrome was defined as having answered >2 to this last question. Gait speed and handgrip strength tests were only taken for a subsample of individuals in 2012. Gait speed is derived from the time (best out of two tries) it takes an individual to cross the first foot over the end of a 4-meter strip, where a threshold of 0.8 m/s was used to define abnormality. For handgrip tests, cut-off values of 20 and 30 kilograms were used for women and men, respectively.
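A minimal sketch of the outcome definitions above. Function names and the sex coding are hypothetical, and "best out of two tries" is assumed to mean the shortest of the two recorded times.

```python
from typing import Sequence


def fall_syndrome(n_falls: int) -> bool:
    """More than two reported falls in the last two years."""
    return n_falls > 2


def slow_gait(times_s: Sequence[float], distance_m: float = 4.0,
              cutoff_m_per_s: float = 0.8) -> bool:
    """Gait speed below the cutoff, using the best (shortest) recorded time."""
    best_time = min(times_s)  # assumption: "best" means fastest
    return distance_m / best_time < cutoff_m_per_s


def weak_grip(kg: float, sex: str) -> bool:
    """Handgrip below 20 kg for women or 30 kg for men ('female'/'male' assumed)."""
    cutoff = 20.0 if sex == "female" else 30.0
    return kg < cutoff


# Hypothetical respondent: two walks of 5.6 s and 5.2 s over 4 meters.
print(slow_gait([5.6, 5.2]))  # True
```

The example walk corresponds to roughly 0.77 m/s, just under the 0.8 m/s threshold.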

Alignment method

Invariance analysis is a method that assesses whether the underlying measurement model of a scale is equivalent across the groups of interest. To do so, this technique looks at the similarity of the parameters of the underlying measurement model, given that these constitute the blueprint of the observed scores. For the CD-FI there are two key sets of parameters of interest: factor loadings and thresholds. The first set tells us the amount of variance explained for a given item. Ideally, this amount should be equivalent across groups; otherwise the observed scores will reflect these unwanted differences. Thresholds refer to the level of frailty at which a person experiences a given deficit. In principle, two people with the same level of frailty should be equally likely to present the observed deficit.

It is important to note that, under measurement invariance, we do not expect the CD-FI’s different components to exhibit the same prevalence across comparison groups. Rather, what we do expect is that the amount of the frailty score’s variability explained by each component should be roughly the same. In other words, what is expected is that, for a given level of frailty, the likelihood of expressing a particular health deficit is roughly the same, irrespective of the population subgroup to which individuals belong.
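One way to write down the reflective model described above is a two-parameter model for binary deficits. The probit link and all symbols below are assumptions introduced for illustration; the text does not state which link function was estimated.

```latex
P\left(y_{ipg} = 1 \mid \eta_{ig}\right) = \Phi\left(\lambda_{pg}\,\eta_{ig} - \tau_{pg}\right),
\qquad
\text{invariance: } \lambda_{pg} = \lambda_{p}, \quad \tau_{pg} = \tau_{p} \quad \text{for all groups } g
```

Here $y_{ipg}$ is deficit $p$ for person $i$ in group $g$, $\eta_{ig}$ is that person's latent frailty, $\lambda_{pg}$ is the loading, $\tau_{pg}$ is the threshold, and $\Phi$ is the standard normal distribution function. Under invariance the prevalence of a deficit may still differ across groups, because the distribution of $\eta_{ig}$ differs, while the probability of the deficit at any fixed level of frailty does not.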

Needless to say, measurement invariance is a regulative ideal that can only be approximated in practice [13], which makes its quantification all the more important. There are two main different methods to look at the equivalence of the parameters across groups: Multiple Group Analysis (MGA) and the Alignment Method. Both belong to the family of techniques within the field of structural equation modelling (SEM).

Traditionally, measurement invariance is assessed under the MGA approach by way of Confirmatory Factor Analyses, which requires some statistical expertise in Structural Equation Modelling and becomes problematic when many groups are tested. In contrast, the Alignment Method [14], as a (somewhat) recently developed alternative, approaches measurement invariance as an optimization problem. This innovation fully automates the procedure of identifying noninvariant deficits (items) while estimating a model that allows for group mean comparisons.

In this paper we resort to the Alignment Method to obtain a CD-FI model that allows for age-group comparisons via latent scores. It can also be used to improve particular CD-FIs by screening for invariant deficits so that (observed) raw scores can be used.

Unlike traditional MGA approaches to measurement invariance, the Alignment Method is not a measurement invariance testing procedure in itself, which makes it more suitable when dealing with a large number of groups. It is rather a treatment of measurement invariance as an optimization problem [15]. Its goal is to estimate the simplest model with the largest amount of invariance; that is, a model with factor loadings and intercepts/thresholds (item-level parameters) that are as close to equivalent as possible. As long as researchers are dealing with minor measurement differences (approximate measurement invariance), the alignment method produces a factor model that is sufficient to make (unbiased) factor mean comparisons by selecting latent score means and variances that minimize measurement non-invariance of the item-level parameters. Of course, if the assumption of approximate measurement invariance is violated, the simplest and most invariant model may not be the true model [16].
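To make the optimization idea concrete, the sketch below evaluates an alignment-style loss for candidate factor means and variances, following the general form described in the alignment literature [14, 19]: pairwise differences in the transformed loadings and intercepts are penalized with a component loss and weighted by group sizes. It is not the Mplus implementation used for the estimates in this paper; the function names, the value of `eps` and the toy numbers are assumptions.

```python
import math
from itertools import combinations


def component_loss(x, eps=0.01):
    """f(x) = sqrt(sqrt(x^2 + eps)); favours few large over many small differences."""
    return math.sqrt(math.sqrt(x * x + eps))


def alignment_loss(loadings, intercepts, sizes, means, variances, eps=0.01):
    """Total loss for candidate factor means/variances.

    loadings[g][p] and intercepts[g][p] are configural estimates for group g and
    item p (factor mean 0 and variance 1 in every group).
    """
    n_groups, n_items = len(loadings), len(loadings[0])
    lam, nu = [], []
    for g in range(n_groups):
        sd = math.sqrt(variances[g])
        # Re-express the configural parameters under the candidate mean/variance.
        lam.append([loadings[g][p] / sd for p in range(n_items)])
        nu.append([intercepts[g][p] - means[g] * loadings[g][p] / sd
                   for p in range(n_items)])
    total = 0.0
    for g1, g2 in combinations(range(n_groups), 2):
        weight = math.sqrt(sizes[g1] * sizes[g2])
        for p in range(n_items):
            total += weight * component_loss(lam[g1][p] - lam[g2][p], eps)
            total += weight * component_loss(nu[g1][p] - nu[g2][p], eps)
    return total


# Toy data: two items that are truly invariant; group 2 has factor mean 0.5, variance 4.
loadings = [[1.0, 0.8], [2.0, 1.6]]
intercepts = [[0.0, 0.5], [0.5, 0.9]]
sizes = [500, 400]
naive = alignment_loss(loadings, intercepts, sizes, [0.0, 0.0], [1.0, 1.0])
aligned = alignment_loss(loadings, intercepts, sizes, [0.0, 0.5], [1.0, 4.0])
print(aligned < naive)  # True
```

In the toy data the apparent differences in loadings and intercepts disappear once group 2 is given its own factor mean and variance, so the loss is lower there. An optimizer would search over means and variances to find such a minimum.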

While the alignment optimization was not designed to evaluate whether instruments are approximately invariant (as this is an assumption of the optimization procedure), and does not allow for the testing of specific models with differing levels of measurement invariance, if the method’s assumptions hold, the results indicate which items are non-invariant. Unfortunately, there is no existing package in R that replicates the alignment method, and all of our estimates were obtained using Mplus 8.5. A brief tutorial on the alignment method is provided in [17].

Predictive performance

In the absence of a reference method to measure frailty, to further strengthen our confidence in the added value of adjusting for measurement non-invariance in CD-FI construction, we follow the common practice of comparing its relative performance in predicting adverse health outcomes (predictive validity) vis-à-vis the traditional (raw) version of the CD-FI and its centiles (according to the individual’s sex and age-group). In particular, we used logistic regressions for mortality risk, fall syndrome, low gait speed (<0.8 m/s) and low handgrip strength (<20 kilograms for women and <30 kilograms for men). All regressions included both age and age squared with a full set of sex interactions. For comparability regarding the relative importance of the frailty scores, and to help with interpretation, x-standardized coefficients (given that the frailty scores have different scales) are reported along with the area under the receiver operating characteristic (ROC) curve, a graph of sensitivity versus one minus specificity as the cutoff c is varied. A model with no predictive power has a ROC area of 0.5 while a perfect model has an area of 1.
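The two reported quantities can be sketched without any modelling library. The functions below are illustrative only: `x_standardized` rescales an already fitted logit coefficient by the sample standard deviation of the score (whether the sample or population formula was used is an assumption), and `roc_auc` uses the rank-based equivalence between the ROC area and the probability that a randomly chosen case scores higher than a randomly chosen non-case. All data are hypothetical.

```python
import statistics
from typing import Sequence


def x_standardized(coef: float, x: Sequence[float]) -> float:
    """Logit coefficient per one standard deviation of the predictor."""
    return coef * statistics.stdev(x)


def roc_auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    # A case outranking a control counts 1; a tie counts 0.5.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical frailty scores and outcomes (1 = event observed).
scores = [0.10, 0.40, 0.35, 0.80]
outcomes = [0, 0, 1, 1]
print(roc_auc(scores, outcomes))  # 0.75
```

Standardizing by the predictor's spread is what makes coefficients comparable across raw, centile and aligned scores, which live on different scales.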

Results

Alignment

A 13-group alignment analysis of the 35 items is performed for the 13 time points, separately by sex. The sex-specific analyses are meant to keep the focus on cross-time comparisons, as CD-FI dynamics are found to be strongly sex-sensitive [18]. The results of the 13-group analyses are shown in Tables 2 and 3. The tables indicate which item parameters, thresholds and loadings, are non-invariant in which groups with asterisks and plus signs, respectively. For both women and men, there are more non-invariant thresholds (24% and 16%) than loadings (5% and 3%). All of these shares fall below the 25% rule of thumb for trustworthy alignment results mentioned in [19] and supported by simulations in [20].

Table 2. Invariance results for aligned thresholds (*) and loadings (+) parameters for all deficits considered (Women).

https://doi.org/10.1371/journal.pone.0292129.t002

Table 3. Invariance results for aligned thresholds (*) and loadings (+) parameters for all deficits considered (Men).

https://doi.org/10.1371/journal.pone.0292129.t003

The results in Tables 2 and 3 also include the alignment R-squared measure, which is meant to be interpreted as the proportion of variation across groups of the parameters, intercepts and loadings, explained by variation in the factor mean and variance [respectively] across groups [21]. In this way, values close to 1 are associated with invariant parameters, while values closer to 0 are generally associated with non-invariant parameters.

For developing and evaluating a particular CD-FI to modify or improve it by ensuring there is invariance, it is of interest to see which deficits and which age-groups contribute most to the non-invariance. It is found that among the most invariant deficits are self-rated health (health), help with finances (fin), and loss of appetite in the last 2 years (anorexia). [11] had already identified health as one of the most relevant deficits based on a Markov network analysis.

Cancer is among the deficits that contribute the least to the non-invariance of the CD-FI, even though the associated parameters show no significant difference across age-groups for women. This result largely agrees with [12], who find the cancer diagnosis as marginally independent from other deficits. The fact that it does not show asterisks or plus signs is due to large errors given its low prevalence, particularly around the early 70s, resulting in statistical tests of low power and the highest severity under the model (takes the longest to activate) as well as providing the least information to discriminate between levels of frailty.

Next to cancer, bed seems to contribute the least to the CD-FI’s comparability, but for different reasons. By all appearances, spending one day or more in bed due to sickness or injury is associated with higher levels of frailty before reaching 70 years of age than after. At the same time, its activation takes a relatively long time to build up (low discrimination), rendering it of little use in distinguishing any but the largest differences in frailty.

Perhaps unsurprisingly, not doing exercise or hard physical work at least 3 times a week (exercise) seems to mean quite different things through the life course in terms of frailty. For younger women it is even likelier to be associated with more vulnerable health states. This makes sense if, for example, in contexts of low preventive care access, women younger than 50 years of age hardly ever do physical work 3 times a week unless prescribed by a doctor at the onset of a serious illness. Be that as it may, the deficit exercise seems to “activate” (going from 0 to 1) at different levels of frailty for different age-groups (see the asterisks at the bottom of Table 2), thus providing different information and compromising the interpretation of the frailty score.

Something similar happens to deficits related to depressive symptoms (effort, depress, happy): they seem to be related to higher levels of frailty before reaching 70 years of age.

Table 3 shows noticeable contrasts with Table 2 that warrant further inspection. First, Table 3 shows more invariance in general (see the instrumental activities of daily living: shop, meals, meds, fin).

However, there are also important similarities. Exercise exhibits the same pattern of high errors in the first age-groups, lower discrimination, but decreasing severity (difficulty, easier to activate as they grow old, less frailty is needed for its activation). It is important to note that the opposite seems to be the case with women, for whom the activation of this deficit seems to be delayed as they grow older. Indeed, exercise seems to mean different things in terms of frailty by sex and age-group. Also bed remains among the most non-invariant deficits for men, as self-rated health (health) and loss of appetite in the last 2 years (anorexia) remain among the deficits that contribute the most to the CD-FI comparability.

A by-product of the alignment optimization method is the set of frailty scores that result from the factor model with the largest amount of invariance; that is, the frailty scores that are as comparable as possible. Figs 1 and 2 show the distribution of the resulting frailty scores, raw and adjusted for measurement non-invariance, respectively.

Fig 1. Raw frailty scores distribution by sex and age group.

https://doi.org/10.1371/journal.pone.0292129.g001

Fig 2. Aligned frailty scores distribution by sex and age group.

https://doi.org/10.1371/journal.pone.0292129.g002

Two things are worth noting in looking at these figures. First of all, while in both cases mean frailty scores grow as age groups grow older, unlike the raw scores, the aligned frailty scores also reduce their dispersion. This measure of convergence is exactly what one would expect of a successful frailty measure: an age at which virtually all people are frail [22]. Second, the aligned frailty scores also exhibit a measure of convergence between the sexes. Again, the divergence exhibited by the raw scores between men and women as age groups grow older is hard to reconcile with what we would expect from a successful frailty measure.

Predictive performance

Table 4 shows the standardized coefficients (the log odds times the standard deviation of the corresponding frailty score) of the logistic regressions to assess the effect of adjusting the CD-FI for measurement non-invariance. There we can see that, in all four cases (panels), the adjusted CD-FI (Aligned, fifth column) exhibits greater predictive performance than the traditional and centilized approaches (Raw and Centiles, third and fourth columns, respectively). We see in the first panel, for example, that a 1 standard deviation increase in the aligned CD-FI produces, on average, a 0.907 increase in the log odds of mortality within three years.
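For readers more used to odds ratios, the reported log-odds change can be converted by exponentiation. The snippet below is a worked conversion of the 0.907 figure only; the variable names are illustrative.

```python
import math

log_odds_per_sd = 0.907  # aligned CD-FI, mortality panel, as reported in the text
odds_ratio_per_sd = math.exp(log_odds_per_sd)
print(round(odds_ratio_per_sd, 2))  # 2.48
```

That is, a one standard deviation increase in the aligned score multiplies the odds of mortality by roughly 2.5, holding the other regressors fixed.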

Table 4. Logistic regression results for adverse outcomes risks (x-standardized coefficients).

https://doi.org/10.1371/journal.pone.0292129.t004

Note that the area under the ROC curve is also larger, if only marginally, for the aligned CD-FI in all four cases.

Discussion

Quantifying frailty requires an abstract representation of the measurement process; i.e., a measurement (metrological) model [23]. That this model holds (is equally applicable) in each group under contrast is a fundamental assumption underlying every conclusion derived from such comparisons. Whether there are good reasons behind this assumption or not is not only a theoretical, but also an empirical issue. In the absence of an impeccable measure of frailty (a so-called “gold standard”), the burden of proof falls on those drawing conclusions based on the comparability of the scores.

As a great deal of theoretical and empirical research on measurement invariance has been conducted within the contexts of Factor Analysis and Item Response Theory (IRT), particularly as the issue pertains to psychological and educational assessment [24], it naturally raises the question of their pertinence in the case of the CD-FI.

Our results in the previous section are predicated on the acceptance that, in the measurement model underlying the CD-FI, frailty exists at a deeper conceptual level than the health deficits, and that the latter are consequences (effects) of the former. Not everybody agrees [25, 26]. On this matter, [27] suggests a thought experiment to help researchers think about their measurement models: Imagine a change in a person’s frailty, net of other health deficit influences. Will this lead to a change in deficits? If the answer is in the negative, you will hardly find much value in our results. Note that the answer to this question also has implications for what we should expect in terms of frailty from specific deficit repair, as well as the informational value of data-driven frailty profiles [28].

However, it is important to note what follows if, as Drubbel et al. [25] and Xue & Varadhan [26] argue, the measurement model underlying the CD-FI does not consider health deficits as effects of frailty (but as the syndrome itself), so that there is no good reason why deficits should correlate in the first place. While this would invalidate our results, it would not exempt researchers from providing evidence in favor of the comparability of the frailty scores in terms of the corresponding measurement model; it would simply shift the focus of invariance testing away from the observed correlations between deficits. In other words, however the measurement process is modeled, measurement invariance is an issue that every group comparison needs to address.

On the other hand, if our assumption that the measurement model underlying the CD-FI is reflective makes sense, our results may prove important for frailty measurement, and for our understanding of the heterogeneity of aging populations, as measurement invariance testing provides a robust framework (and new avenues) to investigate frailty differences across population groups.

Underpinning our confidence in contrasts between people of different sex and age-group has the potential of furthering our understanding of subclinical frailty (or pre-frail, as it is usually calculated based on the whole distribution of frailty scores in a population [29]), pathways that underlie frailty (individual longitudinal trajectories), and even the well-known male-female health-survival paradox [30].

As measurement invariance is a fundamental assumption in making comparisons between population groups, as well as between individuals, it is also important to note that, even though our focus in this article has been on population-level screening based on secondary data analysis, the alignment method may also prove useful for the design of frailty measures fitted to guide clinical care decisions, by way of assessing which items behave differently across groups of individuals, provided a reflective measurement model makes sense in the first place.

Admittedly, our age-group invariance analysis may be confounded by cohort effects. As shown in Fig 1, any given age-group in our data comes from different cohorts up to (roughly) 30 years apart, and any given cohort only takes us as far as 18 years (the distance in time between MHAS’ waves V and I) in the best of cases. It may well be that our invariance results are more a result of a cohort effect rather than an age-group effect. Fortunately, the alignment method can help us to untangle this whenever the sample size allows for it. Notice, for example, that we only have 258 individuals born between 1925 and 1927 in the 86–89 age-group, which would render the alignment results somewhat unreliable for this specific group. Nevertheless, whatever the source of non-invariance, comparability (and inference) requires its minimization.

Conclusion

In this paper we focus on practical considerations surrounding the comparability (same meaning) of the CD-FI across subpopulations. While the nature of the relationship between the CD-FI, as an instrumental indication, and the concept of frailty (an individual’s vulnerability to adverse health outcomes) remains somewhat undertheorized and, as a consequence, the measurement model underlying the CD-FI is still contested, there is hardly a better analytical framework than measurement invariance to tackle these general issues. We believe our results show the potential benefits of this approach for frailty measurement development, particularly of the alignment optimization method whenever multi-group factor analysis proves pertinent. Without a doubt, more work is necessary to reap the full benefits of measurement invariance testing in advancing our understanding of frailty measurement, but we believe pursuing such work will inevitably add greater clarity to its mechanisms and management as well.

References

  1. Morley JE, Vellas B, Abellan van Kan G, Anker SD, Bauer JM, Bernabei R, et al. Frailty Consensus: A Call to Action. Journal of the American Medical Directors Association. 2013;14(6):392–397. pmid:23764209
  2. Vellani S, Cumal A, Degan C. Frailty assessment and interventions for community-dwelling older adults: a rapid review. Nursing Older People. 2022;34(5).
  3. Rockwood K, Mitnitski A. Frailty defined by deficit accumulation and geriatric medicine defined by frailty. Clinics in Geriatric Medicine. 2011;27(1):17–26. pmid:21093719
  4. Searle SD, Mitnitski A, Gahbauer EA, Gill TM, Rockwood K. A standard procedure for creating a frailty index. BMC Geriatrics. 2008;8(1):24. pmid:18826625
  5. Shi SM, McCarthy EP, Mitchell S, Kim DH. Changes in predictive performance of a frailty index with availability of clinical domains. Journal of the American Geriatrics Society. 2020;68(8):1771–1777. pmid:32274807
  6. Blodgett JM, Pérez-Zepeda MU, Godin J, Kehler DS, Andrew MK, Kirkland S, et al. Frailty indices based on self-report, blood-based biomarkers and examination-based data in the Canadian Longitudinal Study on Aging. Age and Ageing. 2022;51(5):afac075. pmid:35524747
  7. Pérez-Zepeda MU, Godin J, Armstrong JJ, Andrew MK, Mitnitski A, Kirkland S, et al. Frailty among middle-aged and older Canadians: population norms for the frailty index using the Canadian Longitudinal Study on Aging. Age and Ageing. 2021;50(2):447–456. pmid:32805022
  8. Chen FF. What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology. 2008;95(5):1005. pmid:18954190
  9. Tal E. A model-based epistemology of measurement. In: Mößner N, Nordmann A, editors. Reasoning in measurement. Routledge; 2017. p. 245–265.
  10. Mina Valdés A. Ley de mortalidad mexicana. Funciones de supervivencia. Estudios Demográficos y Urbanos. 2006;21(2):431–456.
  11. García-Peña C, Ramírez-Aldana R, Parra-Rodriguez L, Gomez-Verjan JC, Pérez-Zepeda MU, Gutiérrez-Robledo LM. Network analysis of frailty and aging: Empirical data from the Mexican Health and Aging Study. Experimental Gerontology. 2019;128:110747. pmid:31665658
  12. Ramírez-Aldana R, Gomez-Verjan JC, García-Peña C, Gutiérrez-Robledo LM, Parra-Rodríguez L. Understanding frailty: probabilistic causality between components and their relationship with death through a Bayesian network and evidence propagation. Electronics. 2022;11(19):3001.
  13. Marsh HW, Guo J, Parker PD, Nagengast B, Asparouhov T, Muthén B, et al. What to do when scalar invariance fails: The extended alignment method for multi-group factor analysis comparison of latent means across many groups. Psychological Methods. 2018;23(3):524. pmid:28080078
  14. Muthén B, Asparouhov T. Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociological Methods & Research. 2018;47(4):637–664.
  15. Luong R, Flake JK. Measurement invariance testing using confirmatory factor analysis and alignment optimization: A tutorial for transparent analysis planning and reporting. Psychological Methods. 2022;. pmid:35588078
  16. DeMars CE. Alignment as an alternative to anchor purification in DIF analyses. Structural Equation Modeling: A Multidisciplinary Journal. 2020;27(1):56–72.
  17. Rudnev M. Elements of cross-cultural research: Alignment method for measurement invariance: Tutorial; 2022. https://maksimrudnev.com/.
  18. Kulminski AM, Ukraintseva SV, Akushevich IV, Arbeev KG, Yashin AI. Cumulative index of health deficiencies as a characteristic of long life. Journal of the American Geriatrics Society. 2007;55(6):935–940. pmid:17537097
  19. Muthén B, Asparouhov T. IRT studies of many groups: The alignment method. Frontiers in Psychology. 2014;5:978. pmid:25309470
  20. Flake JK, McCoach DB. An investigation of the alignment method with polytomous indicators under conditions of partial measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal. 2018;25(1):56–70.
  21. Asparouhov T, Muthén B. Multiple Group Alignment for Exploratory and Structural Equation Models. Structural Equation Modeling: A Multidisciplinary Journal. 2023;30(2):169–191.
  22. Rockwood K. What would make a definition of frailty successful? Age and ageing. 2005;34(5):432–434. pmid:16107450
  23. Tal E. Calibration: Modelling the measurement process. Studies in History and Philosophy of Science Part A. 2017;65:33–45. pmid:29195647
  24. Bauer DJ. A more general model for testing measurement invariance and differential item functioning. Psychological Methods. 2017;22(3):507. pmid:27266798
  25. Drubbel I, Numans ME, Kranenburg G, Bleijenberg N, de Wit NJ, Schuurmans MJ. Screening for frailty in primary care: a systematic review of the psychometric properties of the frailty index in community-dwelling older people. BMC Geriatrics. 2014;14(1):1–13. pmid:24597624
  26. Xue QL, Varadhan R. What is missing in the validation of frailty instruments? Journal of the American Medical Directors Association. 2014;15(2):141–142. pmid:24405640
  27. Bollen KA, Diamantopoulos A. In defense of causal-formative indicators: A minority report. Psychological Methods. 2017;22(3):581. pmid:26390170
  28. Bohn L, Zheng Y, McFall GP, Dixon RA. Portals to frailty? Data-driven analyses detect early frailty profiles. Alzheimer’s Research & Therapy. 2021;13(1):1–12. pmid:33397495
  29. Rockwood K, Andrew M, Mitnitski A. A comparison of two approaches to measuring frailty in elderly people. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2007;62(7):738–743. pmid:17634321
  30. Gordon E, Peel N, Samanta M, Theou O, Howlett S, Hubbard R. Sex differences in frailty: A systematic review and meta-analysis. Experimental Gerontology. 2017;89:30–40. pmid:28043934