
Predicting relationship quality with itself? A single general factor captures most of the variance across 34 common relationship measures

  • James J. Kim ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    jkim39@lakeheadu.ca

    Affiliation Department of Psychology, Lakehead University, Thunder Bay, Ontario, Canada

  • Samantha Joel,

    Roles Conceptualization, Funding acquisition, Resources, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, Western University, London, Ontario, Canada

  • Ariana M. Gonzales,

    Roles Data curation, Formal analysis, Investigation, Project administration, Writing – original draft

    Affiliation Department of Psychology, University of California, Los Angeles, Los Angeles, California, United States of America

  • Brett A. Murphy,

    Roles Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Health and Behavioral Sciences, Texas A&M University-San Antonio, San Antonio, Texas, United States of America

  • Jacqueline C. Perez,

    Roles Data curation, Formal analysis, Writing – original draft

    Affiliation Department of Psychology, University of California, Los Angeles, Los Angeles, California, United States of America

  • Victor A. Kaufman,

    Roles Formal analysis, Funding acquisition, Project administration, Resources

    Affiliation Department of Psychology, University of California, Los Angeles, Los Angeles, California, United States of America

  • Thomas N. Bradbury,

    Roles Conceptualization, Funding acquisition, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, University of California, Los Angeles, Los Angeles, California, United States of America

  • Paul W. Eastwick,

    Roles Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, University of California, Davis, Davis, California, United States of America

  • Benjamin R. Karney

    Roles Conceptualization, Funding acquisition, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, University of California, Los Angeles, Los Angeles, California, United States of America

Abstract

In relationship science, researchers have generated a wide array of constructs and corresponding self-report measures to characterize, explain, and predict relationship quality – the most widely studied outcome in the field. Collectively, however, the boundaries among these variables remain unclear. In the current research, we examined the extent to which measures of relationship quality and other important relationship constructs are empirically separable from one another. Across two studies of US census-matched participants (total N = 3,439), we applied latent variable techniques (e.g., exploratory bifactor analysis) to broad pools of items representing various prominent relationship-specific constructs. Results revealed robust evidence that a single general factor Q (representing global relationship sentiment) accounts for the vast majority of common variance across distinct relationship measures. Thus, respondents appear to draw primarily on their overall global relationship evaluations when reporting on an array of presumably distinct relationship facets. This is consistent with a ‘sentiment override’ perspective. Our findings provide novel empirical evidence for a relationship-specific response bias that challenges prevailing assumptions and practices in the field, including the widespread use of self-report methods to capture meaningful aspects of relationship functioning.

Introduction

Emerging scientific disciplines often face a common problem. The popularization of a new field is typically accompanied by an explosion of empirical research, which can quickly outpace the development of organizing theoretical principles. Forscher [1] articulated this issue sixty years ago using a metaphor in which builders (researchers) become so adept at building bricks (findings) that they lose sight of the ultimate goal of constructing edifices (theoretical principles):

The brickmakers became obsessed with the making of bricks. When reminded that the ultimate goal was edifices, not bricks, they replied that, if enough bricks were available, the builders would be able to select what was necessary and still continue to construct edifices. The flaws in this argument were not readily apparent and so, with the help of citizens who were waiting to use the edifices yet to be built… it happened that the land became flooded with bricks.

This metaphor is also apt in the context of subdisciplines within psychology that rely primarily on self-report methods as part of their standard measurement practices [2,3]. When researchers propose a new psychological construct, they frequently develop a new self-report measure to represent it. However, in the absence of coordinated validation efforts to adequately differentiate constructs, this can lead to issues of construct and measure redundancy [3,4]. Indeed, several domains of psychological science have struggled with the proliferation of potentially overlapping constructs and measures in recent years, whether as broad fields (e.g., industrial-organizational psychology; [5]) or as narrower research areas (e.g., empathy; [6], neuroticism; [7]). Studies show that many widely used self-report measures in psychology have been developed with limited evidence of construct validity to support their usage more broadly [8].

The current state of relationship science — the interdisciplinary study of the dynamics, processes, and outcomes of close relationships — is arguably facing a similar issue. A robust body of research links romantic relationship quality to several important life outcomes, including physical health [9], psychological wellbeing [10], stress [11], immune system functioning [12], and satisfaction with life [13]. As such, researchers have made significant efforts over the years and across multiple disciplines (e.g., psychology, sociology, communication, economics, family studies) to identify key correlates and causes of relationship quality. Yet, despite the development of numerous theories, constructs, and measures to study relationship functioning, the accumulation of ostensibly unique relationship variables and assessment tools has unfolded largely without integration [14,15]. Indeed, recent research highlights ongoing conceptual ambiguities regarding key relationship constructs and their empirical relations with relationship quality (e.g., [16]; see [17] for review).

Related to this, relationship research has traditionally relied quite heavily on self-report to capture key phenomena of interest [18]. Yet to date, there have been few systematic investigations of the empirical separability of commonly employed relationship measures assessed using this method. As the number of available constructs and corresponding measures continues to grow, it becomes increasingly important that their contributions be examined collectively rather than individually. As in other domains of psychological science, such investigations are essential to prevent the piling up of “redundant measures, bifurcated literatures, and constructs without unique psychological importance” [4]. In other words, through critical analysis, we can ensure that each collection of self-report measures adds up to an edifice rather than a pile of bricks.

In the current article, we conducted a large-scale examination of empirical overlap among prominent constructs within relationship science. Across two studies, we administered a comprehensive set of relationship measures that broadly reflect theory and self-report methods in the field. We then used exploratory bifactor analysis and latent modeling approaches to identify shared and unique sources of variance within the item pool(s). In doing so, our investigation was well-positioned to identify common sources of variance and consolidate diverse self-report items down to a parsimonious collection of empirically distinguishable domains. Such a consolidation effort could have important implications for advancing theory and research practice in the field. For example, findings could help corroborate theoretical frameworks of relationship quality and relationship functioning (e.g., [14,19]) and potentially allow future research to test the potential added value of new relationship constructs or measures without needing to administer an extensive pool of items. More critically, quantifying the empirical overlap among prominent relationship constructs could also inform the degree to which researchers may be engaging in redundant measurement practices, or drawing spurious inferences about associations among relationship variables that are conceptually, but not empirically, distinct.

Why studying empirical overlap matters in relationship science

Poor construct and measurement clarity remains a pervasive challenge across psychology, and can hinder the theoretical and empirical progress within a discipline. For example, these issues have received particular attention in personality psychology, where construct and measurement proliferation remain major sources of concern (e.g., [20,21]). Most notably, the Big 5 dimensions (and other “Big Few” models) represent a widely influential attempt to create parsimonious coherence out of the “jingle-jangle jungle” [22] of overlapping construct measurements. Researchers commit the “jingle fallacy” [23] when they assume that measures of distinct constructs are identical merely because they have similar labels. The same sources of vagueness also leave researchers susceptible to the “jangle fallacy” [24], wherein researchers assume that measures of identical constructs are different merely because they have different labels. The Big 5 approach, developed through administering large pools of personality items and distilling them into a smaller number of distinct domains (e.g., [25,26]), has since prompted further research to test the distinguishability of related construct measures. For example, in work that bears some conceptual similarity to the approach here, Bainbridge et al. [20] recently conducted two studies in which participants completed an extensive collection of stand-alone personality scales, as well as a Big 5 inventory, and used exploratory structural equation modeling to test which scales were meaningfully distinguishable from Big 5 measurements; they observed that many of these scales offer little incremental validity and can reasonably be labeled as facets of the Big 5.

In relationships research, similar abatement of construct redundancy could be achieved by using latent variable modeling to distill a comprehensive collection of relationship self-report items down to a parsimonious collection of empirically distinguishable domains, potentially creating a “big few” of measurable self-report dimensions in relationship science. Indeed, scholars have noted that a modern challenge for relationship science is to consolidate its theoretical and empirical body of knowledge so that researchers can focus their efforts and resources on identifying central principles and mechanisms responsible for improving relationship outcomes [14]. Yet, there have been few systematic investigations assessing the degree of empirical overlap among the field’s central relationship variables. Below, we highlight a few reasons why relationship research may be particularly affected by the pernicious effects of construct redundancy, including: the large number of relationship variables that currently exist, ongoing conceptual ambiguity surrounding focal relationship constructs, and the reliance on assessing individuals’ subjective evaluations to study relationship phenomena.

Construct proliferation

Relationship science has long been susceptible to a proliferation of variables. Early inventories developed for research on family relationships initially contained over 700 variables [27], prompting critiques to address the “profusion of variables and propositions in the literature on relationships” [28]. More recently, in a large-scale collaboration, researchers compiled 43 longitudinal datasets of couples from 29 relationship science labs [29] to examine an extensive set of relationship-specific variables (i.e., constructs reflecting judgments about the relationship or the partner) and their influence on relationship quality. A total of 1148 relationship variables with unique labels were identified across studies, which the authors reduced to 62 different relationship-specific constructs through thematic coding. Conceptually similar constructs and measures frequently originate from different theoretical backgrounds and are studied in isolation, leading researchers to assume they function independently. However, these assumptions are rarely adequately tested.

Fuzzy conceptual boundaries

Relationship scientists have yet to reach consensus about the definition and conceptual boundaries of the primary outcome variable in research on relationships, namely overall relationship quality [30]. We use relationship quality as an umbrella term denoting a person’s global, subjective evaluation of whether their relationship is relatively good or bad [19]; other common terms include adjustment (e.g., [31]) or satisfaction (e.g., [32]). These global relationship evaluations are the key outcome that most theoretical models seek to explain and predict in close relationships research (e.g., [14,33]), and serve as the central criterion by which the effectiveness of clinical treatments and programs addressed to couples are measured [34].

For decades, researchers have wrestled with the proper specification and measurement of relationship quality, particularly when deciding which relationship variables fall within versus outside its definition. Yet past attempts to refine its conceptualization and measurement have not addressed the potential circularity of variables considered as predictors or measures of relationship quality. For example, several unidimensional measures of relationship satisfaction have been developed over the years and remain widely used, including the Kansas Marital Satisfaction scale (e.g., “How satisfied are you with your marriage?”; [35]), the Relationship Assessment Scale (e.g., “How much do you love your partner?”; [32]), the Quality of Marriage Index (e.g., “My relationship with my partner is very stable”; [36]), and the Couples Satisfaction Index (e.g., “I feel that I can confide in my partner about virtually anything”; [37]). Such measures have conceptualized relationship quality as a global, homogeneous dependent variable, thereby implying that by differentiating global evaluations from more specific aspects of the relationship (e.g., communication, conflict, affection), researchers are justified in treating those latter constructs as predictors of relationship quality [36,38]. Yet, as evidenced by the example items, such measures appear to subsume a diverse array of relationship aspects. Indeed, one recent examination of 26 relationship quality measures identified 25 different aspects of relationship functioning embedded within the items (e.g., trust, power, forgiveness, sexuality), suggesting that these measures capture highly heterogeneous content beyond global evaluations [16]. Overall, the distinctions among relationship quality, its correlates, and its predictors remain vague. Without clearer empirical guidelines, researchers in this area run the risk of attempting to predict relationship quality with itself.

Reliance on self-reports and the issue of sentiment override.

Relationship science relies heavily on evaluative self-report assessments of different relationship processes. Although studies in the field frequently involve multi-method approaches (e.g., dyadic, experimental, longitudinal, daily experience), self-report survey methods (e.g., relationship intake surveys) remain prominent. For example, a recent review of 771 independent studies of relationships published in prominent journals between 2014 and 2018 found that self-report data were featured in 96% of them, whereas other types of data collection (e.g., observational or informant ratings) were relatively rare ([18]; see [29] for similar review conclusions). This is unsurprising as most relationship phenomena of interest involve tapping into people’s relationship-specific judgments (e.g., of love, trust, intimacy, perceived partner commitment, etc.).

Reliance on self-reports to study diverse relationship constructs assumes that participants distinguish between different constructs that researchers presume capture distinct features of close relationships. However, there is little direct evidence to support this assumption. Indeed, even when participants are asked to report on specific, concrete relationship behaviors, research indicates such reports are heavily guided by their global evaluations of the relationship (e.g., [39,40]). This phenomenon has been referred to as sentiment override [41]; i.e., the tendency for self-reports of specific behaviors to reflect individuals’ overall satisfaction more than particular features of the relationship or partner (e.g., “How often did my partner hug me this week? I have no idea, but I am quite happy in this relationship, so it must have been a lot.”). The phenomenon of sentiment override poses a challenge for testing major theoretical propositions within relationship science, insofar as measures of relationship constructs that can be distinguished from relationship satisfaction in theory may not be distinguishable empirically if the same global sentiments shape responses to both assessments.

Taken together, understanding how intimate relationships succeed or fail requires that researchers incorporate and measure constructs that are clearly distinct from the outcomes they are trying to explain. In the present research, we sought to identify common sources of variance underlying participant self-reports across a comprehensive set of relationship measures to clarify their empirical relations.

Bifactor modeling as a method for lumping and splitting relationship constructs

To identify the empirical structure underlying relationship science’s multitude of constructs, we employ factor analytic procedures, including exploratory bifactor analysis, which has become instrumental in recent years for researchers seeking to investigate matters of construct-relevant multidimensionality. Bifactor models are especially well-suited for addressing whether it is empirically justified to split research measurements into multiple dimensions, versus lumping them together into a single dimension (e.g., [42,43]). See Rodriguez et al. [44,45] for an overview of psychometric bifactor indices used to help scholars adjudicate evidence for meaningful multidimensional substance when assessing specific subdomains within item pools. (We provide an overview of the challenges of interpreting general factors in the supplemental materials.) In contrast to traditional factor analytic techniques (EFA and CFA), bifactor models decompose the covariance among a collection of indicators (i.e., scale items) into two types of factors: a general factor, which reflects a single source of common variance among all the indicators, and one or more specific factors (orthogonal to the general factor) that explain the remaining unique residual covariance among subsets of indicators. This approach is particularly advantageous when trying to evaluate the empirical distinguishability of multiple overlapping constructs [43,46]. Furthermore, modern applications of exploratory bifactor analysis (EBFA) have been developed and tested over the last decade, providing more robust avenues for mapping the shared and distinctive characteristics across conceptually similar psychological instruments [42,47]. For our present purposes, EBFA is ideal as it allows for evaluating the presence of a general factor to shed light on whether specific relationship judgments may be principally driven by overall assessments of relationship quality. In turn, examining specific factors can help inform which relationship constructs are empirically distinguishable and can thus be treated as theoretically distinct predictors of overall relationship quality.
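The decomposition described above can be sketched numerically. The loading matrix below is purely illustrative (six hypothetical items, one general factor, two orthogonal specific factors) and is not drawn from the studies reported here; it simply shows how a bifactor structure partitions each item's common variance between general and specific sources:

```python
import numpy as np

# Illustrative bifactor loading matrix: rows = 6 hypothetical items;
# column 0 = general factor, columns 1-2 = orthogonal specific factors.
L = np.array([
    [0.7, 0.4, 0.0],
    [0.6, 0.5, 0.0],
    [0.8, 0.3, 0.0],
    [0.7, 0.0, 0.4],
    [0.6, 0.0, 0.5],
    [0.8, 0.0, 0.3],
])

communality = (L ** 2).sum(axis=1)      # common variance per item
uniqueness = 1.0 - communality          # residual (unique) variance
# Model-implied correlation matrix: shared part plus unique variances.
R_implied = L @ L.T + np.diag(uniqueness)

# Share of each item's common variance attributable to the general factor.
general_share = L[:, 0] ** 2 / communality
print(np.round(general_share, 2))
```

In a fitted EBFA, the loadings would of course be estimated from data rather than fixed; the same arithmetic then reveals whether item responses are dominated by the general factor or retain substantial specific-factor variance.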

Findings from a bifactor approach can also have implications for evaluating existing relationship theories and frameworks. For example, the Perceived Relationship Quality Components model (PRQC; [19]) remains one of the few direct attempts to empirically delineate the boundaries of relationship quality. Drawing on CFA methods available at the time, this model specifies relationship quality as a higher-order factor model, in which a single, general relationship quality factor is reflected by six quasi-independent relationship constructs (satisfaction, commitment, intimacy, trust, passion, and love). Yet, it does not offer clear guidance about whether the overlapping constructs (e.g., love, trust, commitment) should be treated as predictors of relationship quality or facets of it, nor does it clarify whether other (particularly newer) relationship constructs are separate from relationship quality. A bifactor approach partitions the unique covariance accounted for by specific factors from that of a potential general factor, thus providing clearer guidance regarding the incremental validity of measures that purportedly capture distinct relationship constructs [48]. If a general factor explains the vast majority of the variance among constructs, then there would be little empirical justification for considering those constructs separately (e.g., using trust to predict satisfaction “over and above” intimacy would be inappropriate if measures of these constructs are not empirically distinct).
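The bifactor indices referenced here (e.g., omega hierarchical, ECV) can be computed directly from a bifactor loading matrix using the standard formulas described by Rodriguez et al. [44,45]. The sketch below applies them to a hypothetical loading matrix of our own construction, not to results from the present studies:

```python
import numpy as np

# Hypothetical bifactor loadings: column 0 = general factor;
# columns 1-2 = specific factors, each spanning three items.
L = np.array([
    [0.7, 0.4, 0.0],
    [0.6, 0.5, 0.0],
    [0.8, 0.3, 0.0],
    [0.7, 0.0, 0.4],
    [0.6, 0.0, 0.5],
    [0.8, 0.0, 0.3],
])
theta = 1.0 - (L ** 2).sum(axis=1)  # item uniquenesses

# ECV: proportion of common variance explained by the general factor.
ecv = (L[:, 0] ** 2).sum() / (L ** 2).sum()

# OmegaH: proportion of total-score variance due to the general factor.
gen = L[:, 0].sum() ** 2
spec = sum(L[:, k].sum() ** 2 for k in range(1, L.shape[1]))
omega_h = gen / (gen + spec + theta.sum())

print(round(ecv, 2), round(omega_h, 2))
```

With these illustrative loadings, ECV ≈ .75 and omegaH ≈ .78 — values that, by common rules of thumb, would be read as evidence of a strong general factor.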

Overview and hypotheses

In the present work, we explored the empirical relations among a broad range of relationship constructs using a common self-report study design. Across two independent studies, we selected and administered prototypical measures of relationship quality and related relationship constructs to examine the underlying factor structure within these item pools. We aimed to capture a set of measures that would be generally representative of theory and practice in the field. Study 1 focused on relationship satisfaction measures specifically, given that relationship satisfaction is the most common operationalization of relationship quality [49] and given research indicating that the many satisfaction measures in the literature comprise highly heterogeneous relationship variables [16]. Study 2 included a broader set of relationship measures and constructs, drawing on the most widely measured constructs in the field. We employed multiple factor analytic procedures (primarily EBFA and EFA, but also CFA and CBFA in auxiliary analyses) to ensure that a full range of plausible factor solutions was evaluated [50,51]. To enhance confidence in the stability of our results, we additionally tested several alternative models for convergent evidence; this allowed us to evaluate whether results were robust across different methodological decisions within the factor analytic process (e.g., various rotation/extraction methods).

In light of prior relationship theory, we generated and pre-registered four competing factor models that could plausibly emerge from exploratory factor analyses (i.e., EFA/EBFA). Fig 1 presents these models (Models A-D) with descriptions of what they each suggest about the construal and measurement of relationship quality, as well as their significance for current practice and theory. For each model, we discuss 1) why one might expect that factor structure to emerge, 2) what statistical evidence would be required to support the model, and 3) implications each model would have for research practices within the field.

Fig 1. Each model is depicted by a set of relationship constructs, each represented by their respective scale items (depicted via sets of three overlapping squares).

Twelve constructs (C1-C12) shown for illustrative purposes. We discuss why one might expect that factor structure to emerge, statistical evidence required to support the model, and the implications of each model for research practices within the field. A) Model A (Independence): Each Relationship Measure Represents a Distinct Construct. Most existing literature assumes that each uniquely-labelled relationship construct is related to—but conceptually independent from—every other construct. The Independence Model would be supported via a factor structure in which representative item sets for each construct load onto separate (correlated) factors. This model would justify the continued practice of measuring and treating these constructs as independent. B) Model B (Corroborative): Relationship Measures Are Theoretically Organized. Relationship measures may cluster into a “big few” of broader, distinct domains with theoretical relevance. Relationship science is built on several prominent theories that may align neatly with such a model [14]. Support for Model B would provide researchers with evidence-based, synthesized frameworks to guide understandings of relationship evaluation and the operationalization of relationship quality. Researchers could develop reliable and valid measures specifically targeting these core dimensions. C) Model C (Unidimensional): Relationship Measures Reflect a Single Construct. Relationship judgments across presumably distinct domains may be explained by a single, overarching factor. This could be interpreted as consistent with the theory of sentiment override [41] in its most assertive form, whereby specific relationship judgments are posited to be primarily driven by an overall assessment of relationship quality (labelled here as ‘Q’). Model C would represent a disconcerting critique of prevailing theories and practices which rest on assumptions that individuals reliably distinguish between distinct aspects of their relationships, which are captured via standard self-report methods. D) Model D (Bifactor): Relationship Measures Have Shared and Unique Features. In Model D, the variance across relationship construct items is explained by both a general factor as well as several group specific factors. Item loadings and bifactor indices (e.g., omegaH, ECV) would inform whether a general factor systematically explains responses across all items (as in Model C), yet meaningful associations also exist among items beyond the influence of a general factor (as in Models A and B). Ultimately, Model D would challenge current practices and theory operating under assumptions of the Independence Model as evidence for a general factor would indicate that current relationship measures ineffectively capture different sources of variance attributable to common and specific relationship constructs.

https://doi.org/10.1371/journal.pone.0342451.g001


Broadly, we considered an Independence Model (Model A) that reflects existing assumptions regarding the empirical independence of distinct measures of relationship constructs; support for this model would substantiate the use of any given construct to predict another. We considered a Corroborative Model (Model B) that reflects the possibility of diverse relationship constructs organizing into a factor structure that corroborates a particular theoretical framework. For example, a factor solution could emerge in which item-level indicators for all relationship constructs are best represented by six first-order factors resembling the six PRQC constructs (i.e., satisfaction, commitment, trust, passion, intimacy, love; [19]). In addition, we considered a Unidimensional Model (Model C), whereby relationship evaluations across presumably distinct domains are instead driven by a single, general factor. Finally, we considered a Bifactor Model (Model D) in which indicators of relationship construct measures systematically reflect a general relationship quality factor, yet still organize into a number of sufficiently independent relationship constructs (thus representing a potential integration of Models A, B, or C). Importantly, support for a factor structure other than the Independence Model (Model A) would indicate some extent of construct redundancy across measures, ranging from a select few (e.g., Model B) to the full array of relationship constructs (e.g., Model C).

Although the current research goals focused on exploratory factor analytic methods, we also conducted auxiliary analyses specifying confirmatory factor analysis (CFA) and confirmatory bifactor analysis (CBFA) models based on the four pre-registered hypothesized models (Models A-D) to evaluate the evidence in support of each of the model structures; see supplemental materials for details. All EFA and EBFA models were conducted on the full item pool; no items were iteratively removed for the purposes of item reduction, as our aim was to characterize the latent dimensionality underlying representative sets of existing relationship measures.

Study 1

Study 1 used 206 self-report items taken from a set of prominent relationship scales that purport to measure relationship satisfaction and closely related constructs (29 scales and 512 items total; see Table 1). Satisfaction instruments were emphasized given their ubiquitous use as de facto measures of relationship quality (Bradbury et al., 2000). Although many of these instruments are commonly used as measures of relationship satisfaction, inspection of their subscales and item content showed that they tap a wide range of themes (e.g., trust, commitment, perceived partner responsiveness). We deconstructed these measures, reducing the original 512 items to 206 unique items based on their semantic content (see supplemental materials for additional details). Content coding showed that the 206 items represented 27 ostensibly separate relationship constructs. Our specific interest was in determining whether items representing 27 separate constructs load onto that many factors, or whether an alternative factor structure best describes the data. Given the exploratory nature of this study, we conducted EFA and EBFA to identify latent dimensions that may account for the shared variance among observed variables. EBFA estimated the proportion of reliable variance among the items accounted for by a general factor and the proportion accounted for by one or more specific factors. All data, code, and materials used in this research are available on OSF: https://osf.io/e452p.

Table 1. Sources, constructs, and number of retained items from satisfaction scales used in Study 1.

https://doi.org/10.1371/journal.pone.0342451.t001

Method

Participants and procedure.

Items were administered in an online survey distributed via the Dynata research platform in January 2021. Participants consisted of a US census-matched, national panel sample of individuals in romantic relationships, with demographic characteristics proportionate to the U.S. Census [52]. Participant demographic information is provided in Table S1 in S1 File and S2 File in the supplemental materials. Written informed consent was obtained from all participants prior to completing the questionnaires. Only participants who indicated that they currently had a main romantic involvement and were between the ages of 18 and 75 were eligible. Among qualifying participants who received the survey invitation, 3,001 completed all survey questions (with no missing data). Data integrity measures included five attention checks inserted randomly throughout the survey; participants who failed any of the attention checks were excluded, resulting in a final sample of 2,000 respondents. Survey completion took approximately 20 minutes. Respondents were compensated with cash, rewards points, or discounts. All procedures were approved by the University of California, Los Angeles’ Institutional Review Board.

Measures

Four research team members conducted a literature review targeting measures of relationship satisfaction and its proximal correlates; this process identified 29 distinct scales with previous usage in the literature (see Table 1, and section 4 in supplemental materials). Notably, all measures ask respondents to make an evaluative rating of some aspect of their relationship, and many of these instruments contained multiple subscales. Measured constructs ranged from relationship satisfaction itself to constructs presumed to account for variance in satisfaction (e.g., trust, partner responsiveness, conflict). The initial item set comprised 512 total items representing 27 different content areas. Given the high degree of item-content overlap, this item set was consolidated further by: eliminating duplicate items; generalizing wording with equivalent meaning (e.g., spouse versus wife/husband versus partner); separating double-barreled items; removing technical language and jargon; and rephrasing questions as statements to ensure commensurate responding on the same response scale. The final item set contained 206 unique items that included multiple items assessing each of the original 27 content domains. We provide a complete list of the 206 items and their scale sources in S4 Table in S1 File of the supplement. Items were presented in random order, with participants rating their agreement with each item on a 6-point scale from 0 (Not at all Agree) to 5 (Completely Agree).

Analyses

Exploratory factor analysis.

We used the psych package [53] in R [54] to conduct a series of exploratory analyses. We examined factor solutions based on common EFA selection criteria guidelines [50,51], including scree plots, hierarchical item clustering, Horn’s parallel analysis, Velicer’s minimum average partials (MAP), the Bayesian Information Criterion (BIC), and the sample-size-adjusted BIC (SABIC). Relationship constructs were expected to correlate with one another; thus, EFA models were estimated using an oblique (promax) rotation, given its efficiency with larger datasets [55]. We examined all candidate factor solutions suggested across the various selection methods, conducting separate analyses for each potential n-factor solution derived from EFA metrics with three different extraction methods (maximum likelihood, minimum residual, principal axis). We determined the final number of factors based on EFA metrics and factor solution interpretability. We employed lenient lower-end thresholds for statistical criteria to allow maximal opportunity for higher factor solutions to be retained (primary factor loadings ≥ .30 and cross-loadings ≤ .30 on other factors).
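The retention logic behind one of these criteria, Horn’s parallel analysis, can be sketched in a few lines: factors are retained only while the observed eigenvalues exceed the mean eigenvalues obtained from random data of the same dimensions. The following is an illustrative Python sketch with simulated toy data, not the psych implementation used in the reported analyses; the function name and data are hypothetical.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: count factors whose observed eigenvalues
    exceed the mean eigenvalues of same-shaped random-normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    sim_eigs = np.zeros((n_sims, p))
    for i in range(n_sims):
        sim = rng.standard_normal((n, p))
        sim_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = sim_eigs.mean(axis=0)
    return int(np.sum(obs_eigs > threshold))

# Toy data: 12 items generated from 3 correlated blocks plus noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 3))
data = np.repeat(latent, 4, axis=1) + 0.8 * rng.standard_normal((500, 12))
print(parallel_analysis(data))
```

With this strong three-block structure the procedure retains three factors; in practice, as in the studies here, different criteria can disagree, which is why several were consulted.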

Exploratory bifactor analysis.

When the final solutions from EFA models include multiple correlated factors, it is appropriate to evaluate whether the multiple factors are distinct, reliable facets or whether all items are informed by one general underlying dimension [44,56]. When there is possible multidimensionality, as was the case here, EBFA is useful for evaluating the incremental benefit, if any, of scoring subdomains represented by specific factors, or whether a total score (representing a general factor) is sufficient to account for the variability in a set of items [44,56].

A key feature of the bifactor model is the estimation of several psychometric indices that inform the presence and strength of general and specific factors, including omega hierarchical (omegaH: ωH), omega hierarchical subscale (omegaHS: ωHS), and explained common variance coefficients (ECV, ECVSS) [44,57]. OmegaH (ωH) reflects the proportion of variance in a unit-weighted total score attributable to a general factor. A high omegaH (e.g., ωH > .80; [44]) indicates that the reliable variance in unit-weighted composite scores is primarily attributable to a single latent construct, an indicator that the items are essentially unidimensional [45]. Although there is no clear consensus on strict cutoffs for bifactor indices, prior research suggests ωH values should be at least .50, and optimally .75 or .80, to signify acceptable reliability [44,58]. ωHS estimates the proportion of reliable systematic variance remaining in group factors after partialing out variance associated with the general factor. Previous research suggests ωHS values of .50 as a reasonable minimum for interpreting that a subset of items coheres as a unique dimension independent of the general factor [59,60]. Further, ECV is an index of unidimensionality that estimates the proportion of common variance explained by the general factor alone, where factors are assumed to be uncorrelated. Scale scores are perfectly unidimensional when all common variance is due to a general factor; as ECV approaches zero, scores reflect multiple uncorrelated dimensions. A high ECV value (e.g., > .70; [45]) lends support for a strong general factor and the unidimensionality of a scale’s items. ECVSS indices can be computed for specific factors to estimate the explained common variance for items loading on a specific factor. Dueber and Toland [57] have suggested an ECVSS value ≥ .30 as sufficient to interpret a specific-factor subscore in the context of confirmatory models.
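Both ωH and ECV can be computed directly from a standardized bifactor loading matrix: ωH divides the squared sum of general-factor loadings by the total score variance, while ECV divides the summed squared general loadings by all common variance. The sketch below uses hypothetical loadings in Python (the reported analyses used the BifactorIndicesCalculator package in R); `bifactor_indices` is an illustrative helper, not a package function.

```python
import numpy as np

def bifactor_indices(g, spec):
    """Compute omegaH and ECV from standardized bifactor loadings.
    g: (n_items,) loadings on the general factor.
    spec: (n_items, n_specific) loadings on specific factors,
          with zeros where an item does not load."""
    g = np.asarray(g, float)
    spec = np.asarray(spec, float)
    err = 1.0 - g**2 - (spec**2).sum(axis=1)               # item error variances
    total_var = g.sum()**2 + (spec.sum(axis=0)**2).sum() + err.sum()
    omega_h = g.sum()**2 / total_var                        # general share of total-score variance
    ecv = (g**2).sum() / ((g**2).sum() + (spec**2).sum())   # general share of common variance
    return omega_h, ecv

# Hypothetical loadings: 6 items, a strong general factor, two weak specific factors.
g = [0.80, 0.80, 0.70, 0.70, 0.75, 0.75]
spec = [[0.3, 0.0], [0.3, 0.0], [0.3, 0.0],
        [0.0, 0.3], [0.0, 0.3], [0.0, 0.3]]
omega_h, ecv = bifactor_indices(g, spec)
print(round(omega_h, 2), round(ecv, 2))   # a pattern like this yields high omegaH and ECV
```

With loadings like these, both indices land well above the .70–.80 thresholds discussed above, the same signature of essential unidimensionality reported in the Results.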

Overall, we evaluated ωH and ECV values, whereby high values would indicate the presence of a prominent general factor underlying the set of indicators and low values would indicate potential evidence of multidimensionality. We also evaluated ωHS and ECVSS values to assess whether any specific factors representing distinct subdomains could be interpreted alongside a general factor. Low values would indicate that a specific factor does not explain sufficient variance beyond the influence of the general factor to be considered a meaningful subdomain. Given the range of different cutoffs for bifactor indices in the literature, we erred toward interpreting support for multidimensionality more generously, to align with prevailing assumptions and practices in the field. To date, there are no fixed sample size guidelines for EBFA, and simulation studies suggest that sample size adequacy depends on model features rather than a single heuristic [61]. However, we note that our final samples (Study 1: N = 2,000; Study 2: N = 1,439) are above levels Bader et al. [61] found sufficient in most scenarios (i.e., ≥ 500), and meet or exceed sample sizes shown in simulation studies to support accurate recovery of exploratory bifactor structures (e.g., [62–64]).

EBFA was conducted using the fungible (Waller, 2021) and psych (Revelle, 2021) packages in R (R Core Team, 2022), drawing on growing EBFA resources available to researchers (e.g., [43,62,63]). At present, more work is needed to definitively establish which rotational methods are optimal across different analytic contexts. However, in recovering complex bifactor structures consisting of items that cross-load across multiple specific factors and/or items that load strongly on the general factor but not appreciably on any specific factor (i.e., pure indicators of the general factor; [62,65,66]), studies suggest that Schmid-Leiman with iterative target rotation (SLiD) modestly outperforms other methods in accurately recovering factor structures [62,63,67]. We expected our data would likely take on a more complex rather than simple bifactor structure, as it would be reasonable to anticipate several ‘pure indicators’ [44] of a general relationship quality factor emerging from this item pool. Thus, we applied SLiD and report these results throughout this manuscript. However, as a robustness measure, we re-ran all analyses using five other bifactor algorithms to verify that results converged and to ensure that our conclusions were not simply a by-product of the estimation method (see Table S3 in S1 File in the supplemental materials). The BifactorIndicesCalculator package in R [68] was used to extract omegaH and ECV reliability estimates.
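As a rough illustration of the transformation SLiD builds on, the classic Schmid-Leiman orthogonalization re-expresses a higher-order solution in bifactor form: general-factor loadings are the product of first- and second-order loadings, and group loadings are rescaled by the second-order uniquenesses. The Python sketch below shows only this classic step with hypothetical loadings; it omits the iterative target rotation that distinguishes SLiD.

```python
import numpy as np

def schmid_leiman(first_order, second_order):
    """Classic Schmid-Leiman orthogonalization.
    first_order: (n_items, k) loadings on correlated group factors.
    second_order: (k,) group-factor loadings on the general factor.
    Returns item loadings on the general factor and residualized
    (orthogonal) group-factor loadings."""
    L1 = np.asarray(first_order, float)
    l2 = np.asarray(second_order, float)
    general = L1 @ l2                       # g loadings: first-order x second-order
    residual = L1 * np.sqrt(1.0 - l2**2)    # group loadings scaled by uniqueness
    return general, residual

# Hypothetical higher-order solution: four items on two group factors.
L1 = np.array([[0.7, 0.0], [0.7, 0.0], [0.0, 0.7], [0.0, 0.7]])
l2 = np.array([0.8, 0.8])
g_load, grp_load = schmid_leiman(L1, l2)
print(g_load)     # each item loads .7 * .8 = .56 on the general factor
print(grp_load)   # residual group loadings shrink to .7 * sqrt(1 - .64) = .42
```

The residualized group loadings are always weaker than the first-order loadings they came from, which is why the bifactor indices above are needed to judge whether anything interpretable remains in the group factors.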

Study 1 results

Exploratory Factor Analyses (EFA).

Results suggested a wide range of potential factor solutions across selection criteria, specifically models consisting of 1, 3, 4, 12, 13, 17, and 20 correlated factors. We examined each of these factor solutions across the different extraction methods (see supplemental materials). Results were highly consistent across extraction methods; here, we report results from maximum likelihood estimation. A one-factor model accounted for 45% of the variance, with absolute factor loadings (|λ|) ranging from .09 to .92 (M = .65). A two-factor model accounted for 56% of the variance, with 133 items loading onto a factor capturing all positively phrased items (|λ| = .50 to .87) and 73 items loading onto a factor capturing all negatively phrased items (|λ| = .42 to .83); the correlation between factors was −.48. A three-factor model accounted for 59% of the variance, with 125 items loading onto a Positive factor (|λ| = .50 to .87), 72 items loading onto a Negative factor (|λ| = .47 to .82), and 9 items about sex loading onto a separate factor (|λ| = .48 to .63), with factor correlations ranging from .25 to .48. Items on the Sex factor had cross-loadings above .30 with other factors. A four-factor model accounted for 60% of the variance, with 125 items loading onto a Positive factor (|λ| = .43 to .88), 71 items onto a Negative factor (|λ| = .45 to .80), 11 items onto a Sex factor (|λ| = .45 to .70), and no items loading most strongly onto Factor 4; factor correlations ranged from .01 to .53. Results from models with four or more factors indicated that factors were likely over-extracted, given that no items loaded distinctly on the fourth factor and the gains in explained variance were small. Thus, these results supported three correlated factors as the best characterization of the data. The top 10 loading items for each of the three factors in the EFA model are presented in Table 2 (see Appendix C for the full pattern matrix).

Table 2. Study 1 Top 10 Item Factor Loadings in 3-factor EFA Model (206 items).

https://doi.org/10.1371/journal.pone.0342451.t002

When examining individual item loadings, we see that many items commonly treated as indicators of distinct relationship constructs loaded onto the same factors. For example, the first factor appeared to capture global, positive evaluations of the relationship: items like “I am satisfied with my partner” and “Our relationship is strong” loaded strongly (> .80). Yet this factor was also strongly represented by items designed to capture constructs that are often considered antecedents or consequences of partners’ positive evaluations. For example:

  1. • Commitment: “I am committed to maintaining my relationship with my partner” (.89)
  2. • Understanding: “My partner understands me” (.82)
  3. • Trust: “I can always trust my partner” (.80)
  4. • Communal strength: “Meeting the needs of my partner is a high priority for me” (.77)

A similar pattern emerged for the second factor. Items indicative of poor relationship quality (e.g., “My relationship with my partner is miserable”) loaded strongly onto this factor but not the first factor, suggesting respondents distinguished negative evaluations of their relationships from positive ones. Yet, within this factor, items designed to assess distinct constructs considered related to relationship dissatisfaction loaded about as strongly as the negative evaluations themselves. For example:

  1. • Divorce proneness: “My partner and I often discuss or consider divorce, separation, or terminating our relationship.” (.82)
  2. • Hostile conflict: “My partner and I blame, accuse, and criticize one another” (.77)
  3. • Perceived partner criticism: “I feel that my partner disapproves of me” (.76)
  4. • Intimate partner violence: “When we have problems, my partner pushes, shoves, slaps, hits, or kicks me” (.76)

Only the third factor captured a specific content domain that respondents treated as distinct from global positive and global negative evaluations of the relationship: satisfaction with sex. Nine of the 206 items loaded strongly onto this factor (e.g., “My sex life is fulfilling,” λ = .63). Items assessing sexual enjoyment (“Sex is fun for my partner and me,” λ = .59), variety (“My partner is willing to try new things in bed,” λ = .56), and communication (“I am able to tell my partner when I want sexual intercourse,” λ = .48) also loaded onto this factor; all of these had cross-loadings above .30 with other factors.

Overall, EFA results showed that the 27 relationship constructs did not organize into 27 factors, but rather three. When examined together, the only distinctions in participants’ responses were between positive evaluations, negative evaluations, and sex-related evaluations.

Exploratory Bifactor Analyses (EBFA).

Although EFAs identified three correlated factors, this does not unequivocally mean that the three factors are distinct enough from one another to merit being measured separately. It is possible that the items loading onto these factors demonstrate more complex or idiosyncratic patterns of covariance than can be effectively captured through conventional EFA approaches. To address this possibility, an exploratory bifactor model was specified with a general factor and three specific factors. We present the top 10 loading items per factor in Table 3 (see Appendix C for all factor loadings).

Table 3. Study 1 EBFA Top 10 Item Factor Loadings and Bifactor Indices.

https://doi.org/10.1371/journal.pone.0342451.t003

First, the item loadings indicated that the general factor, which we refer to as “Q”, was best reflected by items conveying positive global evaluations about one’s relationship and/or partner (i.e., relationship quality). Several of these items belong to measures of independent relationship constructs, however. For example:

  1. • Perceived partner responsiveness: “My partner understands me.” (.89)
  2. • Satisfaction: “All things considered, I am very happy in my relationship with my partner.” (.88)
  3. • Perceived partner appreciation: “I know I’m valued and appreciated by my partner.” (.88)

In turn, the three specific factors aligned with the Positive, Negative, and Sex factors identified in the EFA. However, the Positive factor was more closely aligned with indicators of emotional attachment (e.g., “I want our relationship to last a very long time”), and almost no items loaded more strongly on this factor than on the general factor. Further, all items loaded positively on Q, regardless of whether they were positively or negatively keyed, and stronger-loading items were more evaluative and general in content (e.g., “I am satisfied with my partner”) rather than concrete and specific (e.g., “My partner and I have very few friends in common”). OmegaH (ωH) was .92, exceeding thresholds for interpreting a latent general factor, and OmegaHS (ωHS) values for the three specific factor scores were .03 for emotional attachment, .05 for negativity, and .00 for sex, well below thresholds for interpreting meaningful variance captured by the group factor scores over and above the influence of a general factor (e.g., ωHS = .50; [60]). Lastly, ECV was 73%, indicating that nearly three-fourths of the common variance among the 206 items was explained by the general factor alone; ECVSS values for the specific factors were 6%, 18%, and 3%, respectively. Thus, despite representing 27 separate content domains across numerous relationship satisfaction scales, the 206 items administered here mostly reflected a single underlying dimension, with little unique variance explained by the specific group factors (see Fig 2).

Fig 2. Bifactor model from Study 1 EBFA.

Results indicate a strong general factor (Q) explaining the variance across all items. The three specific factors examined (depicted in dotted circles) did not receive sufficient support to warrant interpretation (i.e., did not account for meaningful variance once accounting for Q).

https://doi.org/10.1371/journal.pone.0342451.g002

Study 1 discussion

Study 1 results revealed a single general factor underlying all items, and no evidence for any incrementally valid specific relationship factor after accounting for the general factor. This provided initial empirical evidence of construct and measurement redundancy: when responding to diverse relationship satisfaction measures, people appear to be guided by a single, global evaluation of the relationship rather than by several distinct relationship constructs. This response pattern is consistent with the process of sentiment override [41]; alternatively, it could reflect a generalized response bias [69]. Overall, Study 1 examined a comprehensive set of individual items with distinct meanings on their face, but found that such meanings did not cohere into separate constructs once the general evaluative stance was accounted for.

Study 2: Replication and extension

Study 2 aimed to replicate and expand the scope of Study 1’s findings, while addressing limitations and alternative explanations. Specifically, we examined a wider range and number of items representing a more comprehensive set of relationship constructs. This larger item set included the same 206 items from Study 1 (to allow for a direct replication test), and 202 additional items to examine empirical overlap among measures of representative relationship constructs more broadly.

Study 2’s design and hypotheses were pre-registered. The methodological framework closely paralleled that of Study 1, with some modifications. One limitation of Study 1 was that some constructs were represented by a low number of (prototypical) items (e.g., only a few items were coded as indicators of trust from established trust measures). Thus, we included a more balanced number of item indicators per construct, drawn from prominent scales commonly used to measure each construct (see Table S6 in S1 File in the supplemental materials). This strengthened the content validity of constructs represented in the item pool, and ensured at least three items per construct [70]. To address concerns that the general factor observed in Study 1 represents a response bias driven by low-effort responding [71], we recruited participants through a different recruitment platform (CloudResearch), which research suggests provides higher data quality [72]. We also adopted more stringent data screening measures (e.g., improved attention checks, participant attentiveness index; [73]). Given that the general Q-factor might reflect a generalized response bias not specific to romantic relationship judgments, Study 2 additionally included non-relationship items to capture general well-being (GWB) and generalized evaluative consistency bias (ECB: the tendency to systematically rate oneself more positively or negatively than is warranted across characteristics, regardless of content) [69]. ECB is related to other response biases such as halo [74–76], positivity or acquiescence bias [77], and socially desirable responding (e.g., [78]). Our ECB items were evaluative in nature but not directed towards one’s romantic relationship (e.g., “I have good athletic ability”). Thus, we tested whether Q is a construct that is specific to relationship evaluation by examining its associations with general response bias and general well-being factors.

Study 2 exploratory hypotheses.

Given the expanded scope of our design, we did not have firm hypotheses regarding whether we would find equally strong evidence for a general factor (Q) as in Study 1, or the number of specific factors that may emerge. However, we reasoned that our design provided a better chance for specific factors to emerge given the broader range of constructs included. For instance, models of evaluative processing in relationships suggest cognitive representations of a partner can be differentiated based on global evaluations and specific perceptions [79]. Study 2 included several prominent relationship constructs that were not represented in the satisfaction-focused items in Study 1 (e.g., partner-specific attachment, capitalization, self-disclosure), many of which reflect relationship appraisals at lower and more specific levels of abstraction. Hierarchical trait models would also suggest that these constructs are less inherently evaluative compared to more abstract, global appraisals [80]. Overall, Study 2 included an ostensibly more diverse item pool, thus increasing the likelihood that distinct item sets form specific factors separate from Q.

Method

Participants and procedure.

In July 2022, an online panel sample of 2,371 US participants was recruited and compensated through CloudResearch Prime Panels. As in Study 1, participants were required to be 18 or older and currently in a romantic relationship, and we used quota sampling based on census data [52] to obtain a final sample approximating the US population on several demographic characteristics (i.e., age, gender, race/ethnicity, income). Written informed consent was obtained from all participants prior to completing the questionnaires. All procedures were approved by Western University’s Research Ethics Board.

Study 2 consisted of 408 items from different relationship measures, plus additional general well-being and evaluative consistency bias items. Given the large number of variables, we implemented a planned missingness design [81] to reduce study length and respondent fatigue. Specifically, the survey used a Random Percentage design set at 50% missingness: this rate was selected based on research demonstrating negligible effects on model parameter estimation and data quality at our sample size [82]. All participants provided demographic information (e.g., age, gender, race/ethnicity, income) and responded to a common item set that measured general well-being and evaluative consistency bias. As part of the planned missingness design, each participant then responded to 204 relationship items randomly selected from the pool of 408 (i.e., 50%).
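The Random Percentage design can be sketched simply: each participant is independently assigned a random 50% subset of the 408-item pool, making the resulting missingness random by design. A minimal Python illustration (the function name is hypothetical):

```python
import numpy as np

def assign_items(n_participants, n_items=408, prop=0.5, seed=0):
    """Random Percentage planned-missingness design: each participant is
    shown an independent random subset covering `prop` of the item pool."""
    rng = np.random.default_rng(seed)
    n_shown = int(n_items * prop)
    return [rng.choice(n_items, size=n_shown, replace=False)
            for _ in range(n_participants)]

shown = assign_items(3)
print([len(s) for s in shown])   # each participant sees 204 of the 408 items
```

Because each subset is drawn independently of the item content and of participants’ responses, the items a participant never sees are missing completely at random, which is the condition the imputation procedure below relies on.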

We excluded participants from the final sample if they indicated they were not in an established relationship (e.g., single, casually dating) and/or failed any attention checks included in the survey. We also screened and excluded participants for straightlining (i.e., responding identically to item blocks) and speeding (i.e., responding more than 40% faster than the median response time; [83]). Further, we computed a participant attentiveness index for every participant following the squared discrepancy procedure of Litman et al. [73] to identify and remove inattentive participants (see section 11 in supplemental materials). The final sample consisted of 1,439 respondents. The survey took approximately 24 minutes. Participants rated their agreement with each item on a 5-point scale from 1 (Strongly disagree) to 5 (Strongly agree).
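The speeding and straightlining screens described above reduce to simple flags: completion times more than 40% below the median, and any fixed-size item block answered identically. The following is an illustrative Python sketch with toy data; the 40% threshold follows the text, while the block size is an assumption for illustration.

```python
import numpy as np

def flag_exclusions(responses, times, block_size=10):
    """Flag speeders (completion time > 40% faster than the median) and
    straightliners (identical answers within any full item block)."""
    times = np.asarray(times, float)
    speeding = times < 0.6 * np.median(times)
    responses = np.asarray(responses)
    straightline = np.array([
        any(len(set(row[j:j + block_size].tolist())) == 1
            for j in range(0, len(row) - block_size + 1, block_size))
        for row in responses
    ])
    return speeding | straightline

# Toy data: 5 participants x 20 items on a 1-5 scale.
rng = np.random.default_rng(0)
resp = rng.integers(1, 6, size=(5, 20))
resp[1] = 3                      # participant 1 answers "3" to everything
times = [20, 22, 25, 8, 21]      # participant 3 finishes suspiciously fast
flags = flag_exclusions(resp, times)
print(flags)
```

In this toy run the straightliner and the speeder are flagged while the remaining participants pass; the attentiveness index used in the actual study is a separate, more involved procedure.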

Measures

Broad set of relationship-specific items.

A review of relationship measures in the field was conducted by members of the research team to select 408 prototypical items representing 34 of the most prominent relationship constructs. These constructs were selected based on research identifying the most frequently studied constructs in relationship science (Table 2 in [29]). Item indicators for each construct were selected from established measures based on their prevalence in the literature (i.e., selecting measures with the highest citation count), and based on the length of the measure (i.e., favoring shorter versus longer measures). Items were also selected based on their ability to represent a wider range of constructs beyond the 206 items retained from Study 1 (see section 5 in supplemental materials). Appendix A provides a complete list of items and their sources. Appendix B provides a list of the constructs and the number of indicators per construct in Study 2.

Overall, this selection process resulted in an item pool broadly representative of theory and practice in relationship science. Items reflected a comprehensive array of self-report measures used to capture central aspects of relationship functioning, including variables represented across major relationship theories (e.g., interdependence theory: satisfaction, quality of alternatives; attachment theory: partner-specific anxious and avoidant attachment; investment model: satisfaction, commitment; communal/exchange theory: communal strength; triangular theory of love: passion, intimacy; vulnerability-stress-adaptation model: conflict; ideal standards model: perceived partner traits) [14].

Evaluative consistency bias and general well-being items.

Study 1 identified a general factor which presumably reflects a global evaluation of relationship quality. To investigate how dissociable this general factor is from a (non-relationship specific) response tendency (e.g., halo bias; [75]), we included additional items designed to measure evaluative consistency bias (ECB). This response bias can be considered a type of method artifact characterized by a consistent pattern of responding to items, regardless of the actual content of those items [76]. ECB items consisted of self-evaluations that were unrelated to the romantic relationship (i.e., “I am a physically attractive person”, “I have good athletic ability”, “I consider myself to be intelligent”, “My general trivia knowledge is excellent”; [74]). General well-being (GWB) items captured non-relationship specific assessments of well-being (i.e., “Overall, I am satisfied with my life”, “My mental health is very good”, “I am doing well in my professional life”, “I have a good social life”).

Analyses

Pre-registered exploratory analyses were conducted on two separate, predefined subsets of items within the data. First, we performed EFA and EBFA on the original 206 items from Study 1 as a direct replication test. We then repeated this procedure on the broader item pool by examining the 408 relationship-specific items representing 34 different constructs. Given the planned missingness design, multiple imputation was used to account for missing data, as it provides unbiased estimates and standard errors when data are missing completely at random (MCAR) [84]. We used the mifa package in R to impute incomplete data for all items used in analyses [85], following the fully conditional specification approach [86] with the predictive mean matching (PMM) method [87]. Missing values were imputed five times; the resulting covariance matrices of the imputed data sets were combined into a single covariance matrix using Rubin’s method [88].
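Under Rubin’s rules, the pooled point estimate of any quantity is simply its average across imputations, so combining the five imputed covariance matrices reduces to a mean (pooled standard errors would additionally require the between-imputation variance). A minimal Python sketch of this pooling step, with toy data standing in for the imputed data sets (the analyses themselves used the mifa package in R):

```python
import numpy as np

def pool_covariances(imputed_datasets):
    """Rubin's-rules point estimate: average the covariance matrices
    computed from each imputed copy of the data."""
    covs = [np.cov(d, rowvar=False) for d in imputed_datasets]
    return np.mean(covs, axis=0)

# Toy illustration with two "imputed" data sets of four variables.
rng = np.random.default_rng(2)
d1 = rng.standard_normal((100, 4))
d2 = rng.standard_normal((100, 4))
pooled = pool_covariances([d1, d2])
print(pooled.shape)   # the pooled covariance matrix is what feeds the EFA/EBFA
```

The pooled matrix, rather than any single imputed data set, is then submitted to the factor analyses, which keeps the factor solution from depending on the idiosyncrasies of one imputation.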

To assess whether the expected general Q-factor represents a relationship-specific construct, we examined correlations between Q and latent ECB and GWB factors following a latent variable approach similar to Anusic et al. [74]. Latent Q was specified with eight pure indicator items of the general factor (i.e., items that load appreciably onto Q but not any other potential specific factors) identified from EBFA results (see Table 5). We reasoned that if Q is a response pattern guided by global relationship-specific evaluations (consistent with sentiment override), then its representative items should reflect broad evaluations of one’s relationship or partner. In addition, Q should be moderately, but not substantially, correlated with ECB and GWB. However, if Q is correlated substantially with ECB or GWB, it would counter the notion that Q represents relationship-specific sentiment override.

Table 4. Study 2 EBFA Model Top Factor Loadings and Bifactor Indices.

https://doi.org/10.1371/journal.pone.0342451.t004

Table 5. Item labels and standardized factor loadings for Study 2 measurement model.

https://doi.org/10.1371/journal.pone.0342451.t005

Study 2 results

Direct replication of study 1.

We replicated the results from Study 1: EBFA results showed robust support for a general Q-factor in the 206 items based on two-, three-, and four-factor EFA candidate solutions (see section 6 in supplemental materials). Examination of top-loading items from EBFA for general and specific factors yielded consistent findings, indicating that Q is best represented by evaluations of relationship satisfaction and perceptions of a partner’s regard (e.g., perceived partner responsiveness). Only the first three specific factors were interpretable: specific factor 1 = negative evaluations (i.e., conflict items), specific factor 2 = positive evaluations reflecting emotional attachment (i.e., love and commitment items), and specific factor 3 = satisfaction with sex.

Expanding to 408 relationship items.

Similar to Study 1, EFA metrics suggested a range of candidate factor solutions (i.e., 1, 2, 3, 4, 6, 11, 15, 16, and 17 factors). Following similar procedures, we again found the best support for a three-factor model, despite the broader range of item content. The three-factor model was represented by a positive factor (capturing positive global partner evaluations), a negative factor (reflecting variables such as desire for relational power and intimate partner violence), and a sexual satisfaction factor (see S8 Table in S1 File for item loadings).

Based on EFA results, EBFA models were examined for two, three, and four specific factors. Results were consistent with Study 1 findings. Inspection of item loadings for the general factor again showed that Q was best represented by items conveying positive global evaluations about one’s relationship and/or partner. For example:

  • Perceived Partner Affection: “My partner feels affection for me.” (.78)
  • Perceived Partner Appreciation: “My partner respects me.” (.77)
  • Satisfaction: “I have a warm and comfortable relationship with my partner.” (.76)

Only the first three specific factors were interpretable; thus, this solution was retained for subsequent analyses. These factors resembled the three factors identified in previous results. The first specific factor was characterized by negative evaluations, such as items tapping into desire for power over partner (e.g., “I would enjoy having authority over my partner.”, λ = .58) and intimate partner violence. The second specific factor was characterized almost exclusively by sexual satisfaction items, with items loading moderately on this factor (e.g., “I am happy with my sex life with my partner.”, λ = .42). The third specific factor was characterized by broader (positive) items representing emotional attachment (e.g., “I’m afraid my partner may abandon me”, λ = .36). Notably, however, these items had relatively low loadings (<.40). The top loading items in this EBFA model are presented in Table 4 (see Appendix D for all factor loadings, and section 7 in supplemental materials for additional details).

Results of bifactor indices converged with prior findings, showing that a general factor (Q) accounted for most of the variance across items, with minimal evidence supporting the substantiveness of any unique specific factors. OmegaH (.69) and ECV (.82) values for Q remained high, while OmegaHS values for specific factor scores (.16, .12, .00) were well below cutoffs for interpretation of subdomains. ECVSS indices (.07, .06, .05) were below cutoffs as well, indicating that only 5–7% of the common variance among subsets of items for each of the specific subdomains could be explained by a specific latent factor.
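To make these indices concrete, the arithmetic behind ECV, ECVSS, and OmegaH can be sketched from a standardized bifactor loading matrix. The loadings below are synthetic values chosen for illustration, not the study’s estimates:

```python
import numpy as np

# Synthetic standardized bifactor loadings (NOT the study's data):
# rows = items; column 0 = general factor (Q), columns 1-2 = specific factors.
L = np.array([
    [0.78, 0.00, 0.30],
    [0.77, 0.00, 0.25],
    [0.76, 0.00, 0.00],
    [0.55, 0.58, 0.00],
    [0.50, 0.42, 0.00],
    [0.60, 0.36, 0.00],
])

sq = L ** 2
common = sq.sum()  # total common variance across all factors

# ECV: proportion of common variance explained by the general factor
ecv_general = sq[:, 0].sum() / common

# ECV_SS for a specific factor: its share of the common variance among
# only the items that load on it (here, items 3-5 for specific factor 1)
items_s1 = [3, 4, 5]
ecv_ss1 = sq[items_s1, 1].sum() / sq[items_s1, :].sum()

# OmegaH: reliable variance in total scores attributable to the general
# factor alone (unit-variance items, so error = 1 - item communality)
error = (1 - sq.sum(axis=1)).sum()
omega_h = L[:, 0].sum() ** 2 / (
    sum(L[:, k].sum() ** 2 for k in range(L.shape[1])) + error
)

print(round(float(ecv_general), 2),
      round(float(ecv_ss1), 2),
      round(float(omega_h), 2))  # → 0.77 0.41 0.77
```

With these hypothetical loadings the general factor dominates (ECV = .77), yet specific factor 1 would still retain an interpretable share of its own items’ common variance (ECVSS = .41); in the actual Study 2 results, the corresponding ECVSS values (.05–.07) fell far below that level.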

Probing the content of Q.

Based on our evaluation of the item content, Q represents global relationship evaluations, consistent with sentiment override. To probe this further, we specified a confirmatory model to examine intercorrelations between Q, ECB, and GWB (see Fig 3 for the specified model, and Table 5 for measurement model parameters). This model exhibited satisfactory fit, with items loading well onto their respective factors (CFI = .93, TLI = .92, SRMR = .06, RMSEA = .05). Inter-factor correlations revealed a moderate correlation between Q and ECB (.34), suggesting that Q shares features with a more general systematic response tendency while remaining conceptually distinct. Q was highly correlated with GWB (.61), indicating a robust link between subjective appraisals of general and relationship-specific well-being. Further, ECB was highly correlated with GWB (.69), suggesting a stronger influence of a generalized response bias on appraisals of general well-being than on appraisals of relationship quality.

Fig 3. Study 2 model examining associations between Q and non-relationship specific items.

ECB = Evaluative Consistency Bias; GWB = General Well-Being from Study 1 EBFA. Q was specified with eight pure indicator items of the general factor identified from Study 2 EBFA (see Table 5). Factor correlations between Q, ECB, and GWB are shown.

https://doi.org/10.1371/journal.pone.0342451.g003


Auxiliary analyses.

We conducted auxiliary analyses in which we modeled a latent method factor using CFA and BCFA to further probe construct-irrelevant substantive variance attributable to evaluative consistency bias. A series of competing confirmatory models were specified and compared, with similar results suggesting that Q shares features with a more general systematic response tendency (ECB) but is not redundant with this factor (see S11 Table in S1 File). We also conducted auxiliary confirmatory analyses to evaluate the evidence in support of each of the four pre-registered hypothesized model structures (i.e., Models A–D); see section 12 in supplemental materials. Further, at a reviewer’s request, we re-ran all analyses including cases previously excluded from data screening measures (see section 15 in supplementary materials). In short, results were largely consistent; however, factor recovery was notably poorer. EFA/EBFA solutions were less coherent and interpretable compared to the screened sample. This pattern is consistent with the impact of including lower-quality respondents in factor analytic models [89].

Study 2 discussion

Study 2 replicated the results of Study 1 with an independent sample and within a larger and broader pool of items representing 34 prominent constructs. Results showed no meaningful variance captured by specific factors over and above the influence of a prominent general factor (representing global evaluations of the relationship). The content of these potential specific factors was consistent with Study 1, reflecting negative evaluations, sexual satisfaction, and emotional attachment. As in Study 1, bifactor indices showed weak evidence for specific relationship factors beyond Q, preventing interpretation of these factors. Importantly, findings did not support a ‘multidimensional model’ (Independence Model: Model A), as there was little evidence of any multidimensional solution that clearly represented the 408 items. Instead, EBFA showed that a global relationship quality factor (Q) explained over 80% of the covariance across items. Q was best represented by items capturing (positive) general relationship and partner evaluations rather than specific evaluations, consistent with a sentiment override account [41,90]. Further analyses to probe the content of Q suggested it was modestly linked with, yet distinct from, evaluative consistency bias and general well-being.

General discussion

A central goal of relationship science is to explain and predict relationship quality. To this end, researchers typically administer various relationship self-report measures, each of which is presumed to capture meaningful yet distinct aspects of an individual’s relationship. However, the distinctions among these measures and the constructs they purport to capture have been left murky. Although scholars have developed numerous relationship constructs and self-report scales designed to capture unique and meaningful aspects of relationship functioning, these efforts have been spread across disciplines in an uncoordinated manner. As a result, there remains a lack of consensus regarding the conceptual and empirical boundaries of focal relationship constructs, and ambiguity over which phenomena should be treated as facets versus predictors of relationship quality. Fincham and Bradbury [38] highlighted these concerns nearly four decades ago, noting that the same content often appeared on self-report measures of satisfaction as well as self-report measures of variables like communication and conflict used to predict relationship satisfaction. To date, however, there have been few concerted efforts to explore the empirical relations between overlapping relationship variables and identify potential redundancies within an ever-growing brickyard of relationship measures.

In the current paper, we systematically tested the empirical separability of a broad range of relationship-specific constructs using standard data collection methods in relationship research (i.e., self-report). In two studies, we employed factor analytic methods, including EBFA, to assess the common and distinct sources of variance underlying a comprehensive set of self-report measures representing prominent relationship constructs in the field. We placed particular emphasis on measures related to relationship quality, a construct that has remained conceptually and methodologically elusive despite its broad interdisciplinary importance [16]. In Study 1, we selected measures representing a diverse array of constructs found within measures of relationship quality. In Study 2, we extended our focus and included additional measures of prominent constructs related to relationship quality and reflecting common theories and practices in the field. As part of our investigation, we generated competing hypotheses about how the content within the item pools would organize based on prior theory and prevailing assumptions regarding the independence of these measures that guide current practice. We incorporated pre-registered study designs and analyses to ensure our key findings were replicable and robust.

Key findings: The influence of sentiment override in relationship self-reports

A central finding from our investigation is that items from commonly used relationship measures of ostensibly different constructs showed extensive empirical overlap. Thus, most measured relationship constructs appear to be “jangles” [3] of global appraisals of relationship quality. Exploratory bifactor models showed that a general relationship quality factor (Q) captured the vast majority of the variance (>70% across studies) among items from distinct relationship constructs and measures. In Study 1, despite ostensibly tapping into 27 different relationship constructs (e.g., appreciation, commitment, trust, communication, responsiveness), most of the common variance in the diverse relationship quality items (73%) could be attributed to a single, general factor (i.e., Q). Study 2 replicated these findings with a similar sample, using more stringent screening criteria to assess the generalizability and content of Q. Again, despite ostensibly tapping into 34 prominent constructs (e.g., intimacy, love, quality of alternatives, communal strength), the vast majority (82%) of the common variance in these items could be explained by Q, which was best indicated by general relationship appraisals. Accordingly, there was little validity evidence of additional unique subdomains beyond Q; although consistent patterns emerged with respect to the content of three domains (i.e., positive, negative, sex), these patterns were weak and did not receive sufficient empirical evidence to warrant interpretation. Still, while short of factor-level interpretation, the general patterns observed were broadly consistent with prior literature. For example, items suggesting a possible ‘negative’ specific factor align with work describing positive and negative dimensions of relationship functioning as independent rather than opposite poles of a single continuum [91,92]. Items suggesting a unique “sex” factor are consistent with prior theory that conceptualizes sexual evaluations as related to, yet distinct from, other relationship evaluations [93], as well as the field-level treatment of relationship functioning and sexuality as distinct research foci (e.g., dedicated conferences and journals for close-relationships and sexuality research). Items suggesting a possible “emotional attachment” factor likewise coincide with theoretical perspectives on human bonding. For example, attachment theory treats attachment processes as a core motivational system and attachment bonds as emotionally significant in ways that may not be fully captured by global appraisals of relationship quality [14]. Aside from these broad themes, however, these data revealed none of the finer theoretical distinctions that relationship scientists have drawn between different ways of evaluating a relationship; there was insufficient justification to consider three factors separately, much less the dozens of constructs combined within those factors.

With respect to our hypothesized models, we interpret these findings as strong evidence against the Independence Model, which reflects prevailing assumptions in the field that existing self-report scales of purportedly distinct constructs are empirically distinct. Rather, our results suggest that each of these scales—and perhaps most or even all self-report scales intended to capture relationship-specific evaluations—may have more in common than previously thought as they primarily tap into a single, overarching construct. Subsequent analyses suggested that the general factor (Q) is driven, in part, by a response bias that appears to be most consistent with the phenomenon of sentiment override [41,94]. That is, individuals appear to draw on their global evaluations of their partner or relationship even when responding to items meant to capture specific relationship features or events. Inspection of item loadings revealed that Q was strongly represented by items that are highly evaluative and reflect general positive appraisals of the relationship or a partner’s regard (e.g., “My relationship with my partner is enjoyable”; “My partner feels affection for me”). Thus, when a researcher attempts to use commitment, love, investment, or trust to predict relationship quality, they may be inadvertently using a person’s global feelings about their relationship to predict their global feelings about their relationship (i.e., attempting to predict relationship quality with itself).

To our knowledge, our findings are among the first in the literature to lend direct empirical support for relationship sentiment override. As sentiment override was originally articulated in the context of observational research assessing partners’ communication and response behaviors during marital interactions [90,94], its significance has become less pronounced within the context of contemporary self-report practices in the field. However, our findings indicate that it warrants far more scholarly attention than it has received to date. For example, if many established associations in relationship science are in fact reflections of sentiment override, then it matters a great deal whether this occurs as a result of demand characteristics (i.e., partners provide consistent reports across instruments because they believe that researchers expect consistency) or whether it truly reflects a lack of specificity in the way partners evaluate their relationships. More critically, the strong presence of sentiment override identified in our examination of representative relationship variables raises the concern that researchers may be at risk of drawing meaningful associations between different self-report measures that are exaggerated or spurious. This presents significant impediments for theoretical and empirical advancements in the field.

Theoretical implications: Testing theories in light of measurement redundancy

The current findings have significant implications for testing and developing theories in the relationships literature. First, it is important to note that these implications hinge on the distinction between constructs and measures. Indeed, most theoretical formulations in psychology are about constructs, not measures. For example, the investment model [95] posits that investment—the time and resources that a person has placed into a relationship—shapes a person’s feelings of commitment, or their intention to remain in the relationship long-term. This theoretical model requires that investment is a conceptually distinct phenomenon from commitment. It does not maintain, however, that existing measures of those constructs are empirically distinct. Our finding that investment and commitment items load onto the same general relationship quality factor suggests that existing investment measures may not be distinguishable from commitment measures, and further that neither one of these may be distinguishable from satisfaction measures. This does not necessarily mean that the constructs of investment and commitment do not exist in principle. Rather, it may indicate that we have not yet managed to distinguish between them with self-report scales.

Nevertheless, a hallmark of scientific theories is their ability to be falsified through empirical testing and capacity to inform real-world phenomena. Although the veracity of a theory does not rest on whether its features can be captured empirically, its verifiability does. Our findings do not directly refute the ‘nomological validity’ of relationship science’s constructs—the extent to which a construct’s measure behaves as it should with other constructs in theoretically predicted ways [96]. Yet, to study links between theoretically meaningful constructs, it is imperative that their measures are empirically separable. If researchers are to effectively study how conflict shapes relationship quality, for example, it is necessary to ensure that measures of conflict and relationship quality capture separate constructs.

We had hoped our examination would produce (or reaffirm) a handful of differentiable “big few” constructs that could be used as building blocks by future relationship theorists to better explain, predict, and improve relationship quality. However, this goal was not realized, at least not as intended. Instead, a takeaway from the current research is that sentiment override is a major looming concern in relationship research, influencing participant responding whenever research calls upon partners to evaluate any aspect of each other and their relationship. The fact that individuals did not appear to discriminate between different relationship-specific constructs through self-report is not a direct threat to existing relationship theories per se; however, it does pose a major obstacle for researchers wishing to test said theories through sole reliance on self-report methods. Our data revealed none of the finer theoretical distinctions that relationship researchers have drawn between different ways of evaluating a relationship, indicating that until better tools and methodological approaches are developed and adopted, researchers will be compromised in their ability to test theoretical models composed of such constructs.

Practical implications: Improving measurement practices in relationship research

The current research provides compelling reasons for scholars to broaden their perspectives when considering factors contributing to success or failure in intimate relationships. Our findings indicate that most measures of evaluative relationship constructs are likely capturing global appraisals of relationship quality. By ignoring this concern, researchers run the risk of proposing new constructs and measures which may already exist in the literature. Notably, our focus here was on variables and research practices in relationship science. However, issues of measurement proliferation and construct redundancy are relevant across psychological subdisciplines [97]. Several fields have wrestled with issues of empirically overlapping variables and disentangling potentially spurious sources of variance (e.g., positive manifold: [98]; ‘crud’ factor: [99]; method biases: [77]). Yet, these issues have yet to receive sufficient attention in the relationship literature, despite concerns of conceptual clarity surrounding focal constructs like relationship quality, and a methodological reliance on self-reports to study relationship phenomena [17]. We outline several key considerations for addressing issues of construct redundancy and improving empirical research practices in the field.

First, our findings reveal that existing methods may fail to differentiate between relationship constructs as they are typically measured. The strong presence of sentiment override evidenced in our studies poses a substantial threat for researchers relying on self-report data to draw inferences about the predictive and incremental validity of relationship constructs. This highlights a greater need for more rigorous standards and “riskier” tests of discriminant validity to be adopted, in which researchers are incentivized to disconfirm the uniqueness of new or established measures. Indeed, the importance of accounting for shared method variance when establishing convergent and discriminant validity in questionnaires has long been recognized [100]; yet, relationship assessment tools have frequently been developed without the necessary validity evidence to support their use (see [16]).

Our approach highlights how bifactor techniques (including EBFA) may be a useful tool for overcoming constraints in traditional factor analytic approaches, enabling stronger tests of discriminant validity particularly when general factors are relevant. For example, our analyses of hundreds of items spanning a wide gamut of relationship instruments indicate that researchers can similarly identify, specify, and/or account for the influence of Q in latent variable models. It is important to note that we do not intend these data to serve as a novel self-report measure of relationship quality given that the very methods we used are what we critique as vulnerable to sentiment override and shared method variance. Within the context of self-report, however, our results suggest that researchers should explicitly model and account for Q when claiming to measure a construct that is theoretically or empirically distinct from global relationship evaluations. Future research can thus employ bifactor techniques (EBFA/CBFA) to evaluate a general Q-factor alongside items from a New Relationship Construct to ascertain whether items load meaningfully and discriminately onto a separate New Relationship Construct specific factor. If bifactor indices suggest no meaningful variance is explained above and beyond the influence of Q, then the new construct cannot be interpreted as capturing something substantive beyond (or alongside) global evaluations of relationship quality (i.e., Q).
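As a sketch of this proposed workflow, the screening step might be implemented as follows. The loading matrices, item sets, and the “New Relationship Construct” are hypothetical, and the cutoff values are conventional rules of thumb rather than fixed standards:

```python
import numpy as np

def specific_factor_is_substantive(loadings, item_idx, s_col,
                                   omega_hs_min=0.30, ecv_ss_min=0.15):
    """Heuristic screen for whether a candidate specific factor carries
    interpretable variance beyond the general factor Q.

    loadings : (n_items, 1 + n_specific) standardized bifactor loadings,
               column 0 = general factor (Q).
    item_idx : row indices of the candidate construct's items.
    s_col    : column of the candidate specific factor.
    Cutoffs are illustrative rules of thumb, not field standards.
    """
    L = loadings[item_idx]
    sq = L ** 2
    # OmegaHS: reliable subscale-score variance due to the specific factor
    err = (1 - sq.sum(axis=1)).sum()
    denom = sum(L[:, k].sum() ** 2 for k in range(L.shape[1])) + err
    omega_hs = L[:, s_col].sum() ** 2 / denom
    # ECV_SS: specific factor's share of these items' common variance
    ecv_ss = sq[:, s_col].sum() / sq.sum()
    return bool(omega_hs >= omega_hs_min and ecv_ss >= ecv_ss_min)

# Hypothetical loadings for a "New Relationship Construct" (illustrative only):
# one scenario where Q absorbs the items, one where unique variance remains.
L_absorbed = np.array([[0.75, 0.10], [0.72, 0.12], [0.70, 0.08]])
L_distinct = np.array([[0.40, 0.65], [0.35, 0.70], [0.45, 0.60]])

print(specific_factor_is_substantive(L_absorbed, [0, 1, 2], 1))  # False
print(specific_factor_is_substantive(L_distinct, [0, 1, 2], 1))  # True
```

Under this rule, the first scenario mirrors the pattern observed in the current studies (items dominated by Q, negligible specific-factor variance), whereas the second is what a genuinely separable construct would look like.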

Still, while establishing higher standards for discriminant validity is a step in the right direction, construct validation is an ongoing process that requires well-defined constructs that are separate from the measures being validated [101]. Related to this, most of the relationship variables encompassed in our investigation were evaluative in nature and involved tapping individuals’ subjective judgments. In relationship science and similar research areas, researchers will be better positioned to identify meaningful predictors of relationship functioning, and to develop stronger theoretical and empirical tests of their central constructs and principles, by looking beyond purely evaluative relationship measures (e.g., frequency-based reports of discrete relationship behaviors or events to capture constructs related to conflict, communication, sexual activity, etc.). In turn, self-reports of one’s own personality, family background, and experiences of stress are less conceptually proximal to evaluations of the relationship, and may thus be good candidates for predicting relationship quality in research relying on self-reports. Self-reports of more concrete variables (e.g., parental divorce, socioeconomic status, employment status) are more likely to avoid this problem as well (e.g., [102]). Explicitly linking proximal, medial, and distal levels of assessment (e.g., [103,104]) also represents a promising direction for pushing theory forward in relationship science.

Further, the findings highlight the limitations of obtaining self-reports from individuals, the most common method of data collection in relationship science. To be clear, this does not mean that self-report methods inherently lack value or should be outright abandoned. Yet, a clear implication of this work is that relationship science would greatly benefit from strategies that expand beyond self-report assessments as the primary method for conceptualizing and studying relationship phenomena. This is not to suggest that non-self-report methods provide a simple solution, as recent work indicates that these approaches (e.g., experimental manipulations, behavioural coding schemes) may also face construct validity challenges [17]. In light of these considerations, our findings set the stage for future research to further probe the validity and reliability of relationship self-report measures across methodological contexts and conditions (e.g., dyadic, longitudinal, daily experience, qualitative designs). For example, further empirical work can examine the extent to which alternative methods to traditional self-report designs (e.g., observational, dyadic, longitudinal, daily experience, qualitative, third-party informant designs) are similarly affected by sentiment override, a point which we elaborate on below. In turn, our results suggest that certain analytic approaches may require more serious consideration, including the use of multitrait-multimethod examinations (MTMM) [100], statistical frameworks that decompose multiple sources of shared variance (e.g., the Social Relations Model; [105]), and latent profile or couple-centered approaches which aim to capture relationship features in ways that are individualized or tailored to different people [106,107].

Limitations and future directions: Towards a more rigorous relationship science

The current research has several limitations that we view as important directions for future research. First, the current samples were designed to be representative of the broader population to which we would like to generalize (i.e., individuals in romantic relationships in the United States). However, our data are limited in their capacity to evaluate and contrast the distinct experiences of individuals across various populations, cultures, and contexts. Future research should aim to replicate the current results in more diverse samples, particularly in light of the fact that much of the relationship literature has historically concentrated on a narrow segment of the population (often White, college-educated, and middle-class) [18].

Second, the current studies were designed to capture a comprehensive set of relationship constructs representing measures of relationship quality (Study 1) and the field of relationship science more broadly (Study 2). As a result, some features of our study design limited the extent to which it fully mirrors a traditional relationship questionnaire (e.g., a standard intake survey). Our surveys included a high number and proportion of relationship measures relative to non-relationship measures (e.g., demographic measures, personality measures). We also did not administer complete instruments in their original form, since our item selection process focused on maximizing the breadth of item content across measures. These factors may have biased support for a general factor (Q), as responding to a large pool of similar-looking items could have promoted a relationship-response mindset among participants. Similarly, we cannot rule out that our inability to recover distinct, meaningful domains is also due to these specific methodological features. Thus, future studies could stress-test the current results by accounting for these factors within a more tightly-controlled and representative survey design. It is possible that some structural variations may emerge (e.g., increased evidence for select specific factors beyond Q). However, unless the pattern of results diverged substantially to support the Independence Model, our central conclusions regarding the influence of sentiment override would still hold.

Related to this, future studies could also assess whether the strength and stability of Q varies based on artifactual features of self-report methods more generally [77] or whether Q remains robust when key survey design features (e.g., survey length, the proportion and type of relationship items, temporal spacing of measures) are varied beyond the typical relationship-survey format we sought to approximate here. For example, support for sentiment override would consist of Q remaining relatively robust even upon implementing procedural techniques to prevent careless responding among survey respondents [71,108]. Such efforts would inform whether sentiment override poses a significant and pervasive threat, influencing responses whenever partners are asked to evaluate aspects of their relationship, or is contextually sensitive to varying features of the study design that could potentially be mitigated by researchers. Overall, examining the methodological contexts in which the general factor Q may be stronger or weaker would help determine whether scholars may need to revise their measures, methodological procedures, and/or theoretical models.

A key feature of our investigation was to employ the most widely used research design in the field, namely self-report measures. Specifically, we focused on a standard cross-sectional questionnaire design, limiting the generalizability of our findings beyond this context. It is possible that distinctions among constructs that could not be observed at a single measurement occasion might be observed in longitudinal or daily diary data capable of comparing how different items fluctuate or covary with each other across time [109]. For example, a strong test of the boundaries and pervasiveness of sentiment override would involve examining diary data studies with frequent assessments. Examining whether a general factor Q emerged in this context as well would be strong evidence of the pervasiveness of sentiment override in the literature.

Lastly, the current research is limited in its ability to speak to the substantive nature of Q. Our primary aim was not to assess sentiment override directly, but to characterize the variance structure under standard self-report conditions. Our interpretation of sentiment override is based on the strength and stability of Q (represented by global relational appraisals) in influencing participants’ evaluations across broad and specific aspects of their relationship. Our use of EBFA was relevant for assessing a sentiment override account as bifactor models partition covariance into general versus specific factors, providing direct estimates of the relative contributions of global versus domain-specific relationship judgements. Indeed, our finding that individuals do not appear to distinguish between different relationship-specific constructs converged across studies, and subsequent analyses using measures of general evaluative consistency bias, suggest that Q shares features of a more systematic response bias, but was specific to the domain of relationship evaluation. Stronger, dedicated examinations to substantiate this effect in other methodological contexts would be valuable. Such tests, by our estimation, could involve evaluating the strength of Q according to several other methodological and statistical techniques than those implemented in our studies. We briefly outline a few here. In principle, sentiment override is considered to be a fundamental mechanism governing the process of relationship evaluation, and not simply a methodological artifact. Thus, the general Q-factor should emerge rather consistently across different methods used to study relationship phenomena. Although self-report methods have predominated the field, it is important to note that relationship research regularly adopts multi-method approaches. 
Thus, multitrait–multimethod approaches [100] may be particularly useful in future studies to further establish the substantive nature of Q, as estimating the shared variance between evaluative relationship constructs across different methods (or rating sources) would be less susceptible to effects of method bias and/or artifacts than when using the same method [77].
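The variance-partitioning logic behind this bifactor account can be illustrated with a toy simulation (purely hypothetical loadings, not the data or analysis code from these studies): when items from nominally distinct "constructs" all load strongly on one general factor, the leading component of their correlation matrix absorbs most of the shared variance, mirroring how a dominant Q factor would swamp construct-specific signal.

```python
import math
import random

random.seed(1)

N_PEOPLE = 2000
GROUPS, ITEMS_PER = 3, 4          # three nominally distinct "constructs"
N_ITEMS = GROUPS * ITEMS_PER
G_LOAD, S_LOAD = 0.7, 0.35        # assumed loadings: strong general factor,
                                  # weak construct-specific factors

# Simulate standardized item responses: each item mixes the general factor,
# its group's specific factor, and unique noise.
noise_sd = math.sqrt(1 - G_LOAD ** 2 - S_LOAD ** 2)
data = []
for _ in range(N_PEOPLE):
    g = random.gauss(0, 1)
    s = [random.gauss(0, 1) for _ in range(GROUPS)]
    data.append([G_LOAD * g + S_LOAD * s[i // ITEMS_PER]
                 + random.gauss(0, noise_sd) for i in range(N_ITEMS)])

# Item intercorrelation matrix.
def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

cols = list(zip(*data))
R = [[corr(cols[i], cols[j]) for j in range(N_ITEMS)] for i in range(N_ITEMS)]

# Leading eigenvalue of R via power iteration: its share of the trace is the
# proportion of total item variance captured by the first (general) component.
v = [1.0] * N_ITEMS
for _ in range(100):
    w = [sum(R[i][j] * v[j] for j in range(N_ITEMS)) for i in range(N_ITEMS)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]
lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(N_ITEMS))
          for i in range(N_ITEMS))
general_share = lam / N_ITEMS
print(f"share of variance on the general component: {general_share:.2f}")
```

With these assumed loadings, roughly half of all item variance collapses onto the single general component even though the items were written to tap three separate constructs; bifactor estimation formalizes exactly this split between the general and group-specific sources.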

A related question for future research is whether sentiment override is partner-specific or generalizes across different relationships. Examining partners in consensually non-monogamous (CNM) relationships may be particularly useful for probing this distinction, as it represents a context in which need fulfillment can be spread across partners, and overall relationship satisfaction may therefore be less tied to partner-specific evaluations [110]. Future work might consider adopting social relations model approaches (in which participants evaluate multiple close others on the same items) to cleanly separate participants’ overall evaluative tendencies (actor effects) from their tendency to apply a consistent evaluative lens to a given relationship (relationship effects; [111]).
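The actor-versus-relationship distinction drawn by the social relations model can be sketched with a small round-robin simulation (an illustrative toy, not a full SRM estimator; all effect sizes are assumptions): each rating decomposes into the rater's general evaluative tendency (actor effect), the target's elicited regard (partner effect), and a dyad-specific residual, and even a crude row-mean estimate recovers the simulated actor effects.

```python
import math
import random

random.seed(2)

N = 20  # raters in a round-robin design: everyone rates everyone else

# Simulate ratings: rating(i -> j) = grand mean + actor_i + partner_j + residual.
actor = [random.gauss(0, 1.0) for _ in range(N)]
partner = [random.gauss(0, 0.5) for _ in range(N)]
rating = {(i, j): 5 + actor[i] + partner[j] + random.gauss(0, 0.5)
          for i in range(N) for j in range(N) if i != j}

# Crude moment estimate of each actor effect: a rater's mean rating given,
# centered on the grand mean (full SRM estimation additionally adjusts for
# partner and reciprocity terms).
grand = sum(rating.values()) / len(rating)
actor_hat = [sum(rating[i, j] for j in range(N) if j != i) / (N - 1) - grand
             for i in range(N)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

r = pearson(actor, actor_hat)
print(f"recovery of simulated actor effects: r = {r:.2f}")
```

In relationship data, the analogous decomposition would ask how much of a participant's evaluation of a given partner reflects their general evaluative tendency versus something specific to that dyad, which is precisely the partner-specificity question raised above.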

Conclusions

The current work represents a novel investigation of the empirical distinguishability between measures of relationship science’s most prominent constructs. We find that participants’ responses to diverse relationship inventories are principally guided by a global evaluation of the relationship. This is consistent with the phenomenon of sentiment override, and suggests that a notable amount of research in relationship science may be assessing constructs that are not empirically distinct from the outcomes they are being used to explain. To advance a deeper understanding of relationships, scholars will need to address this fundamental methodological constraint. It is clear that people can self-report how positively they feel about their relationship; what is not clear is whether they can self-report anything else about it.

References

  1. Forscher BK. Chaos in the Brickyard. Science. 1963;142(3590):339. pmid:17799464
  2. Anvari F, Alsalti T, Oehler LA, Marion Z, Hussey I, Elson M, et al. A Fragmented Field: Construct and Measure Proliferation in Psychology. Advances in Methods and Practices in Psychological Science. 2025;8(3).
  3. Hodson G. Construct jangle or construct mangle? Thinking straight about (nonredundant) psychological constructs. J Theor Soc Psychol. 2021;5(4):576–90.
  4. Gonzalez O, MacKinnon DP, Muniz FB. Extrinsic convergent validity evidence to prevent jingle and jangle fallacies. Multivariate Behav Res. 2021;56(1):3–19. pmid:31958017
  5. Le H, Schmidt FL, Harter JK, Lauver KJ. The problem of empirical redundancy of constructs in organizational research: An empirical investigation. Organ Behav Hum Decis Process. 2010;112(2):112–25.
  6. Murphy BA, Hall JA, Duong F. It looks like construct validity, but look again: Comment on Clutterbuck et al. (2021) and recommendations for test developers in the broad “empathy” domain. Psychol Assess. 2022;34(4):397–404. pmid:35377686
  7. Watson D, Clark LA. Negative affectivity: the disposition to experience aversive emotional states. Psychol Bull. 1984;96(3):465–90. pmid:6393179
  8. Flake JK, Pek J, Hehman E. Construct validation in social and personality research: Current practice and recommendations. Soc Psychol Personal Sci. 2017;8(4).
  9. Robles TF, Slatcher RB, Trombello JM, McGinn MM. Marital quality and health: a meta-analytic review. Psychol Bull. 2014;140(1):140–87. pmid:23527470
  10. Kamp Dush CM, Taylor MG, Kroeger RA. Marital Happiness and Psychological Well-Being Across the Life Course. Fam Relat. 2008;57(2):211–26. pmid:23667284
  11. Holt-Lunstad J, Birmingham W, Jones BQ. Is there something unique about marriage? The relative impact of marital status, relationship quality, and network social support on ambulatory blood pressure and mental health. Ann Behav Med. 2008;35(2):239–44. pmid:18347896
  12. Kiecolt-Glaser JK, Fisher LD, Ogrocki P, Stout JC, Speicher CE, Glaser R. Marital quality, marital disruption, and immune function. Psychosom Med. 1987;49(1):13–34. pmid:3029796
  13. Carr D, Freedman VA, Cornman JC, Schwarz N. Happy marriage, happy life? Marital quality and subjective well-being in later life. J Marriage Fam. 2014;76(5):930–48. pmid:25221351
  14. Finkel EJ, Simpson JA, Eastwick PW. The psychology of close relationships: fourteen core principles. Annu Rev Psychol. 2017;68:383–411. pmid:27618945
  15. Reis HT. Steps toward the ripening of relationship science. Pers Relatsh. 2007;14(1):1–23.
  16. CORE Lab. A novel, network-based approach to assessing romantic-relationship quality. Perspect Psychol Sci. 2024.
  17. Joel S, Eastwick PW, Khera D. A credibility revolution for relationship science: where can we step up our game? Social & Personality Psych. 2025;19(2).
  18. Williamson HC, Bornstein JX, Cantu V, Ciftci O, Farnish KA, Schouweiler MT. How diverse are the samples used to study intimate relationships? A systematic review. J Soc Pers Relat. 2022;39(4):1087–109. pmid:35655791
  19. Fletcher GJ, Simpson JA, Thomas G. The measurement of perceived relationship quality components: A confirmatory factor analytic approach. Pers Soc Psychol Bull. 2000;26(3):340–54.
  20. Bainbridge TF, Ludeke SG, Smillie LD. Evaluating the Big Five as an organizing framework for commonly used psychological trait scales. J Pers Soc Psychol. 2022;122(4):749–77. pmid:35025595
  21. Costa PT Jr, McCrae RR. Domains and facets: Hierarchical personality assessment using the Revised NEO Personality Inventory. J Pers Assess. 1995;64(1):21–50.
  22. Lawson KM, Robins RW. Sibling constructs: what are they, why do they matter, and how should you handle them? Pers Soc Psychol Rev. 2021;25(4):344–66. pmid:34663112
  23. Thorndike EL. An introduction to the theory of mental and social measurements. New York: Columbia University Press; 1904.
  24. Kelley TL. Interpretation of educational measurements. New York: World Book Company; 1927.
  25. Fiske DW. Consistency of the factorial structures of personality ratings from different sources. J Abnorm Psychol. 1949;44(3):329–44. pmid:18146776
  26. Goldberg LR. An alternative “description of personality”: the big-five factor structure. J Pers Soc Psychol. 1990;59(6):1216–29. pmid:2283588
  27. Goode WJ, Hopkins E, McClure HM. Social systems and family patterns: a propositional inventory. Indianapolis: Bobbs-Merrill Co; 1971.
  28. Kelley HH, Berscheid E, Christensen A, Harvey JH, Huston TL, Levinger G. Close relationships. New York: Freeman; 1983.
  29. Joel S, Eastwick PW, Allison CJ, Arriaga XB, Baker ZG, Bar-Kalifa E, et al. Machine learning uncovers the most robust self-report predictors of relationship quality across 43 longitudinal couples studies. Proc Natl Acad Sci U S A. 2020;117(32):19061–71. pmid:32719123
  30. Fincham FD, Rogge R. Understanding relationship quality: theoretical challenges and new tools for assessment. J Fam Theory Rev. 2010;2(4):227–42.
  31. Spanier GB. Measuring dyadic adjustment: New scales for assessing the quality of marriage and similar dyads. J Marriage Fam. 1976;38(1):15–28.
  32. Hendrick SS. A generic measure of relationship satisfaction. Journal of Marriage and the Family. 1988;50(1):93.
  33. Karney BR, Bradbury TN. The longitudinal course of marital quality and stability: a review of theory, method, and research. Psychol Bull. 1995;118(1):3–34. pmid:7644604
  34. Bradbury TN, Bodenmann G. Interventions for couples. Annu Rev Clin Psychol. 2020;16:99–123. pmid:32031866
  35. Schumm WR, Paff-Bergen LA, Hatch RC, Obiorah FC, Copeland JM, Meens LD, et al. Concurrent and discriminant validity of the Kansas Marital Satisfaction Scale. Journal of Marriage and the Family. 1986;48(2):381.
  36. Norton R. Measuring marital quality: a critical look at the dependent variable. Journal of Marriage and the Family. 1983;45(1):141.
  37. Funk JL, Rogge RD. Testing the ruler with item response theory: increasing precision of measurement for relationship satisfaction with the Couples Satisfaction Index. J Fam Psychol. 2007;21(4):572–83. pmid:18179329
  38. Fincham FD, Bradbury TN. The assessment of marital quality: a reevaluation. Journal of Marriage and the Family. 1987;49(4):797.
  39. Jacobson NS, Moore D. Spouses as observers of the events in their relationship. J Consult Clin Psychol. 1981;49(2):269–77. pmid:7217493
  40. Joel S, Maxwell JA, Khera D, Peetz J, Baucom BRW, MacDonald G. Expect and you shall perceive: People who expect better in turn perceive better behaviors from their romantic partners. J Pers Soc Psychol. 2023;124(6):1230–55. pmid:36442024
  41. Weiss RL. Strategic behavioral marital therapy: Toward a model for assessment and intervention. In: Vincent JP, editor. Adv Fam Interv Assess Theory. 1980. p. 229–71.
  42. Morin AJ, Myers ND, Lee S. Modern factor analytic techniques: Bifactor models, exploratory structural equation modeling (ESEM), and bifactor-ESEM. In: Tenenbaum G, Eklund RC, editors. Handbook of sport psychology. Hoboken: Wiley; 2020. p. 1044–73.
  43. Reise SP, Mansolf M, Haviland MG. Bifactor measurement models. In: Hoyle RH, editor. Handbook of structural equation modeling. New York: Guilford Press; 2023. p. 329–48.
  44. Rodriguez A, Reise SP, Haviland MG. Evaluating bifactor models: Calculating and interpreting statistical indices. Psychol Methods. 2016;21(2):137–50. pmid:26523435
  45. Rodriguez A, Reise SP, Haviland MG. Applying bifactor statistical indices in the evaluation of psychological measures. J Pers Assess. 2016;98(3):223–37. pmid:26514921
  46. Markon KE. Bifactor and Hierarchical Models: Specification, Inference, and Interpretation. Annu Rev Clin Psychol. 2019;15:51–69. pmid:30649927
  47. Marsh HW, Morin AJS, Parker PD, Kaur G. Exploratory structural equation modeling: an integration of the best features of exploratory and confirmatory factor analysis. Annu Rev Clin Psychol. 2014;10:85–110. pmid:24313568
  48. Wang YA, Eastwick PW. Solutions to the problems of incremental validity testing in relationship science. Pers Relatsh. 2020;27(1):156–75.
  49. Bradbury TN, Fincham FD, Beach SRH. Research on the nature and determinants of marital satisfaction: a decade in review. J of Marriage and Family. 2000;62(4):964–80.
  50. Fabrigar LR, Wegener DT. Exploratory factor analysis. New York: Oxford University Press; 2011.
  51. Loehlin JC, Beaujean AA. Latent variable models: An introduction to factor, path, and structural equation analysis. 5th ed. New York: Routledge; 2017.
  52. United States Census Bureau. 2020. https://data.census.gov/cedsci/table?q=United%20States&tid=ACSDP1Y2018.DP05&hidePreview=false
  53. Revelle WR. psych: Procedures for personality and psychological research. 2021.
  54. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2022. http://www.r-project.org/
  55. Costello AB, Osborne J. Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Pract Assess Res Eval. 2005;10(1):7.
  56. Reise SP. Invited paper: the rediscovery of bifactor measurement models. Multivariate Behav Res. 2012;47(5):667–96. pmid:24049214
  57. Dueber DM, Toland MD. A bifactor approach to subscore assessment. Psychol Methods. 2023;28(1):222–41. pmid:34941326
  58. Reise SP, Bonifay WE, Haviland MG. Scoring and modeling psychological measures in the presence of multidimensionality. J Pers Assess. 2013;95(2):129–40. pmid:23030794
  59. Gignac GE, Kretzschmar A. Evaluating dimensional distinctness with correlated-factor models: Limitations and suggestions. Intelligence. 2017;62:138–47.
  60. Reise SP, Scheines R, Widaman KF, Haviland MG. Multidimensionality and structural coefficient bias in structural equation modeling: A bifactor perspective. Educ Psychol Meas. 2013;73(1):5–26.
  61. Bader M, Jobst LJ, Moshagen M. Sample size requirements for bifactor models. Structural Equation Modeling: A Multidisciplinary Journal. 2022;29(5):772–83.
  62. Abad FJ, Garcia-Garzon E, Garrido LE, Barrada JR. Iteration of partially specified target matrices: application to the bi-factor case. Multivariate Behav Res. 2017;52(4):416–29. pmid:28375697
  63. Garcia-Garzon E, Abad FJ, Garrido LE. On omega hierarchical estimation: a comparison of exploratory bi-factor analysis algorithms. Multivariate Behav Res. 2021;56(1):101–19. pmid:32449372
  64. Lorenzo-Seva U, Ferrando PJ. A general approach for fitting pure exploratory bifactor models. Multivariate Behav Res. 2019;54(1):15–30. pmid:30160535
  65. Mansolf M, Reise SP. Exploratory bifactor analysis: the Schmid-Leiman orthogonalization and Jennrich-Bentler analytic rotations. Multivariate Behav Res. 2016;51(5):698–717. pmid:27612521
  66. Zhang B, Sun T, Cao M, Drasgow F. Using bifactor models to examine the predictive validity of hierarchical constructs: pros, cons, and solutions. Organizational Research Methods. 2020;24(3):530–71.
  67. Garcia-Garzon E, Abad FJ, Garrido LE. Improving bi-factor exploratory modelling: empirical target rotation based on loading differences. Methodology. 2019;15:45–55.
  68. Dueber D. Bifactor indices calculator. R package version 0.2.0. 2020. https://CRAN.R-project.org/package=BifactorIndicesCalaculator
  69. Watts AL, Makol BA, Palumbo IM, De Los Reyes A, Olino TM, Latzman RD, et al. How robust is the p factor? Using multitrait-multimethod modeling to inform the meaning of general factors of youth psychopathology. Clin Psychol Sci. 2022;10(4):640–61. pmid:36090949
  70. Marsh HW, Hau KT, Balla JR, Grayson D. Is more ever too much? The number of indicators per factor in confirmatory factor analysis. Multivariate Behav Res. 1998;33(2):181–220. pmid:26771883
  71. Stosic MD, Murphy BA, Duong F, Fultz AA, Harvey SE, Bernieri F. Careless responding: Why many findings are spurious or spuriously inflated. Adv Methods Pract Psychol Sci. 2024;7(1):25152459241231581.
  72. Peer E, Rothschild D, Gordon A, Evernden Z, Damer E. Data quality of platforms and panels for online behavioral research. Behav Res Methods. 2022;54(4):1643–62. pmid:34590289
  73. Litman L, Robinson J, Rosenzweig C. The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behav Res Methods. 2015;47(2):519–28. pmid:24907001
  74. Anusic I, Schimmack U, Pinkus RT, Lockwood P. The nature and structure of correlations among Big Five ratings: the halo-alpha-beta model. J Pers Soc Psychol. 2009;97(6):1142–56. pmid:19968424
  75. Cooper WH. Ubiquitous halo. Psychological Bulletin. 1981;90(2):218.
  76. Laham SM, Forgas JP. Halo effects. In: Cognitive illusions. Routledge; 2022. p. 259–71.
  77. Podsakoff PM, MacKenzie SB, Podsakoff NP. Sources of method bias in social science research and recommendations on how to control it. Annu Rev Psychol. 2012;63:539–69. pmid:21838546
  78. Van de Mortel TF. Faking it: Social desirability response bias in self-report research. Aust J Adv Nurs. 2008;25(4):40–8.
  79. Neff LA, Karney BR. To know you is to love you: the implications of global adoration and specific accuracy for marital relationships. J Pers Soc Psychol. 2005;88(3):480–97. pmid:15740441
  80. Judge TA, Rodell JB, Klinger RL, Simon LS, Crawford ER. Hierarchical representations of the five-factor model of personality in predicting job performance: integrating three organizing frameworks with two theoretical perspectives. J Appl Psychol. 2013;98(6):875–925. pmid:24016206
  81. Graham JW, Taylor BJ, Olchowski AE, Cumsille PE. Planned missing data designs in psychological research. Psychol Methods. 2006;11(4):323–43. pmid:17154750
  82. Zhang C, Yu MC. Planned missingness: how to and how much? Organizational Research Methods. 2022;25(4):623–41.
  83. Greszki R, Meyer M, Schoen H. The impact of speeding on data quality in nonprobability and freshly recruited probability-based online panels. In: Callegaro M, Baker R, Bethlehem J, Göritz AS, Krosnick JA, Lavrakas PJ, editors. Online panel research: A data quality perspective. London: Wiley; 2014. p. 238–62.
  84. Baraldi AN, Enders CK. An introduction to modern missing data analyses. J Sch Psychol. 2010;48(1):5–37. pmid:20006986
  85. Nassiri V, Lovik A, Molenberghs G, Verbeke G. On using multiple imputation for exploratory factor analysis of incomplete data. Behav Res Methods. 2018;50(2):501–17. pmid:29392587
  86. van Buuren S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat Methods Med Res. 2007;16(3):219–42. pmid:17621469
  87. Little RJA. Missing-data adjustments in large surveys. Journal of Business & Economic Statistics. 1988;6(3):287–96.
  88. Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
  89. Kam CCS. Careless responding threatens factorial analytic results and construct validity of personality measure. Front Psychol. 2019;10:1258. pmid:31258500
  90. Hawkins MW, Carrère S, Gottman JM. Marital Sentiment Override: Does It Influence Couples’ Perceptions? J of Marriage and Family. 2002;64(1):193–201.
  91. Fincham FD, Linfield KJ. A new look at marital quality: Can spouses feel positive and negative about their marriage? J Fam Psychol. 1997;11(4):489–502.
  92. Rogge RD, Fincham FD, Crasta D, Maniaci MR. Positive and negative evaluation of relationships: Development and validation of the Positive-Negative Relationship Quality (PN-RQ) scale. Psychol Assess. 2017;29(8):1028–43. pmid:27736125
  93. Lawrance KA, Byers ES. Sexual satisfaction in long-term heterosexual relationships: The Interpersonal Exchange Model of Sexual Satisfaction. Pers Relatsh. 1995;2(4):267–85.
  94. Fincham FD, Garnier PC, Gano-Phillips S, Osborne LN. Preinteraction expectations, marital satisfaction, and accessibility: A new look at sentiment override. J Fam Psychol. 1995;9(1):3–14.
  95. Rusbult CE. Commitment and satisfaction in romantic associations: A test of the investment model. J Exp Soc Psychol. 1980;16(2):172–86.
  96. Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychol Bull. 1955;52(4):281–302. pmid:13245896
  97. Elson M, Hussey I, Alsalti T, Arslan RC. Psychological measures aren’t toothbrushes. Commun Psychol. 2023;1(1):25. pmid:39242966
  98. van der Maas HLJ, Dolan CV, Grasman RPPP, Wicherts JM, Huizenga HM, Raijmakers MEJ. A dynamical model of general intelligence: the positive manifold of intelligence by mutualism. Psychol Rev. 2006;113(4):842–61. pmid:17014305
  99. Orben A, Lakens D. Crud (Re)Defined. Adv Methods Pract Psychol Sci. 2020;3(2):238–47. pmid:33103054
  100. Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull. 1959;56(2):81–105. pmid:13634291
  101. Schimmack U. The validation crisis in psychology. MP. 2021;5.
  102. Tan JJX, Kraus MW, Carpenter NC, Adler NE. The association between objective and subjective socioeconomic status and subjective well-being: A meta-analytic review. Psychol Bull. 2020;146(11):970–1020. pmid:33090862
  103. Eastwick PW, Finkel EJ, Simpson JA. Relationship trajectories: A meta-theoretical framework and theoretical applications. Psychol Inq. 2019;30(1):1–28.
  104. McNulty JK, Meltzer AL, Neff LA, Karney BR. How both partners’ individual differences, stress, and behavior predict change in relationship satisfaction: Extending the VSA model. Proc Natl Acad Sci U S A. 2021;118(27):e2101402118. pmid:34183417
  105. Kenny DA, La Voie L. The social relations model. In: Berkowitz L, editor. Adv Exp Soc Psychol. 1984.
  106. Hagenaars JA, McCutcheon AL. Applied latent class analysis. Cambridge University Press; 2002.
  107. Eastwick PW, Finkel EJ, Joel S. Mate evaluation theory. Psychol Rev. 2023;130(1):211–41. pmid:35389716
  108. Ward MK, Meade AW. Dealing with careless responding in survey data: prevention, identification, and recommended best practices. Annu Rev Psychol. 2023;74:577–96. pmid:35973734
  109. Courvoisier DS, Nussbeck FW, Eid M, Geiser C, Cole DA. Analyzing the convergent and discriminant validity of states and traits: development and applications of multimethod latent state-trait models. Psychol Assess. 2008;20(3):270–80. pmid:18778163
  110. Larva MA, Mogilski JK, Blumenstock SM. Nurturance, eroticism, and relationship satisfaction among people in monogamous and consensually non-monogamous relationships. The Journal of Sex Research. 2024:1–3.
  111. Lakey B, Hubbard SA, Woods WC, Brummans J, Obreiter A, Fles E, et al. Supportive people evoke positive affect, but do not reduce negative affect, while supportive groups result from favorable dyadic, not group effects. Anxiety Stress Coping. 2022;35(3):323–38. pmid:34586940