Was Aristotle right about moral decision-making? Building a new empirical model of practical wisdom

Shane McLoughlin; Stephen Thoma; Kristján Kristjánsson

doi:10.1371/journal.pone.0317842

Abstract

This article presents the development and validation of the Short Phronesis Measure (SPM), a novel tool to assess Aristotelian phronesis (practical wisdom). Across three studies, using large, nationally representative samples from the UK and US (demographically matched to census data), we employed a systematic and rigorous methodology to examine the structure, reliability, and validity of the SPM. In Study 1a, exploratory factor analysis identified ten distinct, internally reliable components of phronesis, challenging the traditional four-component Aristotelian model. Study 1b confirmed these findings in two additional nationally representative samples from the UK and the US. In Study 1c, the SPM demonstrated strong test-retest reliability over two months. Study 2 used network analysis to uncover interrelations among the components, allowing for the creation of a new and empirically driven neo-Aristotelian model of phronesis. In Study 3, we tested criterion validity, showing phronesis correlates positively with flourishing and predicts flourishing two months later, demonstrating strong predictive validity. Phronesis also correlated with Big 6 and Dark Tetrad personality traits, moral disengagement, and Moral Foundations in expected directions. Importantly, phronesis predicted key outcomes (related to flourishing, moral disengagement, and morally relevant aspects of personality) beyond what Moral Foundations alone explained, with an average increase in predictive power of 13.7% across all outcomes. The SPM is quick to administer (15–20 minutes), making it a valuable tool for researchers and practitioners in psychology, education, and professional ethics. The introduction of the neo-Aristotelian Phronesis Model, and the identification of central phronesis components, offer actionable insights for moral psychologists and moral educators, suggesting areas of focus that could yield broad, positive effects across related traits and providing a significant contribution to both theory and practice.

Citation: McLoughlin S, Thoma S, Kristjánsson K (2025) Was Aristotle right about moral decision-making? Building a new empirical model of practical wisdom. PLoS ONE 20(1): e0317842. https://doi.org/10.1371/journal.pone.0317842

Editor: Sorin Adam Matei, Purdue University, UNITED STATES OF AMERICA

Received: June 17, 2024; Accepted: January 6, 2025; Published: January 22, 2025

Copyright: © 2025 McLoughlin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data and analysis files are available from the Open Science Framework database (accession link: https://tinyurl.com/spmdevelopment).

Funding: This research was funded by the John Templeton Foundation, Grant no. 962685, to the Jubilee Centre for Character and Virtues.

Competing interests: We have no known conflicts of interest to declare.

Introduction

You find yourself at a crossroads: whether to report a close male colleague’s inappropriate remark about female co-workers, or to remain loyal to a long-standing friendship. On one hand, professional integrity and respect for others demand transparency and accountability. On the other, loyalty and personal trust suggest restraint, recognizing that people can make mistakes. In an age where every action is scrutinized, are we bound to prioritize ethical duty over personal bonds? Or does the pursuit of justice risk eroding the very relationships that hold us together? Quandaries like this represent the rough and tumble of moral life for those who are committed to trying to make the right moral choices. Moralists, authors of popular fiction, and scriptwriters of films and soap operas tend to be obsessed with the dichotomy between saints and sinners: the “goodies” and the “baddies.” However, moral philosophers and moral psychologists have historically been sensitive to the fact that most ordinary “decent” people are not wrestling with the fundamental question of whether to be good or bad: they are, however, constantly struggling with quandaries in which two or more moral virtues seem to clash. Kohlberg’s [1] whole research agenda in moral psychology was premised on the assumption that its biggest question was about how to bridge the gap between wanting the good and doing the good [2]. Much earlier, the progenitor of current virtue-based moral philosophy, Aristotle, had mostly ignored the “baddies”—as beyond redemption—and focused on offering guidance to his readers who had been “brought up in good habits” ([3], pp. 6; 292 [1095b4–5]; [1179b11–31]).

Aristotle famously posited phronesis (practical wisdom) as directing the whole virtue orchestra: a metacognitive intellectual virtue that oversees the moral values and virtues and provides them with the necessary checks and balances to secure overall ethically wise decisions. It is concerned both with finding the correct “golden mean” (i.e., the medial state between deficiency and excess) of a single virtue, when applied in complex contexts, and the proper interactivity, or balancing, of different virtues where those seem to call for different responses in the same situation. For instance, is it possible to reconcile somehow the commonly clashing demands of honesty and kindness, by being kindly honest or honestly kind, or may a particular situation call for the abandonment of one of the two virtues? Whatever the eventual decision may be, it is the faculty of phronesis that is meant to do the adjudicative heavy lifting. At the same time, it turns simple dispositions to be good along various independent paths into a coherent way of life, a comprehensive moral journey, guided by right reason, albeit a journey that is heavily contextualized and individualized [4].

Phronesis, or practical wisdom, has been undergoing a significant academic revival of late. What we could call the new “phronesis bandwagon” was originally driven by philosophers [5–8], but this interest has also percolated to psychologists [9–12] and theorists working within various branches of professional ethics, such as medicine [13], nursing [14], and business [15]: branches where phronesis-guided virtue ethics is gradually becoming the moral theory of choice. Various interdisciplinary studies are also emerging [4]. Saliently for the present context, a long-standing discourse on general wisdom within psychology has also been swaying significantly in the direction of phronesis—away from both sophia (theoretical wisdom) and deinotes (wisdom understood as mere instrumentalist calculation). Grossmann et al.’s [16] landmark paper and all the responses to it in the same issue seem to be turning the tide in psychological research on wisdom toward a sharper focus on the moral aspirations undergirding wisdom—and hence toward an alignment with the neo-Aristotelian phronesis tradition.

Philosophy has a reputation for being a cloistered discipline with little interest in collaboration with social science. Neo-Aristotelian virtue ethics constitutes an exception because of its naturalistic methodology, according to which all ethical theorizing must be informed by empirical evidence [7]. On the other hand, psychology has also arguably been too cloistered as a discipline in the field of wisdom research, where comprehensive overviews continue to be written [17, 18] without taking account of philosophical work on practical wisdom or, more generally, the collective historical tradition within fields such as philosophy and theology of trying to make sense of what wisdom is. Despite the recent interest within psychology [16] in an alignment between philosophical and psychological accounts of wisdom, prior to work in the present research center, no psychologically credible conceptualization of phronesis existed and no instrument to measure it [9, 19]. A core assumption of the current research study is that philosophers and psychologists studying wisdom are ultimately interested in more or less the same thing, and that crossover work between them can be fruitful. We return to that assumption in the General Discussion.

Some historical and current competitors

The perennial importance of excellence in moral decision-making explains why phronesis attracted attention in early moral theorizing. However, this interest gradually faded in Enlightenment and post-Enlightenment discourses, with phronesis being brushed off as too indeterminate as a decision procedure, and indeed as part of a naïve “bag-of-virtues” conception of moral life, according to Kohlberg [20]. The decline of interest in phronesis developed in tandem with the erosion of virtue ethics as a paradigm in moral philosophy [21] and the replacement of “character” with a conception of human “personality” as “character devaluated” in psychology [22]. Phronesis, with its emphasis on making wise moral decisions based on the specifics of a context (situational and individual), cannot be formulated via the kind of algorithmic principle-based decision-making typically favored by Enlightenment and post-Enlightenment thought.

As a decision process, phronesis thus became replaced by top-down procedures that better fit the post-Enlightenment frame of mind. There are many historical influences cast in the mold of that thought, including an instrumentalist cost-benefit analysis of the utilitarian kind [23, 24], a formalistic deontological (rule-based) procedure emphasizing purely rational arbitration of decision-making [20, 25], a sentimentalist philosophy that views desires and emotions (not reason) as the essential sources of decision-making [26], and a logical positivist philosophy of science that eschews values and ethics in science (critiqued by Richardson et al. [27]). These influences on psychology were all unfriendly to the concept of phronesis. In the flow and ebb of intellectual opinion, these post-Enlightenment positions have come under heavy criticism for their uncritical bifurcation of facts and values [28–30]. At the same time, the rationalistic approach in early moral psychology [20] also suffered a setback when it transpired that correlations between developmental stages of moral reasoning and actual moral action were low [2, 31].

While neo-Kohlbergians and other psychologists have come up with complex models to show how various psychological functions might combine to aid moral decision-making [31, 32], the most heavily cited accounts of the moral life in current psychological circles are surprisingly reticent on the problem of virtue conflicts within an individual’s moral make-up. The Values-in-Action taxonomy of 24 character strengths and virtues favored by positive psychologists [33] makes do without any integrative meta-virtue. When pressed, defenders of this model argue that three of the 24 strengths listed (namely prudence, perspective, and judgment) combine to fulfill the required synthesizing role [34], but that raises the question of who or what calls the shots if these three collide. This problem is compounded by the lack of any golden-mean architectonic in the positive psychological system, which assumes instead that more of each virtue is better [35]. Moral Foundations Theory has provided valuable insights into the way in which liberals and conservatives prioritize different values and virtues, with the former foregrounding care and fairness (understood as equality), but the latter fairness (understood as proportionality), loyalty, authority, and purity [36]. However, the theory has so far provided little insight into how, say, conservatives adjudicate conflicts between proportionality and loyalty.

Some psychologists critique the phronesis construct as developmentally and educationally underdeveloped and propose, rather, reliance on better entrenched developmental constructs such as that of metacognition [37]. The recent buzz about phronesis in philosophical circles notwithstanding [38], some philosophers have also toyed with the idea that phronesis may be a redundant concept [38, 39], while others argue that standard Aristotelian approaches do not give phronesis sufficient priority with respect to the moral virtues (as traits), and that the only trait a truly virtuous moral agent needs to possess is simply phronesis. More specifically, on this view (the so-called Aretai Center Model) all ethical virtues are ultimately unified within phronesis itself, understood as overall moral expertise [40].

We return to some of those alternative conceptions, in light of our own findings, in the General Discussion. At the present juncture it is worth pointing out, however, that some of the most recent evidence from psychological studies casts doubt on a redundancy thesis about a synthesizing virtue such as phronesis. For example, Feraco et al. [41] found that the VIA character strengths aggregated into a single character factor using a bifactor model, with several specific character strengths not predicting life satisfaction and mental health over and above the general factor. The authors also emphasize the need to better understand what that general factor represents and how these character strengths are integrated to affect external outcomes. Other researchers such as Han [42] have examined the network structure among moral functioning components and concluded that functional connectivity between components may be important, as these are, for instance, tied to civic engagement levels. Finally, Han [43] examined whether neuroscientific evidence can support standard models of phronesis, focusing on network-based moral functioning at the neural level and its implications for phronesis eliminativism. He proposed that while evidence supports the standard phronesis models, future studies should more directly target phronesis as a construct, acknowledging the limitations of current neuroscientific approaches in fully capturing its multifaceted nature. Overall, there is reason to suppose that a standard model of phronesis may indeed be substantiated psychologically; yet this requires an integrative approach that encompasses both structural psychometric and network approaches to better comprehend its complex and multidimensional character.

Introducing and unpacking the Aristotelian “standard model” of phronesis.

Given the historical role of Aristotle as the forebear of all contemporary philosophical and psychological work on (practical) wisdom, we believe his theory deserves serious scientific exploration; hence the original question in the title of this article. In this section, we explore the theory in more detail, before later subjecting it to a comprehensive empirical investigation.

In Aristotle’s [3] ethical system, phronesis is nothing less than the lynchpin of a flourishing life, actualizing the virtues and representing good character. According to Aristotelian character developmental theory, young people who have acquired the right moral habits through good upbringing need to gradually develop this intellectual virtue to guide their decision-making. Otherwise, their moral lives will be fragmented, uncritical, and lacking in intrinsic value. Rejecting Plato’s idea of a moral master virtue (justice), which trumps the other virtues in times of conflict, Aristotle proposed this intellectual meta-virtue instead. In that sense, then, Aristotelian phronesis is best understood as deliberative excellence in moral decision-making [4].

It is important to note at this point that the foundational concept of Aristotelian and neo-Aristotelian virtue theories (as well as their educational incarnations as “character education”) [7, 44] is not “character” or even “virtue,” but rather “flourishing.” Flourishing is seen as the “ungrounded grounder” of the good life [33], which the other key capacities are either conducive to or constitutive of. Although flourishing (often under the banner of “objective wellbeing”) has been on psychological agendas for decades, its prominence has risen in recent times, as have efforts to measure it [45, 46]. Although the immediate aim of phronesis, on an Aristotelian understanding, is excellence in moral decision-making, its ultimate aim is not just “prosocial behavior” (an end in which virtue ethicists tend to be only derivatively interested) but rather its constitutive contribution to an overall flourishing life. The overarching research question motivating the present research project, to which the question in the title refers, is whether phronesis predicts the core components of flourishing.

There are already large theoretical literatures on practical wisdom in general [8] and the Aristotelian conception in particular (see [6]). However, most of those literatures are either exegetical or purely philosophical in orientation, and hence outside of our immediate practical and empirical interests. What matters for present purposes is that in philosophy there has gradually evolved what Miller [39] calls a neo-Aristotelian “standard model of phronesis,” which carries independent interest, whatever one may think of some of Aristotle’s own claims. In the standard neo-Aristotelian model, the task of phronesis is complex [47], and a common suggestion from the theoretical literature is that it has at least three functions. First, phronesis helps us spot situations where the relevant virtue is required and how to execute it. For example, courage is the virtue that is appropriate to situations involving risk. Second, phronesis allows us to integrate different virtues that seem to come into conflict in the same situation, such as being courageously generous. This arbitration can also lead to enacting one virtue that is a higher priority and in unresolvable conflict with a second virtue (e.g., mercy versus justice). Only through phronesis do the virtues become a “package deal” ([8], p. 26). Third, the phronimos (person possessing phronesis) re-evaluates and regulates emotional traits acquired early in life, infusing them with reason and justification. Other scholars have added a fourth function, of “deep understanding” [47] of the human condition, to this mix: an understanding of what constitutes human flourishing as an irreducibly moral activity.

Although it has been suggested that Aristotle’s remarks on phronesis are not always particularly illuminating, especially from a contemporary developmental and educational perspective [37], it does seem possible to derive a general account of phronesis from those texts that emphasize its diverse above-mentioned functions—hence refining and concretizing “the standard model.” The best way to convey the nature of those functions in contemporary psychological language is to say that the construct is made up of various (inter-related) components, and we will hence shift to talking about “components” rather than “functions” in what follows. Therefore, we hypothesize, in line with previous theoretical writings [4, 9, 19], that there are four psychometric components of phronesis, reflecting the four functions they are theoretically tasked with carrying out. The four-componential version of the “standard model,” which we call the Aristotelian Phronesis Model (APM), constitutes a hypothesis derived from Aristotelian theory, a hypothesis tested within Study 1. Moreover, the components do not refer to psycho-moral capacities that are completely independent of one another and can be turned “up” or “down” in isolation; rather, they are inter-related as explained below (see further in [4]). The four APM components are outlined below.

Moral perception.

Phronesis in the APM involves the cognitive discriminatory ability to perceive the ethically salient aspects of a situation and to appreciate these as calling for specific kinds of responses [9]. In the phronimoi, this becomes a moral cognitive excellence in that, after having noted a salient moral feature of a concrete situation calling for a response, they will be able to weigh different considerations and see that, say, courage is required when the risk to one’s life is not overwhelming but the object at stake is extremely valuable; or that honesty is required when one has wronged a friend. Herein, we adopted Moral Perception as a more generally understood term. We could also refer to this component as moral sensitivity, a term often found within standard moral psychology/education literatures [31].

Moral emotion.

Individuals foster their emotional wellbeing through phronesis by coordinating their emotional responses with their understandings of the ethically salient aspects of their situation, their judgment, and their recognition of what is at stake [9]. This is partly because they will have developed habituated virtues, meaning, inter alia, that their emotions have been shaped to align with the motivations and behaviors characteristic of a virtuous person. Additionally, these emotional habits are reinforced and solidified through understanding and reasoning, providing a robust intellectual foundation for their responses. For example, a phronimos might recognize that her appraisal of the situation is problematic, giving rise to an emotional response that is inappropriate to the situation. The emotion-regulative component can then help her adjust her emotion by, for instance, giving herself an inner “talking to” or asking herself questions about what is prompting the ill-fitting emotional response. For this reason, we can also refer to this component, in a more standard Aristotelian way, as infusing emotion with reason [48]. The term we will use hereinafter is Moral Emotion, encapsulating the emotional regulative function emphasized by philosophers, and the positive and negative emotions associated with moral actions.

Moral identity.

The synthesizing work of phronesis operates in conjunction with the agent’s overall understanding of the kinds of things that matter for a flourishing life: a person’s own ethical aims and aspirations, her understanding of what it takes to live and act well, and her need to live up to the standards that shape and are shaped by her understanding and experience of what matters. This amounts to what we call a blueprint of flourishing [9]. A “blueprint” has more similarity to what psychologists call “moral identity” than to a full-blown theoretical outline of the good life [49, 50], and we use Moral Identity for simplicity in subsequent sections. Phronetic persons possess a general justifiable conception of the good life (eudaimonia) and adjust their overall reactions to that blueprint, thus furnishing it with motivational force. This does not mean that each ordinary person needs to have the same sophisticated comprehension of the “grand end” of human life as a philosopher might have in order to count as possessing phronesis. Rather than being an “elite sport,” the kind of blueprint of the aims of human life that informs phronesis is within the grasp of the ordinary individual. It draws upon the person’s standpoint on life as a whole and determines the place that different goods occupy in the larger context.

Moral adjudication.

Assume that we have identified a moral problem correctly as one potentially requiring input from two or more apparently conflicting moral virtues. Let us further assume that we have infused our relevant emotions with reason and that they are not obstructing the decision process. Finally, assume that we have a clear, non-self-deceptive identity of who we want to be—a blueprint of the good life—and an overall motivation to bring our reactions into line with that identity. That leaves just the final component of the four-componential construct: the integrative component—what we could also call its adjudicative component [9] or, in line with standard moral psychology, simply denote as a form of “moral reasoning.” Through this component, an individual integrates different virtue-relevant considerations, via a process of checks and balances, especially in circumstances where different ethically salient considerations, or different kinds of virtues or values, appear to be in conflict and agents need to negotiate dilemmatic space. For the sake of simplicity, we will use the term Moral Adjudication, hereinafter.

The APM is built upon the foundational work of the Jubilee Centre Phronesis Report [51] and the influential framework of Darnell et al. [9, 19]. These earlier studies provided a pivotal theoretical base and important preliminary empirical insights into the structure of phronesis. A particular strength of Darnell et al.’s [19] work was its use of a confirmatory factor analysis (CFA) within a structural equation modeling framework to assess the viability of the four-component APM. By drawing on existing validated measures aligned with the theorized components of phronesis, this study offered an innovative first attempt to operationalize the APM and demonstrated that a second-order model, with four interrelated components feeding into an overarching phronesis construct, could be empirically supported.

However, while this approach provided strong theoretical and empirical groundwork, it also presented certain limitations that future research needed to address. First, by relying on existing measures originally developed for other purposes, the study risked a degree of conceptual misalignment between the measures and the specific nuances of the APM components. Second, the use of CFA, while appropriate for testing a pre-theorized model, meant that the data had less opportunity to inform the factor structure inductively. Third, the resulting measure (hereinafter referred to as the Long Phronesis Measure, or LPM) took approximately 45 minutes to complete, raising concerns about its practical feasibility for large-scale studies or applied settings. Fourth, the participants were not representative of the general population, so the data could not speak to phronesis in general. Finally, the item set and scoring instructions were not made fully accessible, which limited the replicability of the findings and constrained the development of a broader empirical program of research on phronesis.

These strengths and limitations highlight both the significance of the Darnell et al. [19] study and the need for further work to build upon its contributions. Future research required measures that were specifically tailored to reflect the APM framework, an approach that would balance theoretical precision with practical utility. Additionally, an inductive exploration of the dimensions of phronesis—through methods such as exploratory factor analysis (EFA)—would allow the data itself to reveal the structure of the construct, providing a stronger empirical foundation for subsequent confirmatory analyses. Addressing these issues is essential to advance both the theoretical understanding of phronesis and its measurement in a way that supports scalable, replicable, and substantive empirical research.

The current study

Our study traverses two culturally distinct countries—the United States and the United Kingdom—by drawing on representative adult samples. This breadth of inquiry allows us to move beyond philosophical conceptions of phronesis towards an empirical psychological account representative of these general populations. We are also interested in the extent to which this inquiry enables us to respond to the sort of eliminativism about phronesis mentioned earlier [37, 39] as well as to alternative non-Aristotelian conceptions [40]. We return to those questions in the General Discussion.

In addition to examining the internal structure of phronesis, our research delves into its intersections with core psychological constructs, including personality traits and so-called Moral Foundations. This aspect of the study enriches our understanding of how phronesis interacts with fundamental human attributes and moral reasoning processes. The research questions driving this work are broadly designed to develop and assess a new measure of the APM, with the subsequent goal of understanding the workings and predictive power of Aristotelian phronesis as such. This overall focus can be subdivided into a measurement development phase, in which we test whether the four theorized functions of phronesis within the APM emerge from a bottom-up analysis, followed by studies assessing the validity of the measure in relation to a wide range of psychologically and sociologically significant variables. Study 1 addresses the measurement properties of the new instrument using a mix of exploratory (Study 1a) and confirmatory (Study 1b) approaches in samples collected in the UK and the USA. While it is also important to research practical wisdom in non-WEIRD contexts [52], establishing a general understanding of phronesis within Western culture, from which the concept originally emerged, was seen as an appropriate first step for a substantive research program. This phase of the paper concludes with Study 1c, where information is presented about the test-retest reliability of the instrument. In Study 2, we explore phronesis and its relationship to flourishing from a network psychometrics perspective, in an attempt to determine which aspects of a network of phronesis sub-factors are most central. This was a necessary study to help understand how the phronesis components identified in Study 1 interrelate. Finally, we explore the association between phronesis network components and a wide range of variables of psychological and sociological interest (Study 3). In particular, we wanted to understand whether phronesis would predict variables of interest (flourishing, morally salient aspects of personality, and moral disengagement) over and above Moral Foundations, thus testing whether this construct adds explanatory power beyond one of psychology’s premier moral theories. Taken together, the studies described in this article were designed to create an ambitious, wide-ranging, and well-validated measure of phronesis that would satisfy requirements made by psychologists, practically minded philosophers, and educators. More broadly, we attempted to shed a greater degree of empirical light on an important and ancient philosophical concept than has previously been achieved.
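The notion of predicting outcomes "over and above" Moral Foundations can be illustrated with a minimal sketch of incremental explained variance. It assumes, purely for illustration, one composite baseline score and one added phronesis score, whereas the analyses in this article involve multiple predictors. The function names and toy numbers are hypothetical and are not taken from the study's analysis files.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def incremental_r2(outcome, baseline, added):
    """Return (R2 baseline only, R2 baseline + added, gain in R2).

    Uses the closed-form R2 for a two-predictor linear regression,
    expressed through the three pairwise correlations.
    """
    r_yb = pearson_r(outcome, baseline)
    r_ya = pearson_r(outcome, added)
    r_ba = pearson_r(baseline, added)
    if abs(r_ba) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    r2_base = r_yb ** 2
    r2_full = (r_yb ** 2 + r_ya ** 2 - 2 * r_yb * r_ya * r_ba) / (1 - r_ba ** 2)
    return r2_base, r2_full, r2_full - r2_base


# Hypothetical toy scores (assumption): the two predictors are uncorrelated
# and the outcome is exactly their sum.
baseline_scores = [1, -1, 1, -1]   # stand-in for a Moral Foundations composite
phronesis_scores = [1, 1, -1, -1]  # stand-in for a phronesis score
flourishing = [2, 0, 0, -2]

r2_base, r2_full, gain = incremental_r2(flourishing, baseline_scores, phronesis_scores)
print(f"baseline R2={r2_base:.2f}, full R2={r2_full:.2f}, gain={gain:.2f}")
```

The gain in explained variance is the quantity of interest: a value near zero would mean the added predictor is redundant with the baseline, while a larger value means it carries information the baseline does not.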

All analyses and code are available at https://tinyurl.com/spmdevelopment so that researchers can check our work for the studies within this paper. The data may also be reused with the authors’ prior permission. As the measure of phronesis developed for this study was initially based on exploratory analyses, analyses reported herein were not pre-registered.

Study 1: Establishing the dimensions of phronesis, and developing the short phronesis measure

Study 1 Purpose and aims

Aristotle’s phronesis, or practical wisdom, has stood as a guiding concept for understanding moral reasoning and human flourishing for over 2,300 years. While rich in theoretical depth, philosophical constructs like phronesis often rest on untested assumptions about human behavior. Studies 1a–c sought to bridge this gap by developing the Short Phronesis Measure (SPM), using a systematic process to derive and subsequently test the structure of phronesis empirically. These studies are presented sequentially below, reflecting the order in which they were conducted, with the findings of each study building on the previous.

Study 1a aimed to derive the structure of phronesis impartially from the data. Guided by Aristotelian theory, the item development process ensured that the SPM was tethered to the underlying theory. However, EFA took an impartial look at the data to inductively uncover latent dimensions of phronesis, rather than imposing these dimensions in a top-down manner based on theory alone. This approach allowed for the possibility that the data might reveal nuances or even divergences from the Aristotelian framework, ensuring that the resulting measure would reconcile this long-standing theory with empirical reality, availing of modern social science methods.

Study 1b used CFA to evaluate whether the inductively derived structure identified in Study 1a holds up across independent UK and US samples. This phase provides a critical deductive check on the validity of the model, assessing its robustness under stricter conditions. Additionally, measurement invariance testing examined whether the dimensions of phronesis were consistent across cultural contexts, ensuring the measure’s generalizability.

Study 1c evaluated the temporal stability of phronesis. By conducting test-retest reliability analyses, this phase addressed whether phronesis behaves as a stable aspect of one’s character, as Aristotle might have predicted, or whether it fluctuates as a transient state. This insight is critical for determining the SPM’s utility in intervention studies and longitudinal research. Temporal stability ensures that the measure can reliably capture enduring characteristics of practical wisdom over time.

Together, these studies provide a rigorous framework for operationalizing phronesis. Study 1a derives a theoretically informed yet empirically grounded structure, Study 1b tests the robustness and universality of this structure, and Study 1c ensures the measure’s stability over time. This stepwise process integrates philosophical insight with empirical testing, transforming phronesis from a conceptual ideal into a measurable construct that can inform research and practice.

Study 1 Method

Participants.

Participants for Studies 1a-1c were recruited online using Prolific Academic. Studies have shown that Prolific provides high quality data with high levels of participant comprehension and attention, and low levels of dishonesty [53]. Participants in Studies 1a and 1b were compensated for their time at a rate of £9/hr for one hour of their time, while Study 1c participants were compensated at the same rate for 20 minutes of their time. All data for this project (including subsequent studies) were collected between August 2023 and February 2024.

For Study 1a, after excluding those who elected to withdraw, did not have a valid Prolific ID, or had less than 95% survey completion, the final sample comprised 1,998 participants. The sample was representative based on age, sex, and ethnicity, reflecting UK census data (see Table 1). For Study 1b, we recruited two new representative samples, one from the UK (see Table 2 for demographics) and a second from the US (see Table 3). The UK sample included 1,000 participants from Prolific Academic, with 997 retained after removing participants who wished to withdraw. Likewise, we recruited 1,000 participants from the US, with 988 retained after removing participants who wished to withdraw. For Study 1c, we recruited 300 participants. Participants were only eligible to take part if they were already part of Study 1b’s UK-representative sample. We gained Study 1c participants’ permission to link their Study 1b data with their Study 1c responses, after which point their responses would be fully anonymized. After removing participants with incomplete data, the final number of participants for Study 1c was 295. The mean age of participants recruited was 45.73 (SD = 13.55), with 137 females, 156 males, and two identifying as “other” or “prefer not to say”. For ethnicity, 254 participants identified as “White”, 14 as “Black”, 11 as “Central Asian”, eight as “East Asian”, seven as “Mixed/Multiple ethnic groups”, and one as “Prefer not to say”.

Table 1. Demographics of the exploratory UK sample compared to demographic data from the Office for National Statistics (ONS) to evidence representativeness.

https://doi.org/10.1371/journal.pone.0317842.t001

Table 2. Demographics of the confirmatory UK sample compared to demographic data from the Office for National Statistics (ONS) to evidence representativeness.

https://doi.org/10.1371/journal.pone.0317842.t002

Table 3. Demographics of the confirmatory US sample compared to demographic data from the US Census Bureau’s American Community Survey 2022 to evidence representativeness.

https://doi.org/10.1371/journal.pone.0317842.t003

Measures and item development.

For the SPM, we designed a questionnaire comprising 189 items that aimed to capture various theorized components of practical wisdom. These components were identified based on an extensive review of the literature, some of which has been summarized above. Following this, the first author (a psychologist experienced in psychometric measure development) manually generated an array of possible item formats and example items. These examples, along with the definitions of the constructs they purported to measure, were entered into a Large Language Model (LLM), OpenAI’s ChatGPT 4.0, which the first author used to generate a wide array of items of the same types. The first author suggested refinements to item exemplars where he deemed necessary (e.g., if the item array did not appear to be diverse enough to capture how someone generally behaves, across contexts). Once this process was complete, the item list was shared with the rest of the research team (the second author is an Emeritus Professor of Moral Psychology with expertise in measurement, and the third author is a Professor of Philosophy and originally identified the four theorized components of phronesis we proposed to study) for critique and feedback. If items were identified as problematic, the first author manually created refined versions of these items before generating new item variations. This process was repeated until consensus was reached that we had a list of items that (i) reflected the theorized components of phronesis under study, (ii) were diverse in that they could, in principle, capture how someone generally behaves, and (iii) were sufficiently succinct to allow for rapid test administration. The initial questionnaire was designed to be administered online and took approximately 30–40 minutes to complete.

In the item development process, we recognized that some aspects of phronesis could be objectively tested. For example, the ability to discern the moral relevance of a given scenario can have a defined “correct” answer. On the other hand, certain facets of the construct are inherently subjective and best captured through self-report measures. An example of this would be the experience of moral emotions, which is intrinsically subjective and thus most appropriately assessed through self-reporting. A detailed overview of these items is provided in the subsequent sections. Before proceeding further, however, it must be acknowledged that creating items that home in on all the four components (and possible sub-components) of a phronesis construct is a herculean task, not least because of the lack of precedents, apart from our own previous work [9, 18]. The overarching question is always, first, whether we have identified enough sub-functions to be sufficiently reflective of each broader function of the relevant component; and second, whether we have created enough distinct observable variables (survey items) to be sufficiently reflective of those sub-functions. Working on the assumption that “perfect is the enemy of the good,” we embarked on this task seeking to strike a balance between achieving wide construct coverage and having a measure that could be completed relatively quickly and easily.

Moral perception. The ability to accurately perceive morally relevant elements within various scenarios is not a matter of subjective interpretation. Therefore, we incorporated two distinct types of objective assessments to evaluate moral perception. The first test focused on the recognition of whether a situation itself holds any moral relevance. The second test aimed to identify the specific virtues that are at stake within those morally relevant situations. These objective tests serve as an alternative to self-report measures for this particular aspect of the broader phronesis construct. The items included under this Moral Perception category were chosen based on the same principles as those in the earlier Phronesis Project [19, 51]. However, the items in this study were designed to be more straightforward to score, as they did not include any open-ended questions. It should be noted here that the common core in both tests is an assessment of how well the individual homes in on the moral dimension within a complex social situation. While this is not the same as identifying what would be relevant to a fully phronetic person, in order to develop the sensitivity of a phronimos, the moral learner needs to be able, in the first place, to spot and categorize situations of potential moral and characterological relevance. So, the importance here lies in what the respondents identify in the situation, not their assessment of what a phronimos would identify; hence the use of the first-person rather than the third-person within these item sets. Drawing on ideas from Bebeau and colleagues [55], our items focus on identifying and labelling “the moral” and being able to discount other issues. Our items economically attend to the central features of previous moral perception measures, and within our model the component they illuminate behaves much like other treatments of the construct [91].

We initially formulated 20 items aimed at assessing whether participants could discern if a decision in a given scenario would have implications for their character. Participants were guided by the following instruction: “In the upcoming section, you will encounter various scenarios, each requiring a decision. Your task is to identify which scenarios involve decisions that could influence your character. Please ponder the moral or ethical ramifications these situations could have on the individual involved.” Subsequently, participants were presented with scenarios such as: “You have found out a colleague is claiming your work as their own, but confronting them could create team tension” (character-impacting based on a virtue ethical framework) or “You are an avid reader and must choose the next book to read from a stack of equally enticing options” (not character-impacting based on a virtue ethical framework). Participants had two response options: “What I decide to do in this scenario does not affect my character” and “What I decide to do in this scenario affects my character” (note that “affects” was deliberately chosen over “demonstrates”, as in theory, who we are is an aggregate of what we do, in the first instance). Responses were scored as either correct or incorrect.

The subsequent set of items for assessing moral perception consisted of 15 moral dilemmas. In these dilemmas, participants were tasked with correctly identifying the virtues implicated in each scenario. The instruction provided was: “In this section, you will encounter various scenarios along with four character traits that may or may not be pertinent to the situation. Your responsibility is to select the two traits you believe are most relevant.” For instance, participants were presented with a scenario like: “You discover a wallet on the ground containing a substantial amount of money and the owner’s identification. You must decide what action to take. Which of the following traits are most relevant to your decision? (Select two answers from the following options).” Listed below the scenario were two virtues relevant from a virtue ethical standpoint (e.g., honesty and practicality) and two less relevant virtues (e.g., humor and resilience). Participants could select two answers, resulting in scores ranging from 0–2 for each question. We relied on a broad set of virtues (qua positive traits of character) here that could be deemed characterologically relevant, rather than a more narrowly construed set of standard moral virtues.
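The two moral perception scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors’ scoring code (their analyses were run in R): the function names, the answer keys, and the example responses are hypothetical, and only the rules themselves (correct/incorrect for the first item set, 0–2 for the second) come from the text.

```python
from typing import Sequence, Set


def score_relevance_items(responses: Sequence[bool], key: Sequence[bool]) -> int:
    """First item set: one point per scenario answered in line with the key.

    True means "what I decide affects my character"; False means it does not.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(r == k for r, k in zip(responses, key))


def score_virtue_item(selected: Set[str], relevant: Set[str]) -> int:
    """Second item set: count how many of the two chosen traits are keyed as relevant (0-2)."""
    if len(selected) != 2:
        raise ValueError("exactly two traits must be selected")
    return len(selected & relevant)


# Hypothetical key for two scenarios quoted in the text:
# the colleague scenario (character-impacting) and the book scenario (not).
relevance_key = [True, False]
relevance_total = score_relevance_items([True, True], relevance_key)

# Wallet scenario: honesty and practicality are the keyed traits in the text.
wallet_score = score_virtue_item({"honesty", "humor"}, {"honesty", "practicality"})
```

In this sketch a participant who marks both scenarios as character-impacting earns one of two points on the first set, and a participant who picks one keyed and one unkeyed trait earns one of two points on the wallet item.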

Regarding the second set of questions, it could be argued that focusing on situations eliciting more than one virtue (1) illicitly limits moral perception to the identification of dilemma-like situations, eschewing single-virtue-eliciting situations, and (2) conflates moral perception with moral adjudication (see below). However, (1) the first set of items homes in on characterologically relevant situations involving (potentially) a single virtue, and (2) the task here is not to adjudicate between or integrate virtues but simply to identify which virtues are at stake in the situation [54].

Moral identity. The issue of whether a participant possesses a conceptual framework for their ideal moral self is best evaluated subjectively. Consequently, for this component, we employed a conventional self-report survey methodology. Participants were presented with statements pertinent to their moral identity. In this case, there was no shortage of previous measures of moral identity, although those have not been related specifically to phronesis, and we drew upon many of those for enlightenment [55]. The items generated aimed to emphasize the relationship between moral identity and moral decisions more concretely than found in other popular measures. For example, one of our items was “When faced with challenging situations I ask myself what a good person would do”. In contrast, popular measures like the Moral Identity Scale [56] include items that are less focused on decision making, such as “It would make me feel good to be a person who has these characteristics”. To be sure, the fact that a decision will (or will not) affect a person’s character is only one way that something could be morally relevant. An alternative approach would be to provide a scenario in which respondents must decide what to do both before and after some event (either morally relevant or not relevant) is introduced. However, as we are interested in moral identity and moral decision-making on a virtue ethical understanding (and not, for example, a utilitarian one of good moral identity simply as a generator of “prosocial” responses), we opted for the former understanding. Responses were collected using a five-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree.” Given that we used a single item format for this hypothesized component of phronesis, we initially generated an extensive set of items, totaling 25. These items were modeled after established moral identity questionnaires, maintaining consistency with the format used in previous research, including the earlier Phronesis Project.

Moral emotion. Moral emotion is also subjective, making self-report measures a suitable method for capturing this component. Additionally, moral emotion is theoretically multi-faceted, encompassing the emotions one experiences when acting morally or immorally, as well as one’s ability to regulate these emotions. Therefore, we incorporated multiple item formats to adequately assess this aspect of phronesis. In this way, we improved upon previous measures of moral emotion in the context of phronesis measurement, which focused more so on empathy and perspective taking [57].

Initially, we developed items to gauge the emotional responses individuals might experience when acting either morally or immorally in various scenarios. Participants received the following instruction: “Below are different scenarios where you decide to take specific actions. If you were to take these actions, how would you feel about yourself?” This was followed by a set of 20 statements, half of which depicted morally upright actions (e.g., “A stranger drops a £100 note without noticing. You pick it up and return it to them”), and the other half illustrated moral failings (e.g., “You exaggerate an issue at work to harm a colleague’s professional reputation”). Participants indicated their emotional response on a five-point Likert scale, ranging from “Extremely Bad” to “Extremely Good.”

While emotional regulation has both subjective (e.g., personal emotional experience) and objective (e.g., observable externalizing behavior) elements, we opted for self-report measures. This decision was made because it would be neither practical nor ethical to objectively assess emotional regulation in stress-inducing situations. Participants were guided by the following instruction: “Below, you will encounter various scenarios that may elicit an emotional response. For each situation, please reflect on how you would typically react and rate your ability to manage your emotions. By ‘manage,’ we mean your capacity to prevent your emotions from overwhelming you and to maintain your composure.” Participants then responded to 20 scenarios, some involving moral transgressions against them (e.g., “A stranger is rude to you in a public place for no apparent reason”) and others involving everyday frustrations (e.g., “You accidentally spill a drink on your clothes just as you are about to leave the house”). The emotional responses that might be regulated in these scenarios included jealousy/envy (e.g., scenarios involving recognition given to someone else for your achievement), irritation/annoyance (e.g., scenarios involving minor inconveniences, such as someone pushing in a queue or a neighbor playing loud music), anxiety (e.g., waiting for a bus that’s late or facing a cancelled flight), and hurt/resentment (e.g., harsh criticism for a small mistake or a co-worker making a joke at your expense). In these scenarios, we avoided specifying a “correct” emotional response, as we were not concerned here with adjudicative capacities, and instead aimed to draw on participants’ self-knowledge about their ability not to make an obviously “incorrect” decision based on emotional impulse. Responses were collected on a five-point Likert scale, ranging from “Very Poor” to “Very Good.” We contend that the items selected under “Emotional Regulation” offer a more nuanced reflection of the construct in question compared to the generic measures of empathy used in the previous Phronesis Project [19, 51]. Specifically, these items target the regulatory aspect of phronesis more directly, rather than focusing solely on the general capacity to experience emotions.

Moral adjudication. Moral adjudication entails the integrative process of arriving at a “correct” moral decision through thoughtful deliberation. This concept comprises two elements: firstly, the selection of the correct moral choice based on some criterion, and secondly, the methodology employed in making that choice.

For assessing “correct” moral choices, participants were instructed as follows: “In the upcoming section, you will encounter a series of scenarios. Each scenario presents two primary considerations that represent different potential responses. Your task is to decide how you would balance these considerations if you were to act in these scenarios. You will have seven boxes to choose from. The first and seventh boxes contain the two main considerations. Selecting either of these boxes indicates that you would focus solely on that consideration, disregarding the other. The intermediate boxes signify varying degrees of balance between the two considerations. Although real-life situations may involve more than two considerations, for the purpose of this exercise, please make your decision based on the two presented.” Participants then responded to 22 items, each accompanied by a 7-point scale. Scoring was conducted in three distinct ways: The first scoring method considered “4” as the “correct” answer, representing a balanced consideration of self and others, in alignment with virtue ethical assumptions. The second method compared the average flourishing scores for each scale point. The point with the highest average was deemed “correct,” and participants were scored based on their distance from this point. This approach allowed for situational variability and was grounded in Aristotelian virtue ethics. The third method simply scored answers on a 1–7 scale, with higher scores representing greater prosociality.
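The three scoring approaches for the 7-point balance items can be illustrated with a short Python sketch. It is not the authors’ implementation, and several details are assumptions made only for illustration: the first method is treated as binary (midpoint chosen or not), distance is expressed as a negated absolute difference so that higher scores are better, ties between scale points go to the lower point, and the example responses and flourishing scores are invented.

```python
from statistics import mean
from typing import Dict, List, Sequence


def score_midpoint(response: int, midpoint: int = 4) -> int:
    """Method 1 (assumed binary): 1 if the balanced midpoint was chosen, else 0."""
    return int(response == midpoint)


def best_point_by_flourishing(responses: Sequence[int], flourishing: Sequence[float]) -> int:
    """Method 2, step 1: the scale point whose respondents have the highest mean flourishing."""
    by_point: Dict[int, List[float]] = {}
    for r, f in zip(responses, flourishing):
        by_point.setdefault(r, []).append(f)
    # Iterating over sorted points means ties go to the lower scale point (assumption).
    return max(sorted(by_point), key=lambda p: mean(by_point[p]))


def score_distance(response: int, target: int) -> int:
    """Method 2, step 2: negated distance from the target point (0 is best)."""
    return -abs(response - target)


def score_prosocial(response: int) -> int:
    """Method 3: the raw 1-7 response, assuming 7 is keyed as the prosocial end."""
    return response


# Invented responses to one item, with each participant's flourishing score.
item_responses = [2, 4, 4, 5, 7]
flourishing_scores = [3.0, 6.0, 7.0, 8.0, 4.0]

target = best_point_by_flourishing(item_responses, flourishing_scores)
distance_scores = [score_distance(r, target) for r in item_responses]
```

The second method lets the "correct" point differ from item to item, which is what allows for the situational variability mentioned above, whereas the first fixes it at the midpoint for every scenario.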

The most practical way to assess moral deliberation was to ask participants to self-report on how they gather and verify information relevant to both general and moral decision-making (e.g., “I evaluate the reliability and credibility of the information sources before making a judgment”). They responded to 23 statements on a five-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree.”

To assess the extent to which people make an effort to integrate different phronesis components in decision-making, participants were presented with 17 statements and asked to agree or disagree on a five-point Likert scale (Strongly disagree–Strongly agree). For example, the statement “In making a decision, I consider my thoughts, feelings, and the situation at hand” gauges the application of moral emotions in a context-sensitive manner.

The final way we attempted to measure moral adjudication was with a dilemma taken from the Adolescent Intermediate Concept Measure (AD-ICM). The AD-ICM assesses the moral thought processes of young people, in particular, focusing on their transition from self-centered to conventional thinking from a neo-Kohlbergian perspective [58]. Participants read a story presenting a moral dilemma and use a 5-point scale to rate the morality of actions the main character could take and of the underlying reasons for those actions. Scoring is based on how closely participants’ choices align with expert judgments. Although the AD-ICM was included as it was part of our previous Long Phronesis Measure (LPM, see [19]; note that the LPM included two moral dilemmas, rather than one), it was simplified for scoring purposes through a Microsoft Excel file with embedded macros developed by the second author, in line with our aim to produce a practical and easy-to-score measure.

Flourishing. The Wellbeing Assessment (WBA; [59]) is an instrument designed to gauge comprehensive wellbeing. It is rooted in a theoretical framework that views human flourishing as a state where all aspects of life are positive. This conceptualization aligns with the World Health Organization’s holistic definition of health, which includes mental, physical, and overall wellbeing, and is also reasonably close to a neo-Aristotelian conception of flourishing [51]. The WBA assesses six domains: emotional health, physical health, meaning and purpose, character strengths, social bonds, and financial stability. Although self-reported, the WBA’s sub-scales have demonstrated predictive validity for more objective criteria. For example, the “physical health” self-report subscale has been shown to predict medical diagnoses and insurance claim data during the validation study [59]. In accordance with neo-Aristotelian virtue theory, we deemed it essential to link the adjudicative function of phronesis directly to flourishing and to explore participants’ deliberative strategies. The items selected under “Moral Adjudication” were more diverse than those in the earlier Phronesis Project, which solely used the AD-ICM.

Procedure.

Data collection for Study 1a involved all 189 items generated for the initial practical wisdom measure’s dimension reduction phase. For Studies 1b and 1c, participants completed only the abbreviated and final 107-item practical wisdom measure following dimension reduction in Study 1a. All participants in all studies completed the WBA. Study 1b participants also completed additional measures for the purpose of establishing criterion validity; these were not used in Study 1 and are detailed in Study 3.

Data collection was entirely conducted online. Participants filled out the relevant array of questionnaires for each study within a single session, lasting up to one hour for Studies 1a (large practical wisdom item array and the WBA) and 1b (final practical wisdom measure, the WBA, and criterion validity measures), and 20 minutes for Study 1c (reduced practical wisdom measure and the WBA only). The sequence of the questionnaires was consistent for all participants. However, within each questionnaire, the questions were presented in a randomized order, with the exception of the AD-ICM items in Study 1a, which necessitated sequential completion. Prior to beginning the questionnaires for each study, informed consent was secured from all participants.

Ethics.

This study and the other studies reported hereinafter received ethical approval from the University of Birmingham’s institutional review board in June 2023 (ERN_1043-Jun2023), with a subsequent amendment approved in December 2023 (ERN_1043-Dec2023). Informed consent was sought from all participants for this and subsequent studies: Participants were presented with an information page, and electronically signed a consent page if they wished to proceed with the study. No participant deception was involved. All surveys were completed online, with participants opting in of their own volition after seeing the study advertisement.

Analytical strategy.

Study 1a. We initiated our analysis with EFA to empirically determine the number of factors represented by the 189 items, without imposing our preconceived theoretical structure (APM) on the data. To do this, we employed the psych [60] and GPArotation [61] packages within RStudio [62]. Carpenter’s [63] recommendation for the participant:item ratio is 10:1, which we exceeded. The Kaiser-Meyer-Olkin (KMO) measure stood at .94, confirming adequate sampling for the EFA. Additionally, Bartlett’s test was conducted, and a clause was incorporated into the code to halt the EFA if the test value was less than or equal to .05. Depending on the data’s skewness and kurtosis, we employed either Principal Axis Factoring (if skewness was > 2 or kurtosis was > 7) or Maximum Likelihood as the extraction method. The number of factors to retain was ascertained through parallel analysis. Finally, the issue of whether the factors should be correlated was not immediately evident to us. This uncertainty was particularly relevant given the likelihood of a multi-factor solution, a scenario amplified by the large sample size and extensive item pool. To address this, we employed the Promax rotation method. Promax initiates with an orthogonal Varimax rotation, followed by an axis “tilting” to permit obliqueness, rather than pursuing an oblique solution directly as an Oblimin rotation would do. The computational simplicity and speed of Promax make it particularly advantageous when managing a large number of factors. Loadings below .4 for each of these factors were suppressed.
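Two of the decision rules in this strategy, the choice of extraction method from item skewness and kurtosis and the suppression of loadings below .4, are simple enough to sketch in Python. The sketch is illustrative rather than a reproduction of the authors’ R code. It assumes that the cut-offs apply to absolute skewness and to excess kurtosis, that a single item beyond either cut-off is enough to switch to Principal Axis Factoring, and that suppression is applied to absolute loadings; the function names and example data are hypothetical.

```python
from typing import List, Optional, Sequence


def central_moment(xs: Sequence[float], k: int) -> float:
    """k-th central moment in population form (dividing by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs: Sequence[float]) -> float:
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("item has zero variance")
    return central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs: Sequence[float]) -> float:
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("item has zero variance")
    return central_moment(xs, 4) / m2 ** 2 - 3.0


def choose_extraction(items: Sequence[Sequence[float]],
                      skew_cut: float = 2.0, kurt_cut: float = 7.0) -> str:
    """Principal axis factoring if any item is markedly non-normal, else maximum likelihood."""
    for xs in items:
        if abs(skewness(xs)) > skew_cut or excess_kurtosis(xs) > kurt_cut:
            return "principal_axis"
    return "maximum_likelihood"


def suppress_loadings(loadings: Sequence[Sequence[float]],
                      cutoff: float = 0.4) -> List[List[Optional[float]]]:
    """Blank out (None) any loading whose absolute value is below the cutoff."""
    return [[v if abs(v) >= cutoff else None for v in row] for row in loadings]


# Invented item responses: one roughly symmetric item and one heavily skewed item.
symmetric_item = [1, 2, 3, 4, 5]
skewed_item = [1] * 19 + [5]

method_symmetric = choose_extraction([symmetric_item])
method_mixed = choose_extraction([symmetric_item, skewed_item])
pattern = suppress_loadings([[0.62, 0.10], [-0.45, 0.39]])
```

Parallel analysis and the Promax rotation themselves are not sketched here; in the authors’ workflow these were handled by the R packages named above.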

Study 1b. Items that were kept from Study 1a underwent a CFA using the lavaan [64] R package to test the anticipated factor structure within a different yet analogous sample. We used a Diagonally-Weighted Least Squares estimator as our items included non-normal ordinal responses, and robust standard errors to provide a more accurate estimation under these conditions. Consistent with the previous analysis, this CFA was adequately powered, maintaining more than ten participants for each item in the model. Aside from this, we conducted measurement invariance tests in R to understand whether configural, metric, and scalar invariance would be found across the UK vs US data.

Study 1c. We sought to establish test-retest reliability by correlating participants’ scores on the SPM in Study 1b with their scores in Study 1c, approximately two months later. This would help us to understand whether scores were stable over time.
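A test-retest coefficient of this kind is a correlation between two score vectors from the same participants. The Python sketch below shows the computation for a single component; it assumes a Pearson correlation (the text does not name the coefficient), and the scores are invented purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores from two measurement occasions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Invented component scores for five participants at test and at retest.
time1 = [3.2, 4.1, 2.8, 3.9, 4.5]
time2 = [3.0, 4.3, 2.9, 3.7, 4.4]
retest_r = pearson_r(time1, time2)
```

Repeating this for each of the ten components would give one coefficient per component, with the pairing of rows depending on the linked participant identifiers described in the Participants section.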

Study 1 Results

Study 1a: Dimension reduction using EFA.

Overall, parallel analysis led to the extraction of 17 factors (RMSEA = .02, TLI = .90; χ²[10716] = 107.16, p < .001). Of these, the first 14 factors had enough items with loadings exceeding .4 to allow an internal reliability coefficient to be computed (see Fig 1). A subset of ten factors demonstrated an acceptable level of internal reliability, with Cronbach’s alpha values surpassing .7. Consequently, these ten factors (107 items, taking 15–20 minutes to complete) were retained for further analyses without further modification (see Table 4). These ten factors explained 30% of the total variance across the total number of items. We hypothesized that these ten factors would load onto the four theoretically derived functions of phronesis as superordinate factors, subject to confirmation in CFA.
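The retention rule used here (keep a factor only if Cronbach’s alpha exceeds .7) can be made concrete with a small Python sketch. The formula is the standard one, α = k/(k − 1) × (1 − Σ item variances / variance of total scores); the item data, factor names, and the second alpha value are invented for illustration and are not taken from the study.

```python
from statistics import variance
from typing import Dict, List, Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a set of items, each given as one score per participant."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per participant
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def retain_reliable(alphas: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Names of factors whose alpha surpasses the threshold."""
    return [name for name, a in alphas.items() if a > threshold]


# Invented responses from five participants to three items of one hypothetical factor.
factor_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha_example = cronbach_alpha(factor_items)

# Hypothetical alphas for two factors, to show the retention filter.
kept = retain_reliable({"factor_a": alpha_example, "factor_b": 0.55})
```

Applied to the 14 candidate factors, a filter of this kind is what reduces the solution to the ten factors carried forward.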

Fig 1. Scree plot for the parallel analysis.

https://doi.org/10.1371/journal.pone.0317842.g001

Table 4. Eigenvalues, internal reliability, and hypothesized function of the ten-factor phronesis model.

https://doi.org/10.1371/journal.pone.0317842.t004

In naming these ten factors, some item sets contained a mix of general factor items and items that are more directly morally salient. For example, one factor contained items like “I try to consider how my decisions today will reflect on the person I aspire to be in the future”, not specifically mentioning moral aspirations. However, other items in that item set such as “I set personal targets that involve improving my ethical awareness and character” were more directly morally relevant. The fact that all aspirations entail values and conceptions of the good notwithstanding, we named factors based on the aggregate meaning of the items constitutive of that factor in the context of the overall measure (e.g., in the case of the two items mentioned above, these were both aspects of “Aspired Moral Identity”). Later in the measure development process, tests of criterion validity could then be used to further establish whether each factor predicts what it should, given how we have named it (see Study 3).

Study 1b: Testing the identified dimensions in two new samples using CFA.

Sample 1 (UK). Each of the ten factors was once again found to be internally reliable, with Cronbach’s alphas ranging from .72 to .92. We then used a CFA to assess whether the data would fit the hypothesized ten-factor model. The model converged successfully and we observed excellent model fit (CFI = .97, TLI = .97; RMSEA = .04, SRMR = .06; χ²[5414] = 13178.35, p < .001). Next, we used a bifactor structural equation model (SEM) to test whether the ten empirically identified factors could be considered sub-factors of the four theoretical components (Moral Identity, Moral Emotion, Moral Perception, and Moral Adjudication) from which the items were originally derived, and in turn, whether those components loaded onto a superordinate Phronesis factor (in what follows, Phronesis is capitalized when it refers specifically to the factor in question rather than to the general philosophical concept). The model did not converge after 2,436 iterations, and so reliable estimates for our hypothesized bifactor model in a UK sample could not be produced. We reran this model, this time allowing the ten factors to covary within the hypothesized four factors (e.g., Moral Deliberation allowed to covary with Moral Integration as they are both theoretically aspects of Moral Adjudication), but this model did not converge after 3,749 iterations. The failure to converge suggested potential model misspecification or complexity that the data could not adequately support, limiting our ability to confirm the hypothesized structure. The factor covariances are presented in Table 5 (upper).

Table 5. Factor covariance matrices for US and UK samples.

https://doi.org/10.1371/journal.pone.0317842.t005

Sample 2 (US). Consistent with the previous analysis, this CFA was adequately powered, maintaining more than ten participants for each item in the model. We also tested for configural, metric, and scalar invariance. Once again, internal reliability was strong across the ten sub-factors, with Cronbach’s alpha values ranging from .72 to .92. The model fit for the ten-factor solution in the US sample was also excellent (CFI = .97, TLI = .97; RMSEA = .04, SRMR = .07; χ²[5414] = 15092.52, p < .001). A bifactor SEM was then fitted to the data with the ten factors loading onto the four theorized components of phronesis, and they in turn onto an overarching Phronesis factor. The model did not fully converge. The factor covariances are presented in Table 5 (lower).

Measurement invariance. We tested whether the measurement models differed across the US vs UK samples for our best fitting models (i.e., the ten latent factors derived empirically through EFA). The initial step in measurement invariance testing, configural invariance, establishes that the factor structure (i.e., the number of factors and their pattern of loadings) is consistent across groups. No constraints are applied to factor loadings or intercepts at this stage, and the configural model served as our baseline for comparison. The configural model demonstrated acceptable fit across the two samples (CFI = .96, TLI = .96, RMSEA = .03, SRMR = .05), indicating that the factor structure was consistent between the US and UK samples. Next, we tested for metric invariance by constraining the factor loadings to be equal across groups. The chi-squared difference test between the configural and metric models was significant, χ²(96) = 589.57, p < .001, indicating that the constraint worsened fit and that some factor loadings may differ across groups. Subsequently, scalar invariance was assessed by additionally constraining the intercepts to be equal across groups. The chi-squared difference test between the metric and scalar models was also significant, χ²(96) = 292.48, p < .001, indicating that intercepts might also vary across groups. Finally, residual invariance was tested by further constraining the residual variances to be equal across groups. The chi-squared difference test between the scalar and residual models was significant, χ²(106) = 330.88, p < .001, suggesting that the residual variances also differ across the US and UK samples.
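The nested-model comparisons above can be checked with a short sketch. The Python below takes the three chi-squared difference statistics and degrees of freedom reported in the text and confirms that each corresponds to p < .001. To stay dependency-free it uses the Wilson-Hilferty normal approximation to the chi-squared upper tail, which is an assumption made here for illustration; a statistics library would give exact values. The function name is hypothetical.

```python
import math


def chi2_upper_tail_approx(x, df):
    """Approximate P(chi-square_df >= x) via the Wilson-Hilferty transform."""
    scale = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - scale)) / math.sqrt(scale)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Difference statistics and degrees of freedom as reported in the text.
steps = {
    "configural vs metric": (589.57, 96),
    "metric vs scalar": (292.48, 96),
    "scalar vs residual": (330.88, 106),
}

results = {}
for label, (delta_chi2, delta_df) in steps.items():
    p = chi2_upper_tail_approx(delta_chi2, delta_df)
    results[label] = p
    print(f"{label}: delta chi2({delta_df}) = {delta_chi2}, p < .001 is {p < 0.001}")
```

With samples of this size, chi-squared difference tests are sensitive to very small discrepancies, so a significant result at each step does not by itself indicate how large the group differences are.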

Study 1c: Test-retest reliability.

Internal reliabilities for the retested sample were acceptable (.75 to .93). The model fit for a ten-factor structure was also excellent (CFI = .99, TLI = .99; RMSEA = .01, SRMR = .06; χ²[5414] = 5482.13, p = .255). Test-retest reliability was established by correlating scores on each of the ten phronesis components with scores on the same components two months later. The results are presented in Table 6.

Table 6. Test-retest reliability of the short phronesis measure in a UK-based sample.

https://doi.org/10.1371/journal.pone.0317842.t006
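As an illustration of the test-retest procedure described above, the sketch below correlates scores on one component at the first administration with scores on the same component at retest. It is a Python sketch under stated assumptions: the score vectors are invented, the names `time1` and `time2` are hypothetical, and the paper's own values are those in Table 6.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical component scores for six participants at two time points.
time1 = [3.2, 4.1, 2.8, 4.6, 3.9, 2.5]
time2 = [3.0, 4.3, 3.1, 4.4, 3.6, 2.9]
retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))
```

Repeating this for each of the ten components gives one stability coefficient per component, which is the form in which test-retest results are usually tabulated.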

Study 1 Discussion

The development of a measure capable of reliably assessing phronesis, as articulated by the APM, required a methodological approach that balanced respect for theory with empirical openness. Prior research moved directly to confirmatory models [19, 65] because its aim was to test whether the APM’s theoretical assumptions fit observed data. In contrast, our approach used the APM to guide item development, ensuring construct coverage, but allowed EFA to determine inductively the number and nature of the constructs actually being measured, based on participants’ response patterns. This revealed that the APM’s four-factor structure did not emerge from the data as predicted. Instead, ten distinct and internally reliable factors were identified, suggesting a more complex and nuanced conceptualization of phronesis than originally proposed [9, 18]. These findings form the basis of the refined framework we term the neo-APM, representing an empirically grounded yet philosophically coherent model of practical wisdom.

The confirmatory phase was then essential for evaluating whether this new conceptualization held up in independent samples. CFA allowed us to test the generalizability of the neo-APM (the theoretical model measured by the SPM) across different populations, moving beyond the exploratory sample to assess whether the ten-factor structure represented a broader phenomenon. By conducting CFAs in two nationally representative samples—one from the UK and one from the US—we were also able to test the universality of the neo-APM across culturally distinct contexts. The results demonstrated excellent fit in both samples, providing robust evidence that the neo-APM captures something enduring and broadly applicable. In contrast, a bifactor model attempting to incorporate the neo-APM’s ten factors within the theoretical framework from which the items were derived (i.e., the APM) was not supported. Testing the neo-APM (as measured using the SPM) in two different countries also revealed subtle differences: for example, factor covariance levels were generally higher in the US sample than in the UK, suggesting that sociocultural or even sociobiological factors might influence the relationships among constructs. These findings underscore the universality of the neo-APM as a theoretical framework while highlighting the flexibility needed to accommodate cultural particularities.

Study 1c extended this validation by examining the measure’s temporal stability, a crucial step in determining whether phronesis reflects a stable aspect of one’s character or if it fluctuates across time. Test-retest reliability analyses demonstrated moderate stability overall, with higher stability observed in constructs related to general behavioral tendencies, such as Emotional Regulation. In contrast, performance-based constructs like Situational Moral Irrelevance appeared more influenced by transient states (e.g., mood or fatigue). These findings suggest that phronesis exhibits characteristics of a stable character trait but is also responsive to situational factors. In essence, the evidence points to phronesis as both enduring and adaptive, anchored in dispositions while remaining affected by the changing demands of life.

The interrelationships among specific factors also offer valuable insights into the nature of phronesis. For example, Moral Deliberation was strongly correlated with Aspired Moral Identity and Moral Self-Relevance, suggesting that thoughtful moral reasoning is closely tied to a sense of moral aspiration and the integration of morality into one’s self-concept. However, the lack of covariance between Moral Deliberation and Virtue Identification challenges the assumption that recognizing virtues in specific contexts is a prerequisite for engaging in moral reasoning. This finding prompts a re-evaluation of the importance of virtue literacy in moral/character education, at least in adults (such as our current participants). Similarly, the absence of correlation between Emotional Regulation and Negative Moral Emotion raises pointed questions about their integration under the Moral Emotion dimension of the APM. These results highlight the need to move beyond hierarchical assumptions, suggesting instead that phronesis may be better conceptualized as a dynamic network of interdependent capacities. This relational complexity is further explored in Study 2.

The first study (1a-1c) provides a robust foundation for understanding phronesis, advancing both its theoretical conceptualization and empirical measurement. The neo-APM represents a model that is not only grounded in Aristotelian philosophy but also refined through empirical analysis. However, for this framework to be truly meaningful, it is not enough for the measure to identify patterns that emerge consistently across populations. Establishing the relative importance of the neo-APM components (measured via the SPM) as part of an interrelated network of factors is key for educational purposes, as educators may wish to focus on the most central components, while researchers with limited assessment time may wish to only measure the most central components. This is addressed in Study 2. A comprehensive validation must also show that these patterns correspond to what we would expect based on the theory. In other words, the measure should connect with external benchmarks in ways that align with the philosophical and conceptual understanding of phronesis. Study 3 addresses this critical task by examining whether the SPM produces scores that correlate with outcomes and traits Aristotle might have predicted to be linked to practical wisdom—such as flourishing and moral reasoning—as well as related constructs like personality, which, while not conceptualized in Aristotle’s time, are recognized in modern social science as important factors in understanding human behavior. This step tests whether the constructs measured by the SPM behave as predicted when compared to well-established concepts, ensuring that the measure operates in accordance with its theoretical foundations. By addressing these gaps, Study 3 strengthens the neo-APM as a comprehensive framework for understanding phronesis and its broader relevance to human capacities.

Study 2: A nomological network analysis of Aristotelian phronesis

Study 2 Purpose and aims

Given the nuanced findings from Study 1, which highlighted unexpected factor covariation, there emerges a compelling case for reevaluating our approach to understanding the organizational structure of the phronesis components initially derived from the APM. This reconsideration led us to pivot towards a network psychometrics framework for Study 2. Unlike traditional psychometric approaches that prioritize the identification of latent constructs to explain observable phenomena, network psychometrics offers a fundamentally different perspective. Traditional models often grapple with the causal directionality implied by latent constructs, invoking a familiar behaviorist critique: If phronesis is considered the cause of its sub-factors, then what causes phronesis itself? This line of questioning potentially unravels into an infinite regress, where each supposed cause requires another, deeper cause, ad infinitum [66].

Network psychometrics sidesteps this dilemma by positing psychological phenomena as the emergent outcomes of complex interactions among observable variables, analogous to how the brain itself works [42, 43, 67]. In this framework, the focus shifts to the structure of these interactions, with particular attention being paid to the centrality of specific variables within the network [68]. Such a perspective suggests that among the ten empirically derived aspects of phronesis identified in Study 1a, certain factors may play more pivotal roles than others. For example, these more central factors could act as orienting tele within the phronesis network, guiding the system’s overall direction and influence on flourishing. By adopting this nomological network approach in Study 2, we aim to delve deeper into the architecture of phronesis, exploring how its components interrelate and which, if any, among them serve as the linchpins in the overall network. We anticipated that this would shed light on the emergent properties of the phronesis network, offering a fresh and novel lens through which to view its contribution to flourishing. Moreover, in Study 1b, factor covariances were stronger in the US compared to the UK. Therefore, this study also served as an opportunity to test whether overall phronesis network connectivity would be significantly stronger in the US versus the UK.

Study 2 Method

Participants and design.

In this exploratory study, we aimed to examine the network structure of Phronesis (i) in general (US and UK combined) and (ii) separately in the US and the UK, to explore differences between the two, using cross-sectional data from Study 1b.

Analysis plan.

We employed network analysis [69] for this study. The analysis hinged on constructing and examining correlation-based networks using qgraph [70] and igraph [71] in R. This approach allowed us to map out how various moral and psychological components, such as Moral Deliberation, Identity Aspirations, and Emotional Regulation, are interconnected. For each analysis we computed key centrality measures: Betweenness, Closeness, Strength (or Degree), and Expected Influence. These are defined as follows:

Betweenness. This is a measure of centrality in a network, reflecting the extent to which a node (i.e., one of the ten phronesis factors, in this case) lies on the shortest path between other nodes. Nodes with high betweenness centrality can be seen as important intermediaries or “bridges” in the network, facilitating communication or interaction between different parts of the network.

Closeness. Closeness centrality measures how close a node is to all other nodes in the network, based on the shortest paths that connect them. A node with high closeness centrality can quickly interact with all others, indicating it has a central position in the network’s overall structure.

Strength. In the context of weighted networks, the strength of a node is the sum of the weights of the edges connected to it. For psychological networks such as these, this can be interpreted as the overall level of direct influence a node has within the network, taking into account the strength of its connections to other nodes.

Expected influence. Expected influence is a measure adapted from strength for networks that include both positive and negative edge weights. It sums up the edge weights while considering the sign of each edge, providing a measure of a node’s total positive or negative influence on the network. A high positive expected influence indicates a node’s strong positive impact on connected nodes, whereas a high negative value suggests a strong negative impact.
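The definitions above can be made concrete with a small sketch. The Python below computes Strength, Expected Influence, and Closeness for a toy signed, weighted, undirected network. Several things here are assumptions for illustration: the four-node weight matrix is invented, path length is taken as the inverse of the absolute edge weight, and Closeness is the inverse of the summed shortest-path distances, which is one common convention in psychological network software. Betweenness is omitted to keep the sketch short.

```python
# Hypothetical 4-node network; W[i][j] is the edge weight between nodes i and j.
W = [
    [0.0, 0.4, 0.2, 0.0],
    [0.4, 0.0, -0.3, 0.0],
    [0.2, -0.3, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]


def strength(w):
    """Sum of absolute edge weights attached to each node."""
    n = len(w)
    return [sum(abs(w[i][j]) for j in range(n) if j != i) for i in range(n)]


def expected_influence(w):
    """Sum of signed edge weights attached to each node."""
    n = len(w)
    return [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]


def closeness(w):
    """Inverse of summed shortest-path distances, with distance = 1 / |weight|."""
    n = len(w)
    inf = float("inf")
    dist = [
        [0.0 if i == j else (1.0 / abs(w[i][j]) if w[i][j] != 0 else inf)
         for j in range(n)]
        for i in range(n)
    ]
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return [1.0 / sum(dist[i][j] for j in range(n) if j != i) for i in range(n)]


print("Strength:", strength(W))
print("Expected influence:", expected_influence(W))
print("Closeness:", closeness(W))
```

In this toy network the second node has the highest Strength, but its negative edge lowers its Expected Influence, which shows why the two measures can rank nodes differently once negative edges are present.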

All networks were estimated with the graphical lasso, using the glasso [72] and bootnet [73] packages, with the Extended Bayesian Information Criterion (EBIC) used to select the level of regularization and thereby penalize model complexity. The EBIC helps in selecting a model that is parsimonious yet sufficiently explanatory, which guards against overfitting and supports the model’s generalizability. This EBICglasso method shrinks small partial correlations towards zero, setting many of them to exactly zero. The result is a sparser network in which the more robust connections are retained.
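To show how the EBIC trades fit against complexity, the sketch below implements the standard EBIC expression for Gaussian graphical models, EBIC = -2L + E log(N) + 4γE log(P), where L is the log-likelihood, E the number of non-zero edges, N the sample size, and P the number of nodes. It is a Python illustration rather than the packages' code. The hyperparameter value γ = 0.5 is an assumption (the text does not report the value used), and the candidate log-likelihoods and edge counts are invented.

```python
import math


def ebic(log_likelihood, n_edges, n_obs, n_nodes, gamma=0.5):
    """Extended BIC for a network with n_edges non-zero edges.

    gamma = 0 reduces to the ordinary BIC; larger gamma favours sparser models.
    """
    return (
        -2.0 * log_likelihood
        + n_edges * math.log(n_obs)
        + 4.0 * gamma * n_edges * math.log(n_nodes)
    )


# Hypothetical candidates along a regularization path: (label, log-likelihood, edges).
candidates = [
    ("dense", -1000.0, 40),
    ("medium", -1010.0, 20),
    ("sparse", -1100.0, 5),
]
n_obs, n_nodes = 500, 10  # illustrative values only

best = min(candidates, key=lambda c: ebic(c[1], c[2], n_obs, n_nodes))
print("Selected network:", best[0])
```

In practice the candidates come from fitting the graphical lasso at a range of penalty values, and the network with the lowest EBIC is retained.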

Study 2 Results

Overall.

In Table 7, we present the measures of centrality for each component within the phronesis network, with the network’s overall structure depicted in Fig 2. These centrality statistics help us understand the relative importance or influence of each component within the network. For instance, Betweenness highlights components that act as central connectors or bridges. Moral Self-Relevance, with a high Betweenness score of 26, serves as a key link between otherwise disconnected parts of the network, whereas components like Emotional Regulation and Virtue Identification, both with Betweenness scores of 0, indicate a more isolated position. Closeness reveals how central a component is by measuring its proximity to others in the network. In this context, Moral Self-Relevance (Closeness: 0.0137) is more centrally located compared to Virtue Identification (Closeness: 0.0067), meaning it has a broader reach within the network. Strength reflects the weight of a component’s direct connections. For example, Moral Self-Relevance has the highest Strength score (1.212), indicating strong and numerous connections, while Emotional Regulation and Virtue Identification have weaker connections (Strength: 0.351). Finally, Expected Influence considers the potential impact of a component, accounting for both the strength and direction of its connections. Identity Aspirations and Moral Self-Relevance (Expected Influence: 0.929 and 0.928, respectively) have significant potential to shape the network, whereas Emotional Regulation (Expected Influence: 0.138) plays a more limited role. The range of these statistics—from highly positive (e.g., 1.212 in Moral Self-Relevance’s Strength) to positive but less influential (e.g., 0.138 in Emotional Regulation’s Expected Influence)—helps delineate the spectrum of influence within the phronesis network, from central and pivotal to isolated or less impactful components.

Fig 2. An overall phronesis network model.

Note. EBIC penalization applied.

https://doi.org/10.1371/journal.pone.0317842.g002

Table 7. Centrality statistics for the overall phronesis network model.

https://doi.org/10.1371/journal.pone.0317842.t007

Notably, five components emerge as particularly central within this network: Negative Moral Emotion, Moral Deliberation, Aspired Moral Identity, Moral Self Relevance, and Moral Integration. These central components, highlighted by the bold paths in Fig 2 (blue indicates positive relationships, red indicates negative relationships), play pivotal roles in the network, indicating that their influence is more coordinated and foundational to the overall network conceptualization of phronesis. The centrality of these components signifies not only their individual importance but also their collective contribution to the network’s coherence and functionality. This distinction between central and less-central factors is pivotal, as it underscores the varying degrees of influence different components exert within the phronesis network. Understanding these dynamics offers insights into how individual components interconnect to form a relatively comprehensive model of moral and ethical reasoning.

US vs UK.

Next, we explored whether the network structure looked any different when separated into UK vs US samples. We found that the same network structure broadly held independent of the sample, with the same four components being central overall. The Moral Integration variable appeared to be less central to the Phronesis network in the US compared to the UK, however. Network centrality measures are presented in Table 8, and network plots are presented in Fig 3.

Fig 3. Network plots for the US vs UK phronesis network models.

Note. EBIC Penalization applied.

https://doi.org/10.1371/journal.pone.0317842.g003

Table 8. Centrality statistics for the US vs UK phronesis network models.

https://doi.org/10.1371/journal.pone.0317842.t008

We then compared the US and UK phronesis networks using the NetworkComparisonTest package [74] in R. A network invariance test, which considers both the pattern of connections between nodes and the strength of those connections, indicated no significant difference between the US and UK network structures (M = .12, p = .102). We also conducted a global strength invariance test. The overall connectivity of the phronesis network did not differ significantly between the UK (3.05) and the US (3.85); S = 0.80, p = .094. The direction of the difference (higher connectivity in the US) is consistent with the covariance matrices observed in Study 1b.
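The logic of a permutation-based global strength comparison can be sketched as follows. This Python example is a simplified stand-in and not the NetworkComparisonTest procedure itself: it builds each network from plain Pearson correlations rather than regularized partial correlations, and the simulated data, group sizes, and function names are all assumptions. What it shares with the test reported above is the statistic S, the absolute difference in summed absolute edge weights, and the use of shuffled group membership to build a reference distribution.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def global_strength(rows):
    """Sum of absolute pairwise correlations among the columns of rows."""
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    return sum(abs(pearson(cols[i], cols[j]))
               for i in range(p) for j in range(i + 1, p))


def permutation_test(group_a, group_b, n_perm=200, seed=0):
    """Return (S, p) for the difference in global strength between two groups."""
    rng = random.Random(seed)
    observed = abs(global_strength(group_a) - global_strength(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to groups at random
        stat = abs(global_strength(pooled[:n_a]) - global_strength(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Counting the observed statistic itself keeps p above zero.
    return observed, (hits + 1) / (n_perm + 1)


def make_group(rng, n, coupling):
    """Simulate n participants on 3 variables sharing one common influence."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        rows.append([coupling * shared + rng.gauss(0, 1) for _ in range(3)])
    return rows


data_rng = random.Random(1)
group_a = make_group(data_rng, 80, 0.5)
group_b = make_group(data_rng, 80, 1.0)
S, p_value = permutation_test(group_a, group_b)
print(f"S = {S:.2f}, p = {p_value:.3f}")
```

Because the reference distribution is built from the data themselves, the test makes no distributional assumption about S, at the cost of re-estimating both networks on every permutation.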

Study 2 Discussion

Before this study, we developed a set of items inspired by the APM, empirically discovered that they organize into ten factors, verified this ten-factor structure in two nationally representative samples, discovered that these factors are relatively stable over time, and subsequently discovered that the interrelationships between these ten factors could not be predicted by the APM. This left us with a theoretical lacuna whereby we did not know which of these ten factors were most important, nor how they dynamically configured. Therefore, this study adopted an alternative psychometric approach aimed at organizing the ten factors using exploratory network analyses. What stood out from the results of this study was that the most central variables of the phronesis network seemed to exclude Emotional Regulation and Virtue Identification, with Situational Moral Relevance and Situational Moral Irrelevance somewhat peripheral to the overall phronesis network.

The phronesis network was broadly similar in the US compared to the UK. For character educators, this might suggest that interrelationships between network nodes could be leveraged to promote more effective moral learning. For example, the Gulliford-Roberts Hypothesis [75] proposes that there is a unity of virtue such that moral education in relation to a given virtue could have positive knock-on effects for closely related virtues (e.g., justice and truthfulness are both put forth as kinds of intelligent caring). Analogously, it is plausible that character education targeting a particular phronesis component could have a positive knock-on effect on related nodes in the network. Moreover, a particular node in the phronesis network may be too complex to influence directly. Aspired Moral Identity, for example, is quite an abstract construct that likely requires a holistic educational or therapeutic approach. In such cases, we might triangulate around the construct of interest by influencing related nodes in the network. Let us take Aspired Moral Identity as an example of a complex and abstract construct that we might wish to target with an intervention. Aspired Moral Identity is unlikely to be high unless Moral Self-Relevance has been established. In turn, Moral Self-Relevance might be established through a discussion with a teacher, parent, coach, or clinician focusing on Positive and Negative Moral Emotion, where much more concrete discussion of direct experience can be had (e.g., “How would you feel in this situation? What does that say about your values and the kind of person you want to be?”). While we cannot make causal claims at this stage about which factors are downstream of which, we can generate plausible, empirically driven hypotheses about how to leverage network dynamics in character education, specifically within practical wisdom coaching and education.

Study 3: Criterion validity

Study 3 Purpose and aims

In this study, we focus on evaluating the criterion validity of the SPM. Criterion validity is essential in psychometric evaluation as it determines whether a measure accurately predicts theoretically relevant outcomes, which is a core reason for developing any psychometric measure. In Study 1, we developed a concise measure of phronesis grounded in the APM, balancing theoretical alignment with practical considerations like brevity and ease of use to ensure participant engagement and scoring efficiency. While no measure can capture every nuance of a complex construct, we believe our item set is sufficiently comprehensive to represent phronesis effectively. To further establish the validity of the measure, we employ several statistical approaches in Study 3. Correlations are used to assess convergent validity (how well the measure aligns with similar constructs) and discriminant validity (how distinct it is from unrelated constructs) across a variety of measures. Regressions are applied to evaluate predictive validity, determining whether the measure can forecast relevant outcomes over a two-month period. Finally, hierarchical regressions are conducted to assess incremental criterion validity, testing whether the measure explains unique variance in outcomes beyond other established predictors.

Incremental criterion validity is especially important because it demonstrates that the measure captures something distinct and meaningful about the individual that is not already accounted for by other variables or measures. In the context of the SPM, this means that the measure adds unique value in terms of explaining variance in relevant outcomes, showing that phronesis is not just an overlap with other constructs we already know about. By explaining additional variance in outcome variables—variables that themselves are important for understanding human flourishing—incremental criterion validity is a clear marker of the unique contribution of the SPM. If the SPM can predict outcomes of interest that other measures cannot, this strengthens the argument that it captures an important and under-explored dimension of a person’s character. Ultimately, these analyses provide a comprehensive evaluation of the measure’s psychometric soundness. Strong evidence for convergent, discriminant, predictive, and incremental criterion validity would affirm the robustness of the SPM, ensuring that it not only aligns with theoretical expectations but also offers unique insights into relevant outcomes.
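The idea of incremental criterion validity can be expressed as a difference in explained variance between two nested regressions: a baseline model containing the established predictors and a full model that adds the new measure. The Python sketch below computes that difference, ΔR², using ordinary least squares solved by Gaussian elimination. The data are invented and the variable names (`established`, `new_measure`, `outcome`) are hypothetical; it illustrates the logic of a hierarchical regression and is not the authors' analysis.

```python
def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def r_squared(predictors, y):
    """R^2 from an OLS regression of y on the predictor columns plus intercept."""
    n = len(y)
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(design[r][i] * design[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(design[r][i] * y[r] for r in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[r] - sum(b * v for b, v in zip(beta, design[r]))) ** 2
                 for r in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical scores for eight participants.
established = [2.0, 3.5, 1.0, 4.0, 3.0, 2.5, 4.5, 1.5]
new_measure = [3.0, 2.0, 2.5, 4.5, 1.5, 4.0, 3.5, 1.0]
outcome = [3.1, 3.4, 1.9, 5.2, 2.6, 4.0, 4.9, 1.4]

r2_base = r_squared([[e] for e in established], outcome)              # step 1
r2_full = r_squared(list(zip(established, new_measure)), outcome)      # step 2
delta_r2 = r2_full - r2_base
print(f"R2 base = {r2_base:.3f}, R2 full = {r2_full:.3f}, delta R2 = {delta_r2:.3f}")
```

A ΔR² that is reliably above zero indicates that the added measure explains variance in the outcome that the baseline predictors do not.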

Now we consider the outcomes sought. The earlier attempt to measure phronesis [19] focused on measures that were drawn primarily from the developmental sciences and it had, as a secondary mission, the aim of assessing whether phronesis or its components might better address the judgment–action gap, which is a central controversy in the field [2]. Thus, the outcome of interest used in the earlier work by the current research center was prosocial action. In the current study, we chose to broaden the focus and assess flourishing as the main outcome of interest [7, 10]. As described in the introduction, we hypothesized that phronesis is associated with an increased likelihood of flourishing, or the sense that life in its more central dimensions (not merely “subjective wellbeing”, but also other aspects of flourishing) is seen in a positive light. We also correlate phronesis with personality and a host of morally salient variables to help situate phronesis empirically amongst other established psychological constructs. Finally, to help to establish that phronesis, as measured here, has practical implications over and above other major psycho-moral theories, we tested whether phronesis predicts flourishing over and above the most recent and comprehensive measure emerging from Moral Foundations Theory [36].