How do children overcome their pragmatic performance problems in the true belief task? The role of advanced pragmatics and higher-order theory of mind

Lydia Paulin Schidelko; Marina Proft; Hannes Rakoczy

doi:10.1371/journal.pone.0266959

Abstract

The true belief (TB) control condition of the classical location-change task asks children to ascribe a veridical belief to an agent to predict her action (analogous to the false belief (FB) condition used to test Theory of Mind (ToM) abilities). Studies that administered TB tasks to a broad age range of children yielded surprising findings of a U-shaped performance curve in this seemingly trivial task. Children before age four perform competently in the TB condition. Children who begin to solve the FB condition at age four, however, fail the TB condition, and children succeed again only from around age 10. New evidence suggests that the decline in performance around age four reflects pragmatic confusions caused by the triviality of the task rather than real competence deficits in ToM. Based on these results, it can be hypothesized that the recovery of performance at the end of the U-shaped curve reflects underlying developments in children’s growing pragmatic awareness. The aim of the current set of studies, therefore, was to test whether the developmental change at the end of the U-shaped performance curve can be explained by changes in children’s pragmatic understanding and by more general underlying developmental changes in recursive ToM or recursive thinking in general. Results from Study 1 (N = 81, 6–10 years) suggest that children’s recursive ToM, but not their advanced pragmatic understanding or general recursive thinking abilities, predicts their TB performance. However, this relationship could not be replicated in Study 2 (N = 87, 6–10 years) and Study 3 (N = 64, 6–10 years), in which neither recursive ToM nor advanced pragmatic understanding or recursive thinking explained children’s performance in the TB task. The studies therefore remain inconclusive regarding explanations for the end of the U-shaped performance curve. Future research needs to investigate potential pragmatic and general cognitive foundations of this developmental change more thoroughly.

Citation: Schidelko LP, Proft M, Rakoczy H (2022) How do children overcome their pragmatic performance problems in the true belief task? The role of advanced pragmatics and higher-order theory of mind. PLoS ONE 17(4): e0266959. https://doi.org/10.1371/journal.pone.0266959

Editor: Jérôme Prado, French National Center for Scientific Research (CNRS) & University of Lyon, FRANCE

Received: October 25, 2021; Accepted: March 31, 2022; Published: April 27, 2022

Copyright: © 2022 Schidelko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The reported project received financial support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) by employing the first author on Research Project 254142454 / GRK 2070. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Theory of Mind (ToM) is the social-cognitive ability to think and reason about one’s own and others’ mental states [1]. At the core of ToM lies meta-representation: the capacity to represent that subjects represent the world in a certain way that can differ from one’s own current perspective. The litmus test for ToM is the so-called false belief (FB) task. In this task, participants need to track a story protagonist’s belief that comes to differ from reality: Participants see how an object is placed at location 1 in the presence of a protagonist and is then moved from location 1 to a new location 2 in the protagonist’s absence. Participants are then asked to ascribe to the protagonist a belief about the object’s location (“Where does she think the object is?”) or to predict the protagonist’s behavior based on her belief (“Where will she look for her object upon her return?”) [2]. Individuals with an (explicit) ToM predict that the protagonist will look for her object–based on her false belief–in location 1. Developmentally, children succeed in FB tasks around the age of four years. Children younger than four years systematically fail. They predict confidently that the protagonist will look–according to reality–in location 2 [2, 3].

In the history of ToM research, many FB studies also administered an additional True Belief (TB) condition to the children. The TB condition serves as a baseline and control measure to ensure that children, especially younger ones, can cope with the narrative task structure. It is structurally similar to the FB task with the only difference that the protagonist witnesses the object transfer from location 1 to location 2 and thus has a veridical belief about the object’s location. Typically, children younger than four years who fail the FB condition, pass the TB condition [3].

Recently, however, the TB condition has been administered to a broader age range of children, with quite puzzling results: with age, children get worse in the TB condition. More specifically, children from age four who begin to solve the FB task start to fail the TB task [4–7]. The initial pattern found in younger children–passing the TB condition while failing the FB condition–reverses around this age. Children from age four succeed in the FB condition but fail the TB condition. This performance pattern (younger children pass TB and fail FB and vice versa for older children), reveals itself also at an individual level, in strong negative correlations of the two versions of the tasks. Remarkably, the failure in the TB task persists into late childhood: Only from age seven to ten, performance in the TB task recovers. At this point in development, children pass both FB and TB for the first time. Taken as a whole, the development of performance in the TB task follows a U-shaped trajectory: from high performance in young children around age three to a dramatic decrease around age four when children solve the FB condition to a recovery of performance only in later childhood around age seven to ten. FB performance, in sharp contrast, remains constantly high from around age four [6].

As with any U-shaped curve in performance, this unexpected developmental pattern raises at least two fundamental questions: First, how does the decrease in performance in the TB task come about at the beginning of the U-shaped curve? Why do children start to fail TB tasks once they come to master FB tasks? Second, how does the recovery of performance come about at the end of the U-shaped curve? Why do children from around age seven to ten overcome their intermediate difficulty with TB tasks?

One possible answer to both questions is the following: The developmental pattern of the U-shaped performance curve in the TB task does not reflect children’s ToM competencies but the development of an understanding of pragmatics. Children’s failure in this intermediate state between four and ten is not based on a fundamental problem in ToM understanding but on pragmatic performance limitations [8].

In general, pragmatics pertains to comprehension and production of speech acts and discourse that goes beyond mere literal meaning. For a comprehensive understanding of most speech acts and discourse, additional information besides the literal meanings of the words (sentence meaning) needs to be taken into account in order to determine what is meant (speaker meaning): for example, information about who made an utterance, in which context, against the background of which rules etc.–and in particular, the recipient needs to figure out the speaker’s intentions underlying the speech act in question [e.g., 9]. Pragmatics is thus, in some sense, a form of applied ToM.

Regarding the TB performance, it seems quite clear that children from early on do understand the semantics, the literal meaning of the words in the TB test question. Perhaps, however, children in the intermediate state (between ages four and ten) struggle with understanding the use of the test question “Where does the protagonist think the object is?” They do not yet understand what the interlocutor means or wants by asking this question. But why should the TB question be pragmatically challenging? Why do children struggle to grasp the experimenter’s intention in asking the TB test question?

A closer inspection of the TB task and the corresponding question reveals that it combines a number of properties that jointly make the task quite peculiar from a pragmatic point of view. First, the TB task is highly trivial: The protagonist clearly sees that the object is moved to a new location, and the protagonist, the child and the experimenter share this knowledge about where the object is, and everyone knows that the others know, too (this is thus common ground or mutual knowledge). Second, the TB test question is an academic question. The experimenter knows the answer herself. She does not ask this question to gain new information but rather to test whether the child knows the answer, too. Academic question formats are difficult to grasp for young children [10] (for effects of the interviewer’s knowledge on children’s answer behavior, see also [11–13]; for related proposals regarding the role of pragmatic factors in FB and other ToM tasks, see [14–17]). Third, the TB test question asks for a belief ascription or a belief-based action prediction [6]. Normally, we tend to talk about beliefs when we refer to or at least raise the possibility of their falsity [18]. In the TB scenario, however, there is no such obvious possibility. From a purely semantic point of view, the TB test question is utterly unproblematic, indeed highly trivial. Children with a merely literal language use and little sense of pragmatics should thus not find such questions taxing. But for language users with some pragmatic sensitivity, such questions should appear at least prima facie odd.

In light of the first question raised above (How does the decrease in performance in the TB task come about at the beginning of the U-shaped curve?), the pragmatic analysis thus yields the following (somewhat simplified) picture: Young children without a sophisticated understanding of ToM are limited in their pragmatic language understanding. They mostly use and interpret language literally (but see [19]). Children in this stage of development thus should have no problems with the TB task. However, once children start to develop ToM capacities (i.e., when they pass the FB task), these lay the ground for developing pragmatic understanding [20, 21]. Their initial ToM and pragmatic understanding are still fragile at this age, though, and their fragile pragmatics then leads them astray in the TB test.

Some evidence speaks in favor of this. First, as reviewed above, performances in the TB and FB tasks are negatively correlated such that children first fail FB and pass TB, and then show the reverse pattern [6]. The performance pattern matches the predictions of the pragmatic analysis such that children’s failure in the TB task depends on their success in the FB task: once children develop the prerequisite ToM capacities, they develop an understanding for pragmatics. As a consequence of this development, they suffer from the pragmatic peculiarity of the TB question and fail to answer it correctly while passing the FB task [6]. Second, children succeed in both the TB and FB task after modifications of the task pragmatics in the TB task. For example, children solve the TB task without any decline in the performance curve when the task is presented without or with less trivial language [8]. These results suggest that children show no more difficulties with answering the TB test question correctly once peculiar factors of the tasks are removed. Taken together, first evidence confirms the predictions of the pragmatic analysis at the beginning of the U-shaped performance curve.

The developmental processes at the end of the U-shaped performance curve, however, still raise open questions. How do children overcome their pragmatic difficulties in the TB task later in development, and how does this explain performance recovery at the end of the U-shaped curve? Currently, hardly anything is known about how children overcome their performance limitation in the TB task. From a pragmatic point of view, one possibility is the following: Regarding the beginning of the U-shaped development, the pragmatic analysis predicts that children start to get confused by the peculiarity of the TB test question once they develop sensitivity for pragmatics on the basis of their developing ToM. Applying the same logic to the end of the U-shaped curve, the pragmatic analysis predicts that children succeed again in the TB task once they have undergone further pragmatic development (adults, after all, even though they may find this type of questions funny, have no difficulty in answering it correctly). Taking new steps in pragmatics development might enable children to reason about language use on a higher, more sophisticated level of pragmatic interpretation, and thus to grasp more complex and advanced forms of discourse and speech acts.

Imagine, to illustrate the point, someone asks you, “Did you enjoy the nice weather yesterday?” when it in fact was raining cats and dogs all day. Interpreting this as a regular question asked in order to receive new information would be highly confusing given that the presupposition (“The weather was nice yesterday”) is not fulfilled. To understand the actual speaker meaning, you need to stand back from the literal meaning of the question and interpret it at a different, higher level. For example, you might infer the speaker’s intention to make an ironic comment about the rainy weather, or you may interpret the question as an academic exercise such that the speaker’s intention is to test your knowledge of English past tense. What these examples thus illustrate is that a prima facie odd question can make perfect sense when interpreted on a new and higher pragmatic level [22].

Applied to the TB performance, this developmental step might enable children to focus on a new interpretational level that allows them to resolve their initial confusions about the peculiar test question in the TB task (“strange question, but then I’m participating in a study after all; researchers do ask strange questions…”). If this idea holds, children who are able to reason at a flexible, higher-order level of pragmatic interpretation should be more likely to answer the TB test question correctly. Accordingly, the emergence of flexible, higher-order pragmatic understanding would determine the end of the U-shaped performance curve.

But what exactly happens in children’s pragmatic development at the end of the U-shaped curve? How can the crucial pragmatic development at the end of the U-shaped curve be described, and what are important foundations and correlates of this development?

The pragmatic analysis predicts a developmental progress in advanced pragmatics at the end of the U-shaped curve. If indeed the developmental curve reflects pragmatic progress more generally, this should also become apparent in other areas of advanced pragmatics. To this end, we will compare children’s performance in the TB task at the end of the U-shaped curve with developments in advanced pragmatic understanding more generally (regarding comprehension of indirect speech acts). The progress in advanced pragmatics, in turn, could itself be rooted in growing recursive ToM capacities. And the development in higher-order recursive ToM, in turn, might be based on a more general ability for recursive thinking. In the following, we discuss these three possibilities in turn.

Advanced pragmatic understanding

The crucial step in pragmatic development that leads to the end of the U-shaped curve may reflect a more general phenomenon of developmental progress in pragmatics. Such general progress in advanced pragmatics might become evident when comparing the performance in the TB task and advanced pragmatics in other areas. Generally, pragmatics is neither a simple and unitary phenomenon [23] nor do pragmatic abilities emerge in simple uniform ways across ontogenetic development [24]. Most relevant for present purposes are advanced forms of pragmatic understanding that tend to emerge comparatively late in development. Understanding non-literal language, such as ironic and metaphorical utterances, is a prototypical example [21, 25, 26]. For example, in uttering "It’s great weather for our picnic today" when it is raining all day, the speaker does not want the hearer to take her utterance literally. The speaker rather intends the hearer to believe that she thinks that it is not nice to have a picnic outside [27, p.262]. Accordingly, the hearer needs to suppress the initial, literal interpretation to be able to infer the actual meaning, for example, by taking context information into account [24]. Similar processes may be at work in the TB task such that children might overcome their confusion about the trivial test question (“This is too easy, maybe I missed something”) by suppressing this initial interpretation and taking context information into account (“I’m participating in a study and the experimenter asks test questions”). Interpreting non-literal language, especially ironic utterances, involves ascriptions of complex intentions and is therefore suggested to be an application of ToM [20, 26]. In neurotypical development, children develop an understanding of metaphors during school age or even preschool age [19, 24, 28]. The relation of metaphor understanding to ToM abilities, however, has been debated in the recent literature [28–30]. Evidence on irony comprehension suggests that children develop an understanding of ironic utterances during school age, between six and 13 years of age, depending on the kinds of measures used [24]. The relation of irony understanding to ToM is less controversial: children’s performance in irony comprehension correlates with their second-order FB understanding [26, 31].

Recursive ToM

The crucial foundation of development regarding the end of the U-shaped curve might be even more general than sophisticated pragmatics–for example, the capacity for recursive, higher-order mindreading. The standard ToM test asks for a first-order mental state ascription, but of course this is not where mental state ascription ends. Advanced forms of ToM enable flexible and higher-order forms of mindreading in which additional levels of mental states can be represented. Ascription of a belief about a belief, for example, constitutes second-order mental state ascription. In principle, mental states can be recursively embedded within each other infinitely (“A thinks that B thinks that C thinks that D thinks that … p”). This understanding of recursively embedded mental states may be the common denominator underlying both advanced pragmatics generally and TB performance specifically. It may provide the basis for advanced pragmatics in various forms, for example, by enabling the ascription of higher-order communicative intentions. In the TB task, recursive ToM might enable the ascription of specific higher-order communicative intentions to make sense of the speech act. There are at least two possibilities for how recursive ToM might be involved in the TB task in particular. One possibility is that children who develop an understanding of the pragmatic use of academic test questions (which are also very trivial and pertain to subjective representations) no longer get confused by the TB test question at all, but reach the right pragmatic interpretation of the question straight away. They integrate context information and ascribe higher-order intentions and thus make sense of the academic test question from the outset without being confused. A second possibility is that children initially suffer from confusion by the trivial academic TB test question (e.g., “Why does the experimenter ask me such a question? Maybe I must have missed something, maybe the experimenter thinks that I don’t know that the protagonist does not know that the object is in that location”). Younger children might not yet be able to resolve this confusion whereas older children, once they have developed recursive mindreading, might be able to ascribe higher-order mental states and thus to resolve the confusion (e.g., “She wants to know whether I know that the protagonist knows that the object is in the new location”). Either way, children’s performance on the TB test question should then be related to their recursive ToM capacities more generally.

Recursive ToM reveals itself in various forms, in many different tasks and situations, but tasks that directly test for this understanding of second- or higher-order mental state ascriptions are rare in the literature. Evidence from this line of research shows that children at the age of five to six years can attribute second-order mental states [“A thinks that B believes that …”; e.g., 32], but only very little is known about children’s development of higher-order ToM beyond the second order of recursion. In a study by Liddle and Nettle, for example, ten- to eleven-year-olds performed above chance level in a third-order ToM task and at chance level in a fourth-order ToM task [33]. Adults, in contrast, were able to reason up to the seventh order of recursion, in particular when tested “implicitly” through observing video clips of social interactions, compared to the explicit measures used in the above reported studies with children [34]. Until now, it therefore remains an open question when exactly children learn to reason about mental states on different levels of embedding and whether this development is fundamental for children’s performance in the TB task.

General ability for recursive thinking

The ability to reason about higher-order mental states, in turn, might be based on a more general ability for recursive thinking. The performance in the TB task, hence, would not only rely on the ability for higher-order ToM but on an even broader ability for recursive operations that is fundamental for higher-order ToM abilities.

Thinking and reasoning recursively is not only of importance in the development of ToM but has been implicated in a number of probably uniquely human abilities such as language, music, mathematics or mental time travel [35–37]. Recursive operations in all these areas require embedding of elements (e.g., mental states, words/clauses, etc.) within elements of the same kind [38]. The corresponding level of reasoning can be more or less clearly defined and quantified (e.g., in the domain of ToM: “A thinks that B thinks that C thinks that p” as third-order mental state ascription).
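The notion of an order of embedding can be made concrete with a small sketch. The following Python snippet is only an illustration of the definition given above, not part of the study materials: it represents a mental state ascription as a structure that contains either another ascription or a plain proposition, and counts the levels of embedding. The names `Proposition`, `MentalState`, `order`, `render` and `embed` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Proposition:
    """A plain state of affairs, e.g. 'p' (hypothetical helper type)."""
    content: str


@dataclass(frozen=True)
class MentalState:
    """An agent's attitude (thinks, hopes, ...) toward an embedded content."""
    agent: str
    verb: str
    content: Union["MentalState", Proposition]


def order(state):
    """Count the levels of mental state embedding (0 for a bare proposition)."""
    if isinstance(state, Proposition):
        return 0
    return 1 + order(state.content)


def render(state):
    """Spell out the nested ascription as an English sentence."""
    if isinstance(state, Proposition):
        return state.content
    return f"{state.agent} {state.verb} that {render(state.content)}"


def embed(agents, proposition, verb="thinks"):
    """Nest one ascription per agent, with the first agent outermost."""
    state = Proposition(proposition)
    for agent in reversed(agents):
        state = MentalState(agent, verb, state)
    return state


third = embed(["A", "B", "C"], "p")
print(render(third))  # A thinks that B thinks that C thinks that p
print(order(third))   # 3
```

Because `MentalState` contains an element of its own kind, the same two functions handle any depth of embedding, which is the sense in which the operation is recursive.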

The general ability for recursive operations might manifest itself in recursive thinking in the various specific areas of application, including higher-order ToM. Consequently, the developmental changes in TB performance might reflect development in advanced pragmatics that builds on recursive ToM that, in turn, is a manifestation of general recursive operations. Once the child has acquired a certain level of general recursive thinking, this enables her–via recursive ToM–to think pragmatically about the TB task at a higher level, overcome her pragmatic confusion and solve the task. If this hypothesized pattern holds, an individual, first, would be able to think recursively on the same level of embedding in different areas of application; and second, the general level of recursive thinking would, at least partly, predict her performance in the TB task.

Rationale of the present study

In sum, the puzzling developmental pattern of the U-shaped performance curve in the TB task raises two fundamental questions: First, how does the decrease in performance in the TB task come about at the beginning of the U-shaped curve? Second, how does the recovery of performance come about at the end of the U-shaped curve? The pragmatic analysis presents one possible answer to both questions: the U-shaped curve reflects an underlying development in children’s understanding of pragmatics.

Previous research has yielded some evidence that speaks to the first question, but so far, no study has empirically addressed the second question. The rationale of the present study, therefore, is to test whether, indeed, the developmental change at the end of the U-shaped performance curve reflects and can be explained by pragmatic development.

To this end, we tested a wide age range of children with the standard FB and TB task. Due to the restrictions of the Covid-19 pandemic, testing was conducted in an online format. We ran the studies as moderated online studies in which the experimenter interacted with the child via video chat while presenting the tasks on screen. With this change in setting, the pragmatic context in which the experimenter administers the TB task to the child was essentially different compared to earlier studies. A preliminary question thus was whether the typical performance curve replicates in the new format in children between six and ten years. The age range is expected to include younger children who still fail and older children who succeed again in the TB task (e.g., [6]), representing the right half of the U-shaped performance curve.

The main research question then was: What are the factors that explain the end of the U-shaped performance curve in the TB task? Here, we explore different possibilities based on the pragmatic analysis introduced above:

Is the TB pattern a function of advanced pragmatics development?
Even more generally: Is it a function of recursive ToM, or even of recursive thinking in general?

To address these questions, Advanced Pragmatic Understanding was operationalized in a task that asked for children’s metaphor and irony understanding. Recursive ToM was operationalized in tasks that tested children’s understanding and production of higher-order mental state ascriptions. The general ability for recursive thinking was operationalized as children’s recursive language abilities as a proxy for their general recursive abilities. This task tested children’s understanding of embedded recursive clauses. Children were tested in these tasks and the TB task in three online studies. Table 1 displays the tasks administered in Studies 1–3.

Table 1. Tasks included in Studies 1–3 in the order presented in the test session.

https://doi.org/10.1371/journal.pone.0266959.t001

Study 1

Method

This research was conducted in accordance with the Declaration of Helsinki and the Ethical Principles of the German Psychological Society (DGPs), the Association of German Professional Psychologists (BDP), and the American Psychological Association (APA). It involved no invasive or otherwise ethically problematic techniques and no deception (and therefore, according to national jurisdiction, did not require a separate vote by a local Institutional Review Board; see the regulations on freedom of research in the German Constitution (§ 5 (3)), and the German University Law (§ 22)). Before the test sessions of Studies 1–3 started, informed consent was obtained from the parents of the subjects.

Design.

Children in all three studies were tested in a single session (30–45 minutes) by a female experimenter (E). The tasks were presented remotely (on a laptop computer screen or tablet computer screen, no smartphone) in an interactive online study via a video conferencing platform (mainly BigBlueButton, in case of connection issues, the test session was shifted to Zoom). The tasks were embedded in a video, which was displayed via the conference platform in the middle of the child’s screen and required the child to give verbal answers. Next to the video, the child was constantly able to see the webcam video of E and herself, so that the child and E were able to communicate via audio and video streaming during the whole test session (see [7] for a validation study of this paradigm). Before the beginning of the test session, the caretaker gave verbal consent to the child’s participation in the study and the video and audio recording during the test session. The verbal consent was recorded and stored separately from the recording of the test session. The caretaker and the child were informed that they might abort the participation at any given moment. In the beginning of the test session, E advised the child that she could repeat each question if the child had any comprehension difficulties.

Participants.

Eighty-one 6- to 10-year-old children (72–131 months, mean age = 99.52 months; 41 girls, 40 boys) were included in the final sample. Eight additional children were tested but excluded from data analyses because of technical issues during the test session (N = 6), uncooperative behavior (N = 1) and concentration deficit resulting in >50% incorrectly answered control questions in the change-of-location task (N = 1). The broad age range was chosen in order to compare children who show the performance difficulties in the TB task with children who do not. Participants in this and all subsequent studies were recruited from a database of children whose parents had previously given consent to experimental participation as well as via social media.

Material.

Test for syntactic recursion. The task, adapted from Arslan and colleagues (2017), tested for the comprehension of embedded relative clauses in German [39]. Children saw two rows of animals (upper and lower row) on the screen (Fig 1). Each animal was displayed on a different background color. The children were asked to name the location of a corresponding animal on the screen (e.g., “Where is the cow that strokes a horse?” for the first-order syntactic recursion test question). Children had to refer to the animal’s location by naming the corresponding background color and the row in which the animal was placed (e.g., “yellow, upper row”). The test questions containing the relative clauses were scaled from first to fourth order of syntactic recursion and could be repeated up to four times [39]. For a detailed procedure, see S1 File.

Fig 1. Material for the test question with a first-order syntactic recursion (“Where is the cow that strokes a horse?”; correct answer: “yellow, upper row”).

https://doi.org/10.1371/journal.pone.0266959.g001

Standard change-of-location task. The children received four trials of the standard change-of-location task with different stimuli [2] implemented in short, animated video clips. Protagonist A and her object O were presented to the child before Protagonist A placed O in one of two boxes (box 1). In her presence (TB condition) or absence (FB condition), Protagonist B came into the scene and moved O to the other box (box 2), and the following test and control questions were asked:

Test question: “Where does Protagonist A think that O is?” [correct answer box 1 for FB, box 2 for TB condition]
Control Question 1: “In which box was O in the beginning?” [correct answer: box 1]
Control Question 2: “Where is O now?” [correct answer: box 2]

The TB and FB trials were presented in alternating order beginning with a TB trial and the children saw a frozen still image of the last frame of the scenario (Protagonist A and the two boxes) when answering the questions.
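The trial structure described above can be summarized in a short sketch. The following Python snippet is an illustration only, under stated assumptions: the answer key and the alternating trial order (starting with TB) are taken from the description above, whereas the `Trial` class, the `make_session` and `score` functions, and the response format are hypothetical and do not reproduce the authors’ actual coding procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    condition: str  # "TB" (transfer witnessed) or "FB" (transfer not witnessed)

    def correct_answers(self):
        # In the TB condition Protagonist A sees the transfer, so her belief
        # matches reality (box 2); in the FB condition she still believes box 1.
        test = "box 2" if self.condition == "TB" else "box 1"
        return {"test": test, "control_1": "box 1", "control_2": "box 2"}


def make_session(n_trials=4):
    """Alternate TB and FB trials, beginning with a TB trial."""
    return [Trial("TB" if i % 2 == 0 else "FB") for i in range(n_trials)]


def score(trial, responses):
    """Hypothetical scoring: mark each question as correct or incorrect."""
    key = trial.correct_answers()
    return {question: responses.get(question) == answer
            for question, answer in key.items()}


session = make_session()
print([t.condition for t in session])  # ['TB', 'FB', 'TB', 'FB']

# A child who always answers the test question according to reality (box 2)
reality_based = {"test": "box 2", "control_1": "box 1", "control_2": "box 2"}
print(score(Trial("TB"), reality_based)["test"])  # True
print(score(Trial("FB"), reality_based)["test"])  # False
```

The example at the end shows why the two conditions are informative only in combination: a purely reality-based answer passes the TB test question but fails the FB test question, while the control questions are answered identically in both conditions.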

Recursive ToM task: Understanding. The children heard three stories (partly adapted from [33]) accompanied by animated video clips and were asked to answer test questions about the characters’ mental states afterwards. The test questions were scaled from second order to fifth order of mental state recursion and children had to decide which of two sentences was true regarding the story line. The sentences were read out by a voice and displayed with pictures on the screen (Fig 2). To choose one of the two sentences, children could either name the side/color of the picture on the screen or repeat the sentence.

Fig 2. Screenshots of the visual animation of the fourth-order Theory of Mind test question in the story line “the video dilemma”.

Note. Children heard a voice slowly reading out the answer sentences while the animation was presented accordingly on the screen. E.g., “Sarah hopes [picture a] that Olli believes [picture b] that she knows [picture c] that Mrs. Brown wants [picture d] them to watch a pirate film [picture e]”. After that the second answer option (red side) was read out and presented accordingly.

https://doi.org/10.1371/journal.pone.0266959.g002

Example story: the video dilemma (adapted from [33])

This is Sarah and this is Olli. Sarah and Olli are in the same class at school. “Hi, I’m Sarah!”, “and I’m Olli”. Their teacher is Mrs. Brown. Today Mrs. Brown suggests that Sarah and Olli should bring a video to school tomorrow to watch with the other children. Mrs. Brown also says to them, “Make sure you bring a film that I will like too!” (Mrs. Brown leaves the scene). Sarah’s favorite videos are pirate videos. Olli’s favorite videos are horse films. Which will it be? A pirate or a horse film? Olli says to Sarah, “We just can’t decide so I think that we should take the film that Mrs. Brown would like. Sarah, do you know which one Mrs. Brown would like best?” Sarah is thinking about that. She does not have a clue which film Mrs. Brown would like. But Sarah decides to tell Olli that she knows that Mrs. Brown likes pirate films best. Sarah thinks that this will make Olli agree to take a pirate video to school. Olli listens to this and then Olli says, “We will take a video of pirates then.” So, Sarah gets to enjoy her favorite film!

Memory question

Which sentence is true?

a) Sarah likes pirate films best.

b) Sarah likes horse films best.

Test question (ToM Level 4)

Which sentence is true?

a) Sarah hopes that Olli believes that she knows that Mrs. Brown wants that they watch a pirate film.*

b) Sarah hopes that Olli believes that she doesn’t know that Mrs. Brown wants them to watch a pirate film.

*German translation with that-complement for want (“möchte, dass”)

Task for advanced pragmatics understanding. Children received two trials of a pragmatic language task testing for their metaphor and irony understanding [partly adapted from and inspired by 20, 26, 30, 31]. Each trial consisted of a story about two characters accompanied by three pictures (Table 2 and example below).

Table 2. Example story for Advanced Pragmatics Understanding Task.

https://doi.org/10.1371/journal.pone.0266959.t002

The questions (and answer options) could be repeated up to four times. During the ironic utterance (Table 2, Picture 3), the speaker’s face was not visible to avoid any inferences from their facial expression. To answer the second test question correctly, children had to refer to the speaker’s mental state or attitude toward the other agent’s behavior, to the negative outcome of the other agent’s behavior, or to the opposite/ironic meaning of the utterance. Answers to this test question were coded with a fixed coding scheme (adapted from [25, 26], see S1 File).

Results

Coding of predictors.

In the syntactic recursion task and the recursive ToM understanding task, children were coded with the highest level of recursion up to which they performed consistently correctly (e.g., a child is coded with “3” when she answers the test questions for levels 1–3 correctly, but the test question for level 4 incorrectly). For advanced pragmatics, children received separate scores for the number of correct trials on the metaphor test questions, irony test question 1 and irony test question 2 (0–2 each). For the coding scheme for irony test question 2 and interrater reliabilities, see S1 File.
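
The coding rule for the two recursion tasks can be illustrated with a minimal sketch. The function name, the `first_level` parameter and the floor value returned when the very first question is missed are assumptions made for illustration; they are not taken from the coding scheme in S1 File.

```python
def highest_consistent_level(correct_by_level, first_level=1):
    """Return the highest recursion level reached without an earlier error.

    correct_by_level: booleans ordered by ascending recursion level.
    first_level: level of the first entry (1 for syntactic recursion; the
    recursive ToM questions start at second order, so 2 would apply there).
    Assumption of this sketch: a child who misses the first question is
    coded first_level - 1.
    """
    level = first_level - 1
    for is_correct in correct_by_level:
        if not is_correct:
            break  # later correct answers no longer count
        level += 1
    return level


# Example from the text: levels 1-3 correct, level 4 incorrect.
print(highest_consistent_level([True, True, True, False]))  # 3
```

Under this reading, a correct answer at a higher level does not raise the score once a lower level has been failed.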

Plan of analysis.

In a first step, we verified that the children responded consistently in the two trials of the same condition in the change-of-location task, so that we were able to code children’s performance in this task in a binary format (passers vs. non-passers) for the subsequent analysis.

Second, as part of the preliminary analyses, we tested for the typical performance pattern in the TB and FB tasks by comparing performance against chance level for both TB and FB in the three age groups (young, middle, old). Children of all groups were expected to perform above chance level in the FB task, whereas only the oldest age group (9;4–10;11 years) was expected to perform above chance level in the TB task. Additionally, we computed correlations between FB and TB performance, which were expected to be negative for the two younger age groups and positive for the oldest age group only.

To address the main research question of which factors influence performance in the TB task, we computed a logistic regression model. In this model, TB performance (passing vs. non-passing) was predicted by recursive syntactic abilities, recursive ToM understanding, advanced pragmatics understanding and children’s age. We compared this full model with a control model containing only children’s age in months.

TB and FB performance: Consistency across trials in the standard change-of-location task.

The consistencies in performance of children over the two trials of the same condition of the standard change-of-location task were high. The percentage of children who had two available trials (meaning all control questions answered correctly) and showed the same performance in both trials was 85.90% (Φ = .68) for the TB trials and 98.75% (Φ = n/c, due to at least one constant variable) for the FB trials. Therefore, both trials were included in the analysis. For the following analysis, the TB and FB performance were coded as binary variables. Children had to pass both trials of a condition to be assigned to the group of passers. Children failing in one or both trials of a condition were assigned to the group of non-passers.
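
The binary coding and the consistency percentage described above can be sketched as follows. The data and the function names are hypothetical and serve only to illustrate the rule that a child must pass both trials of a condition to count as a passer.

```python
def is_passer(trial_results):
    """True only if every trial of the condition was answered correctly."""
    return all(trial_results)


def percent_consistent(children):
    """Percentage of children whose two trials had the same outcome."""
    same = sum(1 for first, second in children if first == second)
    return 100.0 * same / len(children)


# Hypothetical data: (trial 1 correct, trial 2 correct) for four children.
tb_trials = [(True, True), (False, False), (True, False), (True, True)]

print([is_passer(t) for t in tb_trials])  # [True, False, False, True]
print(percent_consistent(tb_trials))      # 75.0
```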

Preliminary analyses: TB and FB performances in different age groups of children.

The performance in the change-of-location task as a function of belief type and age is depicted in Fig 3.

Fig 3. Children’s performances in the standard change-of-location task as a function of age group and belief type in Study 1.

https://doi.org/10.1371/journal.pone.0266959.g003

To test for the failure in the TB condition and the success in the FB condition of younger children and the success in both conditions in older children, we computed Wilcoxon signed rank tests against chance level performance (0.5) for the three age groups and the two belief conditions. The Wilcoxon tests showed that the youngest age group (6;0–7;7-year-olds) performed significantly above chance in the FB condition (M = .93, p < .0001, r = -.85). The tests could not be computed for the two older age groups due to ceiling effects in the FB condition (7;8–9;3-year-olds and 9;4–10;11-year-olds M = 1). In contrast, Wilcoxon signed rank tests revealed that in the TB condition, only the oldest age group of children (9;4–10;11-year-olds) performed significantly above chance (M = .74, p < .02, r = -.48). Younger children (6;0–7;7-year-olds and 7;8–9;3-year-olds) performed at chance level (6;0–7;7-year-olds: M = .41, p = .34, r = -.18; 7;8–9;3-year-olds: M = .63, p = .18, r = -.26).

The correlation between the TB and FB performance in the change-of-location task is negative but not significant for the whole sample (r(phi) = -.13, p = .24) as well as for the youngest age group (6;0–7;7-year-olds: r(phi) = -.34, p = .08). Because of the ceiling effects in the FB condition, the correlation is not computable for the two older age groups.
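
Why a ceiling effect makes the correlation non-computable can be seen from the standard formula for the phi coefficient of a 2×2 table: its denominator is zero as soon as one variable is constant. The sketch below uses hypothetical scores and a hypothetical function name; it is not the analysis script used in the study.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient for two equal-length lists of 0/1 scores.

    Returns None when either variable is constant, because the
    denominator of the formula is then zero.
    """
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None  # at least one variable is constant
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical scores: TB varies, FB is at ceiling (all children pass).
tb = [1, 0, 1, 0]
fb_ceiling = [1, 1, 1, 1]
print(phi_coefficient(tb, fb_ceiling))  # None

# Hypothetical perfectly opposed pattern.
print(phi_coefficient([1, 1, 0, 0], [0, 0, 1, 1]))  # -1.0
```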

Main analyses: Predictors for TB performance.

We removed children failing the first-order FB condition (N = 2) from the following analyses. This was based on the assumption that children who still do not succeed in the first-order FB task use different cognitive strategies to solve the TB task compared to the group of children we aim to examine here.

Descriptive statistics. The mean performances in the recursive ToM understanding task, the advanced pragmatic language task and the syntactic recursion task as a function of TB performance and age are summarized in Table 3.

Table 3. Mean performance (M) and standard deviations (SD) in Syntactic Recursion (Synt. Recurs.), Recursive ToM Understanding (RToM U), metaphor understanding (Metaphor), and irony understanding in the first and second test question (Irony1 and Irony2) for TB non-passers (noTB) and TB passers (TB) in three age groups.

https://doi.org/10.1371/journal.pone.0266959.t003

For a more detailed summary of answers to irony test question 2, see S1 File.

Logistic regression models. We estimated the effect of the different predictors of mental state ascription on the TB performance using a multiple logistic regression model. To control for children’s age in months, we also included it in the model. Prior to fitting the model, we checked its assumptions: multicollinearity (all VIFs ≤ 1.38) and linearity of the logit for age (b = 0.08, p = .81), recursive ToM understanding (b = 1.48, p = .73) and syntactic recursion (b = -2.98, p = .27).

We compared the fit of the full model with that of a null model with the control variable only (TB ~ age). As the model comparison is significant, the predictors of mental state ascription have an impact on the TB performance (Model χ²(6) = 24.58, p < .001). More specifically, an increased ability in recursive ToM understanding led to increased TB performance (B = 0.84, p < .001***, OR = 2.33). Pragmatic language abilities, age and syntactic recursion abilities did not affect the TB performance significantly (Table 4).
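
The two reported quantities can be checked with a short sketch. It assumes that the odds ratio is the exponential of the regression coefficient and that the model statistic follows a chi-square distribution with the reported degrees of freedom. The function names are hypothetical, and the closed-form survival function used here is valid for even degrees of freedom only.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(b)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic, even df only.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


print(round(odds_ratio(0.84), 2))          # 2.32
print(chi2_sf_even_df(24.58, 6) < 0.001)   # True
```

The value of about 2.32 obtained from the rounded coefficient is close to the reported odds ratio of 2.33; the small gap is presumably due to rounding of B to two decimals.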

Table 4. Results of the logistic regression model predicting children’s TB performance with their age in months and their performance in tasks of syntactic recursion (Synt. Recurs.), Recursive ToM understanding (RToM U), and Advanced Pragmatics (Metaphor and Irony1 and Irony2 for irony test questions 1 and 2).

https://doi.org/10.1371/journal.pone.0266959.t004

Fig 4 illustrates this difference in recursive ToM understanding between TB-passers and TB-non-passers.

Fig 4. Children’s performance in recursive ToM understanding as a function of their TB performance (passers vs. non-passers) across age.

https://doi.org/10.1371/journal.pone.0266959.g004

Post-hoc analyses. Because recursive ToM understanding significantly predicted the outcome in the logistic regression model, we computed post-hoc one-sided Wilcoxon rank sum tests comparing TB-passers and TB-non-passers on this measure. Due to multiple testing, Bonferroni correction was applied, resulting in an alpha level of 0.0125 (0.05/4) for these post-hoc comparisons. For the whole sample, recursive ToM understanding differed significantly between TB-non-passers (M = 1.91, n = 33) and TB-passers (M = 3.72, n = 46; W = 319.5, p < .0001, r = -.53). Within the age groups, the difference was significant for the middle age group (TB-non-passers: M = 1.80, n = 10; TB-passers: M = 4.18, n = 17; W = 16, p < .0001, r = -.73). It was not significant at the corrected alpha level for the youngest age group (M(noTB) = 1.75, n = 16; M(TB) = 2.44, n = 9, W = 50.5, p = .09, r = -.34) or for the oldest age group (M(noTB) = 2.43, n = 7; M(TB) = 3.90, n = 20, W = 34.5, p = .02, r = -.45).
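
The effect of the Bonferroni correction on the four post-hoc comparisons can be made explicit with a minimal sketch. The function names are hypothetical. For the two comparisons reported as p < .0001, the bound .0001 is used as a conservative stand-in, since exact values are not given.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after Bonferroni correction."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Map each comparison to True if its p value is below the corrected alpha."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {label: p < threshold for label, p in p_values.items()}


# p values as reported; .0001 is an upper bound for "p < .0001".
post_hoc_p = {
    "whole sample": 0.0001,
    "youngest": 0.09,
    "middle": 0.0001,
    "oldest": 0.02,
}

print(significant_after_correction(post_hoc_p))
# {'whole sample': True, 'youngest': False, 'middle': True, 'oldest': False}
```

The oldest age group shows why the correction matters: p = .02 would pass an uncorrected threshold of .05 but not the corrected threshold of .0125.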

The achieved power was computed post-hoc for the logistic regression model. For the significant predictor recursive ToM understanding, this resulted in a power of 1 − β = .89.

Discussion

The expected pattern of typical TB and FB performance in children between six and ten years was replicated in the online study: children in the youngest and middle age groups (6;0–9;3 years) failed to perform above chance level in the TB task while FB performance was at ceiling. Only children in the oldest age group (9;4–10;11 years) performed proficiently in both conditions. Additionally, the study provides initial evidence that children’s TB performance can be (partly) explained by their understanding of recursive ToM. However, none of the variables of Advanced Pragmatics understanding or syntactic recursion were significant predictors of children’s TB performance.

Study 2

Study 2, therefore, aimed to replicate this relation between TB performance and children’s recursive ToM. In order to explore the underlying recursive ToM abilities in more detail, Study 2 operationalized children’s recursive ToM in two ways: as in Study 1, children’s understanding of recursive mental state ascriptions was measured. Additionally, children’s recursive ToM production was measured to identify the cognitive mechanisms relevant for the TB task in a more fine-grained way.