How the Context Matters. Literal and Figurative Meaning in the Embodied Language Paradigm

The involvement of the sensorimotor system in language understanding has been widely demonstrated. However, the role of context in these studies has only recently started to be addressed. Though words are bearers of a semantic potential, meaning is the product of a pragmatic process. It needs to be situated in a context to be disambiguated. The aim of this study was to test the hypothesis that embodied simulation occurring during linguistic processing is contextually modulated to the extent that the same sentence, depending on the context of utterance, leads to the activation of different effector-specific brain motor areas. In order to test this hypothesis, we asked subjects to give a motor response with the hand or the foot to the presentation of ambiguous idioms containing action-related words when these are preceded by context sentences. The results directly support our hypothesis only in relation to the comprehension of hand-related action sentences.


Introduction
In the last years many empirical findings have shown the involvement of the sensorimotor system in language understanding [1][2][3]. Listening to a sentence such as ''John grasps the glass'' determines the activation of hand-related areas of the motor cortex even if we are not carrying out any hand-related action. These findings suggest that linguistic meaning is grounded in systems for action and perception and directly challenge the idea that human cognitive abilities can be described in terms of operations on amodal and abstract symbols proposed by classic cognitivists [4,5].
In this regard, behavioural studies revealed a facilitation effect between the processing of action-related words or sentences and the performance of concurrent and compatible motor acts [6][7][8][9]. An interference effect has also been observed in another set of studies (e.g. [8,[10][11][12][13]). The occurrence of either facilitation or interference is thought to depend on the extent of the temporal overlapping between linguistic and motor tasks [10,12,14]. Furthermore, neuroimaging [15][16][17][18], and neurophysiological studies, the latter realized by means of Transcranic Magnetic Stimulation (TMS) [8,13,19,20], have widely revealed activation of the motor cortex when participants of the studies read or listened to action-related language. These studies showed that motor activations elicited by processing of action words and sentences follow a somatotopic organization.
Interestingly, somatotopic motor activation has been observed also during the comprehension of abstract and figurative use of language such as metaphors and idioms [21][22][23][24][25]. However, these latter findings are still controversial and challenging results have been reported in other set of studies [26,27].
Recently, several review articles have been devoted to the task of critically analyzing the ever-increasing amount of behavioural, neuroimaging and neurophysiological data on this topic [28][29][30][31]. Among others, the most debated issues are currently those regarding the constitutive and automatic involvement of the sensorimotor system in language understanding. In other words, the hypothesis that the activation of the sensorimotor system is always a necessary condition to comprehend language, at least when language refers to the domains of action and perception, and whether this activation is sensible to factors such as contextual cues, is under investigation. This work will focus precisely on this latter topic, namely the role of context in modulating motor simulation.
It is worth noting here that authors working in embodied language research adopt an operative definition of semantics, described as a set of fixed relations between linguistic signs and some forms of knowledge. Language is conceived as a rigid code where words, meanings and-in the light of neuroscientific evidencepatterns of neural activations are fixedly associated. There are two hypotheses on which such an idea of semantics rests: 1) The compositional nature of meaning allowing to derive the meaning of a sentence from that of its word elements [32,33]; 2) The fact that a linguistic expression could have meaning in the absence of a context. This really seems very doubtful, though points to an important intuition: words, though they can be modulated, are bearers of a semantic potential, which however in the absence of a context remains extremely vague.
The notion of context deserves an explanation. There is a propositional context, i.e. the expression in which the term occurs [34]. Then there is a broader context, the co-textual one. This means that the meaning of an expression is connected to the co-text, i.e. to the expressions that precede and follow it in a text [35]. There is a further meaning of context which is also essential, the pragmatic background, i.e. what the speaker does using a certain expression, the nonverbal context in which the speech act takes place [36], or the linguistic game, to use a Wittgensteinian term [37].
The pragmatic context is more than a mere element of enrichment of the linguistic expression; in fact, it is a fundamental feature of its possibility to signify something. In this view, meaning is always conceived of as the product of a pragmatic process and needs to be situated in a context to be correctly disambiguated and understood.
While the role of context in modulating embodied simulation during action recognition has already been demonstrated and widely discussed [38][39][40][41], only recently has this issue been addressed in embodied language research [25,[42][43][44][45]. What seems to still be missing is a redefinition of the ideas of meaning and semantics based on the notion of usage. Meaning, in this view, is a dynamic process in which both speaker and hearer are actively involved. Until now, when contextual effects have been taken into account, they have been conceived of as something given outside of speakers and that interacts with prefixed meanings in their ''heads''. In the light of the definition of meaning as usage, we can instead think of meaning as the product of a process that entails different mechanisms such as, motor simulation and the integration of co-textual and contextual information. Significantly, this definition of meaning allows us to hypothesize that the mechanism of embodied simulation [46,47] is part of the process of language comprehension in spite of being an automatic mechanism and is sensible to the context, considering context in its broadest sense.

Aim of the study
The aim of the study is to test the hypothesis that embodied simulation occurring during linguistic processing can be contextually modulated to the extent that the very same sentence, depending on the context of utterance, leads to the activation of different effector-specific areas of the brain. Two previous studies have shown that the same target sentence, depending on the context of utterance, can activate in different ways the motor system [45,48]. In a fMRI study, van Ackeren and colleagues [45] looked at motor simulation during the comprehension of indirect speech acts. Authors of this study showed that the comprehension of indirect requests, for example the sentence ''It is hot here'' uttered in a room with a window and interpreted as a request to open the window, activates the motor system much more reliably than the comprehension of the same sentence uttered in a different context, e.g. in the desert, and interpreted as a statement. Egorova, Shtyrov and Pulvermüller [48], in a study carried out with the time-resolved event-related potential (ERP) technique, also looked at different types of speech acts realized by means of the same target sentence. In this study, a critical word, considered as the target-sentence, was preceded by two different context sentences. According to the context of utterance, the same word was used to name or to request an object. Egorova and colleagues found that request-evoked potentials were larger in amplitude than those for naming. Significantly, the fronto-central cortex was the source underlying the ERP enhancement for Request suggesting the activation of motor knowledge. As far as we know, no study on figurative language has ever evaluated the role of pragmatic context on motor simulation in a similar way. This is precisely what we are going to do in this study. Previous works have mainly looked at motor simulation elicited by an actionrelated verb when this verb appears in two different, although similar, sentences, one literal and one figurative (for example, ''Pablo kicks the ball'' vs ''Pablo kicks the habit''). In our study, we are going to look at motor simulation elicited by the very same target sentence when preceded by two different contexts (for example, the sentence ''Pablo cuts the rope'' preceded by two different sentences). We hypothesize that, according to the context of utterance, the very same sentence can recruit in different ways the motor system. In order to test this hypothesis, we assessed embodied simulation occurring during the comprehension of idiomatic and literal sentences containing action-related words when these sentences are preceded by a brief context sentence.
Idioms are multi-word units whose meaning does not correspond to the sum of their component parts [49]. They are considered conventional by definition. In fact, as Numberg, Sag and Wason [50] suggested, apart from the property of conventionality, none of the property usually ascribed to idioms (e.g. inflexibility, figuration, proverbiality, etc…) applies obligatorily to all idioms. In our study, we selected a list of Italian ambiguous idioms containing an action-related word. An idiom is defined as ambiguous if it has a plausible meaning both in the literal and in the figurative usage (ex. To kick the bucket). For each idiom (target sentence) we worked out two brief context sentences that can trigger respectively the literal or the figurative interpretation of the target sentence. Stimuli were constructed in such a way that when idiomatic meanings were elicited, verbs considered as arm/ hand-action related described leg/foot actions and vice versa.
Following Borregine and Kaschak [7], a facilitation effect is expected only under conditions in which the same features are simultaneously active for different tasks (e.g., linguistic and motor task). However, if one of the two different tasks is completed (for example, if a full simulation of the action-related word has run), then the common features will be temporarily bound to that task (linguistic processing in this example) becoming unavailable to the other task (the motor act). In the light of this, in our study a facilitation effect is expected during the processing of literal sentences when participants respond with the congruent effector. In this case, they likely have enough (but not all) information to perform the motor task at an earlier stage of linguistic processing, while they have to wait until the end of the sentence in the case of idiomatic meanings. It is worth noting that idioms are considered as semantically transparent when there is a direct connection between the phrase and its figurative meaning, that is, when the idiomatic meaning can be guessed from the meaning of the words forming the phrase; when no such direct connection is available and the relationship between the phrase and the figurative meaning is arbitrary, idioms are considered as opaque. Considering that transparent idioms are supposed to be more easily imageable and that their imageability can affect embodied representations [51][52][53][54], we decided to select only transparent idioms. In this way, the role of context in modulating motor simulation is even more evident. In fact, although transparent idioms, due to the process of mental imagery, could trigger a motor simulation related to their literal meaning, we aimed to show that contextuallybased interpretation prevails over literal meaning interpretation.
Furthermore, a questionnaire was also administered to exclude that our results could be explained by the degree of imageability of each item. Results from the questionnaire ruled out this possibility.

Validation of the stimuli
We selected 20 Italian ambiguous idioms. Each idiom contained a verb or a noun relating to an action. For each idiom (target-sentence) we elaborated two brief sentences describing two different contexts (context sentences), in which the action could occur. Thus, we had 20 stimulus pairs. One context sentence in a stimulus pair elicited a literal interpretation of the target-sentence, whereas the other elicited an idiomatic interpretation of the same target-sentence. Each stimulus pair was constructed in such a way that when idiomatic meanings were elicited, verbs literally considered as arm-/hand-action related will describe leg-/ foot-actions. Vice versa, verbs literally considered as leg-/foot-action related will describe arm-/hand-actions in the idiomatic meaning (see Table 1 and 2).
In summary, the same target-sentences could appear two times in each block, but each time it was preceded by a different context-sentence eliciting a different interpretation of its meaning.

Participants
Forty right-handed students (mean age 25 years, range 19-32) entered the validation phase. They were recruited at the University of Parma. All of them were native Italian speakers and reported no history of speaking disorders. They had normal or corrected-to-normal vision. None of them was aware of the purpose of the experiment. The experimental protocols of the validation and of the three experiments were approved by the Ethics Committee of the University of Parma (Comitato Etico Provinciale -AUSL di Parma) and all participants of the experiments gave written informed consent. Experiments were conducted in accordance with the ethical standards of the 1964 Declaration of Helsinki.

Experimental procedure
Stimuli were presented at the centre of a computer screen on a white background. They were written in black lowercase, with Arial font. Participants were presented with 120 trials grouped in 3 blocks. Each stimulus was presented once in a block. Each trial started by presenting a fixation cross at the centre of the screen for 800 ms. Then, a context-sentence was presented for 4000 ms. Finally, a target- sentence was presented for a maximum duration of 3000 ms. The trial stopped when participants pressed a response-key. Stimuli appeared in random order. Participants had to explicitly judge whether the target-sentence described a hand-or a foot-related action based on what they could know from the preceding context sentence. They were instructed to respond by pressing a left or a right previously assigned response key, with their right index finger.
The response keys were counterbalanced between subjects. Each participant was also required to rate each idiom on two rating scales, one for the idiom familiarity (i.e., how much the idiom meaning was known) and one for semantic transparency (i.e., how much the meaning of the words composing each idiom string contributed to the figurative meaning; [55]. In both cases, participants rated each idiom by choosing a number from the 5-point Likert Scale (familiarity rating: 15 not at all, 55 very much; transparency rating; 15not at all, 55 very much). Idioms appeared in random order.

Results
Four participants were removed from the analysis because their error rate exceeded 50% (wrong responses, missing responses). Results for each type of stimulus are reported in table 1. Experimental stimuli were selected according to the following criteria. Accuracy, $70%; familiarity, mean score $3; transparency, mean score $3 [56]. We selected only stimuli satisfying all these three criteria. Accordingly, 6 target-sentences and the associated context sentences were discarded from the list, because they didn't match established criteria (see table 1 and 2). Ratings given on the accuracy, familiarity and transparency of idioms containing a hand action verb do not differ significantly from those containing a foot action verb (Accuracy: hand target-sentences, mean 582.99%, SE50.02 vs foot target-sentences, mean582.65%, SE50.03, t6520.199, p50.848; transparency: hand target-sentences, mean 53.70 SE50.08, vs foot target-sentences, mean53.67, SE50.05; t 9 50.142 p50.15; familiarity: hand target-sentences, mean54.04 SE50.08, vs foot target-sentences, mean53.66, SE50.07; t 9 51.720, p50.119). English translation of stimuli. T5Transparency; F5Familiarity; A5Accuracy. The literal translation of idioms from one language to another is not always possible because not always there is a direct correspondence between idiomatic expressions of different languages. Because the idioms in our experiment are ambiguous, their literal translation is always possible when they are used as literal sentences. However, when they are used as figurative sentences, is not always possible to translate them word by word. For this reason, when a correspondent idiom was not found in English, we decided to translate idioms in figurative sentences according to their inferential meaning. Target sentences in italic are the literal translations of Italian ambiguous idioms used as literal sentences. Target sentences in bold are the translations of Italian ambiguous idioms used as idiomatic sentences.
The validation of the stimuli allowed us to select only those pairs of sentences in which participants drew the expected inference in the interpretation of the target.

Experiment 1
We set this experiment to test if embodied simulation occurring during linguistic processing can be contextually modulated to the extent that the very same sentence, depending on the context of utterance, leads to the activation of different effector-specific areas of the brain.

Procedure
The experiment was carried out in a sound-attenuated and dimly illuminated room. Participants sat comfortably in front of a computer screen at a distance of about 60 cm from it. Linguistic stimuli consisting of 14 Italian ambiguous idioms (see table 1 and 2) were presented in the centre of the computer screen on a white background. They were written in black lowercase using Arial font. The presentation of the stimuli and the recording of the participants' responses were controlled by E-Prime software (Psychology Software Tools, Inc., Sharpsburg, PA). During the experiment each pair of context/target sentences was presented twice. Each of the 14 target-sentences was associated to three different context sentences: 1. Contexts that trigger the literal interpretation of the target; 2. Contexts that trigger the idiomatic interpretation of the target; 3. Contexts unrelated to the target interpretation (e.g. Context: ''The Earth is part of the solar system and revolves around the Sun'', target: ''He gave him an hand'').
The experiment consisted of 84 trials. We had 4 experimental conditions for the go-trials: targets containing an arm/hand related word literally interpreted (Literal Hand; LH); targets containing an arm/hand related word idiomatically interpreted as describing a leg/foot action (Idiomatic Foot; IF); targets containing a leg/foot related word literally interpreted (Literal Foot; LF); targets containing a leg/foot related word idiomatically interpreted as describing an arm/hand action (Idiomatic Hand; IH).
Stimuli appeared in random order. Twelve trials before the Experiment served to familiarize with the task. Participants had to perform 120 trials grouped in 3 blocks. Each stimulus was presented once in a block. Each trial started by presenting a fixation cross at the centre of the screen for 800 ms.
Then, a context-sentence was presented for 4000 ms. Finally, a target-sentence was presented for a maximum duration of 3000 ms. The trial stopped when participants pressed a response-key.
Participants were instructed to carefully read the sentences and to press a red button on a keyboard with their right hand, as fast and accurately as possible, when they judged that there was a relation between the target-sentence and the context-sentence. They had to refrain from responding when they judged that there was no relation of relevancy between the target-sentence and the contextsentence (go-no go paradigm).
Two linear mixed-effects models were separately carried out on mean reaction times (RTs) and Analysis of correct responses (accuracy), with Interpretation of meaning (Idiomatic vs. Literal) and Effector of action (Hand vs. Foot) as fixed factors, and Items as random factors. We adopted the linear mixed effect model because it is a robust analysis that allows to control for the variability of items and subjects [57]. This type of analysis is widely used in language research because it prevents either the potential lack of power and the loss of information due to the prior averaging of the by-subject and by-item analyses [57,58].
Significant differences were explored using Sidak's correction for multiple comparisons.

Results
Two participants were excluded from the analysis as outliers (2.5 SD from the mean of the group). The mean percentage of correct response was 87.8% (SE50.03). Error trials were excluded from further analyses. The sum of correct response and the mean RTs were calculated for each condition; responses either longer or shorter than 2 standard deviations from the individual mean were treated as outliers and not considered (2% of the data set). Based on a further items analysis, we excluded 3 target-sentences and their relative context-sentences from the analysis, because they did not reach the 70% of correct responses ''Continuava a battere sullo stesso tasto''/''He kept beating on the same key'', mean 567.5%, SE57.89 ''saltò il fosso''/''He jumped over the moat'', mean568.7%, SE56.91 ''Si fermò quasi ad ogni passo''/''He stopped at almost every single step'', mean566.2%, SE58.41). We excluded another item (''Tagliò la corda''/He cut the rope) because a deeper examination of the stimuli revealed that this was the only case in which the inferential meaning of the figurative sentence was stable (''To cut the rope'' is always interpreted as meaning ''To run away'') compared to the inferential meaning elicited by the other idioms that is more contextually determined. Also, the same items were not included in the next experiments.
Analysis of accuracy revealed only a significant main effect of Interpretation of Meaning (F 1,377 524.22; p50.000001), as the interpretation of Idiomatic trials was more difficult than the interpretation of Literal trials (mean 581%, SE50.03 vs. mean590% SE50.06 of correct responses, respectively).
In addition, RTs in LH condition were faster than in IF condition (LH: mean 51064 ms, SE567.49 IF: mean 51241 ms SE568.23; p50.001), while this was not the case for foot-related target-sentences.
(LF mean 1276 ms SE567.32; IH condition mean51242 SE568.23; p50.1). This latter result can be interpreted as the product of a facilitation effect due to the interpretation of target sentences in specific contexts. However, it could also be objected that this facilitation effect, rather than being generated by the combination of target and context sentences, could be primarily determined by context-sentences, because these sentences contained action-related words that could induce a pre-activation in the motor system. Experiment 2 was set to rule out this possibility.

Participants
Fifteen right-handed students (mean age 23.5, range 19-29 years) were recruited at the University of Parma for this experiment. All of them were native Italian speakers and reported no history of speech disorders. None of them was aware of the purpose of the experiment. The study was approved by the local ethical committee.

Procedure
In this experiment stimuli and procedure were the same as in Experiment 1 except for target-sentences (see table 3). Each target sentence was replaced by an abstract sentence balanced on the basis of its syntactic structure, number of words and syllables, times and modes of the verb.
Abstract target sentences did not describe any hand or foot actions. Participants were instructed to carefully read the sentences and to press a red button on a keyboard, as fast and accurately as possible, when they judged that there was a relation between the abstract target-sentence and the context sentence. They had to refrain from responding when they judged that there was no relation of relevancy between the target-sentence and the context sentence (go-no go paradigm).
RTs and Accuracy entered in two separated linear mixed-models with Contextactivation (Arm/Hand vs. Leg/Foot) as fixed factors, and Items as random factors.

Results
Trials in which participants failed to respond correctly were excluded from the analysis of RTs (mean56.9%, SE50.02). The mean percentage of correct response was 85.68% (SE50.02). Error trials were excluded from further analyses. The sum of correct response and the mean RTs were calculated for each condition; responses either longer or shorter than 2 standard deviations from the individual mean were treated as outliers and not considered (7% of the data set).
A previous item analysis showed that every item reached at least 70% of correct responses.
Analysis of RTs revealed that they do not differ between context-sentences of these two variables (F 1, 14. In Experiment 1, in which a hand response was requested, a literal hand facilitation effect was found. Analogously, a facilitation effect can be expected for the literal interpretation of target sentences containing foot-related words in comparison to the idiomatic (hand-related) interpretation of the same target sentences if a foot response is requested. Experiment 3 aimed at investigating this issue.  The athlete needed to do the last steps towards the goal It was his moment of glory

Hand
Al pittore mancavano gli ultimi tocchi di colore sulla tela Era soddisfatto di sé stesso The painter needed to do the last brush strokes on the painting He was pleased with himself Foot L'escursionista si trovava in un tratto pieno di ostacoli e pericoloso Stava riflettendo sulle probabilità di riuscita The hiker was on a difficult and dangerous path He was thinking about the chances of success

Hand
Il chirurgo si trovava in una fase dell'operazione delicata e pericolosa Stava apprezzando il fascino delle sfide The surgeon was at a difficult and dangerous stage of the surgery He was enjoying the fascination of challenges Stimuli used are in italic and the corresponding English translations in bold. Note that target sentences in this case are abstract sentences that did not describe any hand or foot actions.

Experiment 3
This experiment was carried out to verify whether a facilitation effect could be found during the literal interpretation of sentences containing foot-related words in comparison to the idiomatic (hand-related) interpretation of the same target sentences if participants responded with the foot. Furthermore, no significant difference in RTs was expected between the literal and idiomatic interpretation of sentences containing hand-related words.

Procedure
The experimental procedure was the same used in the Experiment 1 except for the fact that participants performed foot, rather than hand, responses on the same keyboard used in Experiment 1.

Results
Three participants were discarded from the analysis because their error rates exceeded 30% (missing responses or wrong responses. Trials in which participants failed to respond correctly (3% of data set) were excluded from the analysis of RT). The mean percentage of correct response was 89.38% (SE50.03). Error trials were excluded from further analyses. The sum of correct responses and the mean RTs were calculated for each condition; responses higher or lower than 2 standard deviations from the individual mean were treated as outliers. An Item analysis showed that every item reached at least 60% of correct responses. As in Experiment 1, two linear mixed-effects models were separately carried out on mean RTs and accuracy, with Interpretation of meaning (Idiomatic vs. Literal) and Effector of action (Hand vs. Foot) as fixed factor, and Items as random factors. Significant differences were explored using Sidak's correction for multiple comparisons.
Analysis of accuracy revealed only a significant main effect of Interpretation of Meaning (F 1, 377 523.93; p,0. 01 as the interpretation of Idiomatic trials was more difficult than the interpretation of Literal trials (mean 583%, SE50.03 vs. mean595% SE50.07 of correct responses).
Analysis of RTs showed only a significant Interpretation of meaning by Effector of action significant interaction (F 1, 359 511. 49

Imageability questionnaire
A further questionnaire was administered to exclude that our results could be explained by the degree of imageability of each item [51][52][53][54].
A new group of 38 right-handed students (mean age 26.02 years, range 19-35) was recruited following the same inclusion criteria previously described. They were asked to rate the imageability of each idiomatic/literal sentence presented in Experiment 1 and 3. For the rating we used a 101-point visual analogue scale (VAS), with 0 corresponding to very little and 100 corresponding to very much. Participants were required to evaluate how much they could imagine the action described in the sentence. The questionnaire was administered online using Qualtrics software, Version 37,892 of the Qualtrics Research Suite. Copyright 2014.

Results
Ratings of participants were submitted to the same linear mixed-models used in Experiment 1 and 2. Interpretation of meaning (Idiomatic vs. Literal) and Effector of action (Hand vs. Foot) were fixed factors, whereas Items were random factors. In line with the results from Experiment 1 and 3, the analysis showed the main   4). These results clearly suggest that the degree of imageability alone could not fully explain the facilitation effect for the LH condition (faster RTs) compared to the IF and LF conditions observed in Experiment 1 (hand responses), but absent in Experiment 3 (foot responses).

Discussion
This study aimed to test the hypothesis that motor simulation occurring during linguistic comprehension can be contextually modulated. In order to test this hypothesis we chose a list of 14 Italian ambiguous idioms containing actionrelated words and we predicted that the very same sentence activates in different ways the motor system according to its literal or idiomatic interpretation in the context of utterance. Our results directly support this hypothesis only in relation to the comprehension of hand-related action sentences.
In summary, each target sentence had two different possible interpretations with four factors at play: meaning (literal or idiomatic) and effector (hand or foot). Half of the target sentences contained a hand-related word and could be interpreted both as hand-related literal sentences and foot-related idiomatic sentences. The other half contained a foot-related word and could be interpreted both as foot-related literal sentences and hand-related idiomatic sentences.
When participants responded with the hand (Exp. 1), we observed significantly faster RTs for the literal interpretation of sentences containing hand-related words compared to the idiomatic interpretation of the same sentences. We interpreted these faster RTs as the product of a facilitation effect. In this same experiment, no significant difference in RTs was found between literal and idiomatic interpretation of sentences containing foot-related words. Experiment 2 ruled out the possibility that context-sentences alone, in spite of containing action related words, are sufficient to determine motor facilitation. In fact, in this experiment, the same context-sentences, followed by abstract sentences, did not lead to any differential facilitation of the hand response.
In Experiment 3, participants faced the same task as in Experiment 1 but they had to respond with the foot. A facilitation effect was expected during the literal interpretation of sentences containing foot-related words in comparison to the idiomatic interpretation of the same sentences. No significant difference in RTs was expected between the literal and idiomatic interpretation of sentences containing hand-related words. While this latter result was confirmed, our data did not support the former hypothesis. When responding with the foot, participants showed no facilitation effect during the interpretation of foot-related literal sentences.
Why is the foot different? Or, why is the hand special?
Overall, the data of this study directly support our claim that motor simulation during language comprehension is modulated by the context of utterance only in relation to sentences containing hand-related words. In our study, participants faced a complex pragmatic task while previous studies have been mainly focused on lexical (to understand if a string of letters is a word) or semantic (to make a judgement on the basis of the word meaning) tasks. It was observed that to carry out either a lexical or a semantic task makes a difference in the recruitment of the motor system determining differences in the effects that linguistic processing has on the performance of a concomitant motor act [59]. Also, the extent of the temporal overlapping between linguistic and motor tasks determines another significant difference [10]. It is important to note that we specifically asked the participants of our study to make a judgement of relevancy because we aimed to assess the role of context in the interpretation of target sentences and we needed to be sure that they interpreted target sentences on the basis of context sentences. Previous studies [60][61] have already shown differences between task requirements when participants are asked to make meaningfulness judgements, relevancy judgements or when only reading times are measured. The discussion about the methods to be adopted in the research on the mechanisms underlying language processing is still open. The question is which method best explains the processes involved in language comprehension and which is more similar to real-life situations. We believe that findings obtained from different empirical methods, with different tasks requirements, can converge and complement each other to gain a deeper understanding of the processes involved in language comprehension. In the light of these considerations, we hypothesize that the pragmatic task in our study determined a contextually based interpretation that led to a peculiar modality of recruitment of the motor system. In this case, the recruitment of the motor system could be mediated by activation of prefrontal areas, such as the left dorsolateral prefrontal cortex (DLPF), that have been shown to be involved in idiomatic comprehension and in other linguistic tasks that entail a contextually based semantic disambiguation [62][63]. In a top-down process, the DLPF cortex could select the pertinent motor simulation and, also, the adequate motor program to perform the key-press motor act [64]. However, at the behavioural level, the involvement of the sensorimotor system in language comprehension cannot be directly observed except for the comprehension of literal hand-related sentences. Sato et al. [11,13] suggested that in a semantic task carried out in a delayed condition (when participants perform the key-press motor act 1000 ms after the onset of the linguistic stimulus) no interaction is directly observable between language and the sensorimotor system. Significantly, in our study, RTs were longer than or around 1000 ms in all the conditions. Our peculiar linguistic task very likely determined a ''delay-like'' condition that made it impossible to directly observe the involvement of the sensorimotor system during language processing except for hand-related action verbs.
Why is the foot different? Or, more precisely, why is the hand special? The intimate relationship between language and hand motor control that has been widely demonstrated [65] could be the reason why, in our task, the involvement of the motor system was evident at the behavioural level only during the comprehension of hand-related literal sentences, although a motor simulation likely also took place in the other conditions. Only further investigations with different techniques (e.g. TMS or neuroimaging) will allow us to fully comprehend this issue.

Contextual effect on motor simulation
Recently, neuroimaging studies found somatotopic activation of the pre-motor and primary motor cortical areas related to the literal meaning of action verbs during the comprehension of metaphors and idioms [22][23][24][25]. However, divergent findings have also been obtained in other studies [26,27,56]. Although it is not easy to make a direct comparison between these studies because they differ in many respects, such variability in findings suggests that motor simulation during language processing is not constant. It varies under different experimental conditions, also accordingly to the level of interpretation elicited by the task (lexical, semantic, and pragmatic) and the kind of linguistic stimuli used in the experiment. If motor simulation during linguistic comprehension were automatic and invariant, adopting the classic definition of automaticity, namely that a mechanism is considered automatic if it is independent of top-down control [66], we should expect to always find it. Considering that this is not the case, this variability in itself is a challenge to the invariant and non-mediated nature of motor simulation. Lately, contextual effects on motor simulation during linguistic comprehension have been assessed in behavioural [67] and fMRI studies [68]. These findings suggest that contextual information prevails over semantics. However, how precisely this happens is still an open question.
One hypothesis is that motor simulation, in spite of being automatically triggered by intrinsic semantic features, could be inhibited by the processing of contextual information.
Alternatively, context might act before the onset of any motor simulation associated to linguistic processing determining the selection of the contextually salient pattern of motor activation. Interestingly, these mechanisms are not mutually exclusive. Chersi, Ferrari & Fogassi [69] proposed a computational model of neural chains for action in the parietal lobe in which both inhibition and selection mechanisms are involved. According to this model, when contextual cues are enough to understand the agents' intention, the selection of a specific action goal is expressed by the high activity level of a specific neural chain. Instead, when contextual cues are ambiguous, all intentions compatible with the act are prompted and multiple chains are activated in parallel. As soon as more contextual information will become available, non-compatible neural chains will be inhibited. According to this model, motor simulation during linguistic processing can still be considered automatic. Context is a fundamental part of the construction of meaning and can act by selecting the right neural chain of motor simulation, by inhibiting a wrong one or by using both mechanisms at the same time.

Conclusions
Our data, together with previous findings [48,50,67,68], suggest that motor activation during the processing of action related words is not fixedly associated to the literal meaning of words but depends on the context of utterance. Also, our data support the general claim that automatic mechanisms can be sensitive to the context. This point is particularly important and should be further investigated. Our findings together with previous studies [66] contribute to the discussion on the notion of automaticity. In fact, the classical concept of automaticity is currently under revision and it is now proposed [66] that high-level cognitive mechanisms interacts with automatic processes.
Finally, our data further highlight the intimate relationship between hand and language. Previous studies have widely shown that Broca's area, traditionally considered a language area, is also involved in hand motor control [70,71]. In fact, Broca's area is both involved in tasks such as complex finger movements, mental imagery of grasping actions, and hand-imitation tasks and in syntactic processing involving reconstruction and interpretation of structured sequences of sentences [70,72]. No relation like this exists between language areas and areas that control the movements of the foot. Moreover, it has also been suggested that language evolved exactly from manual gestures [65][66][67][68][69][70][71][72][73]. Many reasons have been proposed to support this evolutionary hypothesis and all of them further highlight this intimate relationship between hand and language.