Truth or lie: Exploring the language of deception

Justyna Sarzynska-Wawer; Aleksandra Pawlak; Julia Szymanowska; Krzysztof Hanusz; Aleksander Wawer

doi:10.1371/journal.pone.0281179

Abstract

Lying appears in everyday oral and written communication. As a consequence, detecting it on the basis of linguistic analysis is particularly important. Our study aimed to verify whether the differences between true and false statements in terms of complexity and sentiment that were reported in previous studies can be confirmed using tools dedicated to measuring those factors. Further, we investigated whether linguistic features that differentiate true and false utterances in English—namely utterance length, concreteness, and particular parts-of-speech—are also present in the Polish language. We analyzed nearly 1,500 true and false statements, half of which were transcripts while the other half were written statements. Our results show that false statements are less complex in terms of vocabulary, are more concise and concrete, and have more positive words and fewer negative words. We found no significant differences between spoken and written lies. Using this data, we built classifiers to automatically distinguish true from false utterances, achieving an accuracy of 60%. Our results provide a significant contribution to previous conclusions regarding linguistic deception indicators.

Citation: Sarzynska-Wawer J, Pawlak A, Szymanowska J, Hanusz K, Wawer A (2023) Truth or lie: Exploring the language of deception. PLoS ONE 18(2): e0281179. https://doi.org/10.1371/journal.pone.0281179

Editor: Francesca Peressotti, University of Padova, ITALY

Received: May 2, 2022; Accepted: January 17, 2023; Published: February 2, 2023

Copyright: © 2023 Sarzynska-Wawer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Full statements, datasets and related scripts for stylometric analysis are available at https://github.com/alexwz/deception-analysis.

Funding: Research was funded by The National Science Centre Poland (https://www.ncn.gov.pl/en), grant number UMO-2017/26/D/HS6/00212 awarded to JS-W. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Lying is a part of everyday human communication, and most of us engage in it—studies show that people tell an average of one to two lies per day [1]. It has been reported that lies are also increasingly common in computer-mediated communication and in text-based interactions [2]. As a consequence, their detection through language analysis is critical in contexts that rely on truthful inputs.

To date, many studies have attempted to capture the differences between true and false statements. These differences may be related to specific types of emotions experienced by liars, cognitive processes occurring while lying, and the self-presentation strategies liars use to control their behavior [3]. According to the emotional approach, lying can trigger emotions such as excitement, fear, and guilt [4]. These emotions can influence a liar's behavior and speech, e.g., by increasing the use of words with an emotional tone or negations [5]. The cognitive approach emphasizes that lying is more cognitively demanding than telling the truth. It requires the involvement of working memory, cognitive control, and shifting attention [6]. This cognitive load may be visible in how a liar speaks (e.g., slower speech pace, mistakes) and also in how they construct false statements, which may be simpler and shorter than true statements. Finally, according to the self-presentation approach proposed by DePaulo et al. [7], liars are less direct than truth-tellers out of concern for their own image; they distance themselves more from the lies they utter and provide fewer details.

These theoretical assumptions have been confirmed in numerous studies, including a meta-analysis conducted by Hauch et al. [8] in which written and transcribed statements from 44 different studies were analyzed. The authors of studies included in the meta-analysis used a variety of software to identify and quantify linguistic cues to deception. The English version of the Linguistic Inquiry and Word Count (LIWC) [9] was the most common. LIWC analyzes transcripts with a dictionary-based approach in which each word is compared against a dictionary file whose words are divided into 74 linguistic dimensions. After counting the number of words in each category, the output is given as a percentage of the total words in the text sample. The results showed that liars express more negative emotion words, distance themselves from events by using fewer self-references (first-person pronouns) and more other-references (second- and third-person pronouns), and, as predicted by the cognitive load approach, construct shorter (fewer words and fewer sentences), less elaborated (fewer different words), and less complex (fewer exclusive words) stories. Some studies included in Hauch's meta-analysis [8] suggested that liars may also use more negations and over-generalizations. False statements are also less abstract. Newman et al. [10] suggest that, due to cognitive load, liars may focus more on simple, concrete verbs than on evaluations and judgments, because the former are more accessible and easier to combine into a false narrative.

Interestingly, different results were obtained in the few studies that analyzed statements in languages other than English. Schelleman-Offermans and Merckelbach [11] analyzed statements in Dutch. They failed to confirm differences in complexity between true and false statements as measured by the use of motion verbs. They also found no differences in the use of emotionally negative terms or self-references. False statements differed from true ones only in their less frequent use of exclusive words. Research conducted by Masip et al. [12] on Spanish statements also yielded results different from those reported for English: true statements differed from false ones only in the use of prepositions (which were more frequent in true statements) and of words related to positive emotions, positive feelings, affective processes, and achievements, which were less common in false statements.

Accordingly, the main goal of our research was to identify differences between true and false statements in the Polish language. Polish is a Slavic language with a rich morphology, in which grammatical categories are expressed through inflection. This allows relatively free word order and may affect the complexity of statements. Moreover, Polish makes use of reflexive pronouns, which indicate that an action is directed at its performer. Correlations of various part-of-speech classes with either truth or lies have been reported for English, with 'I' words (self-reference) found to be the most relevant for detecting lies. In contrast to English, in Polish, self-reference is expressed through verb morphology or, in part, through reflexive pronouns. In this article, we focus on the Polish language and investigate possible differences in the distribution of these properties between true and false statements using automated tools such as part-of-speech taggers and dependency parsers.

In testing the differences between true and false statements, we tried wherever possible to replace dictionary methods with non-dictionary methods and general tools with specific tools tailored to a particular variable. We wanted to move away from dictionary methods primarily because they are developed for a specific language and often lack equivalents for others. We therefore tried to propose methods that can be used independently of language and that are freely available. Where this was not possible, we have chosen dictionary tools developed for a specific variable as they provide better recall and, therefore, are more accurate than general dictionaries.

Below, we describe which differences between true and false statements were analyzed in our research and with which tools. We also state a hypothesis for each variable based on the results of previous research. Hypotheses are posed separately for written and transcribed statements because previous research has shown some differences depending on the type of statement. In writing, liars have more time to plan their utterances and can edit them. In some studies where only written statements were analyzed, this affected—among other things—their length: subjects used more words when lying than when they were telling the truth [13].

Cognitive load approach

Complexity.

Cognitive load may reduce the complexity of false statements, affecting the simplification of both vocabulary and syntax. Therefore, we decided to use three different measures: Mean Dependency Distance (MDD) and Mean Hierarchical Distance (MHD), which are related to syntax, and Gunning Fog Index (FOG), which is related to vocabulary. All three measures are described in detail in the “Method” section.

In general, the differences in sentence characteristics generated under different cognitive loads can be measured by dedicated indexes that are sensitive to changes in load. One such tool is dependency distance—proposed by Heringer [14]—defined as the number of words intervening between two syntactically related words. Dependency distance can measure the memory burden imposed on language processing and reflects the dynamic cognitive load of language generation under various conditions [15]. Using automated dependency parsing, measures such as MDD and MHD offer two different implementations of the concept of dependency distance.

The third measure, the Gunning Fog Index (FOG), takes a different approach to estimating text complexity: it combines average sentence length with the proportion of complex words, that is, words with three or more syllables. The FOG is an automated readability index that translates a text into the number of years of formal education a person needs to understand it on first reading. FOG is usually used to confirm that a text can easily be read by its intended audience, and it has also been used for cognitive load measurements. In deception detection research, it was used by Pérez et al. [16].

MDD and MHD have not yet been used in deception research; we believe that their use in combination with FOG could help measure complexity through readability and syntax, and more accurately determine the nature of complexity differences. Regardless of the type of statement (written/transcribed), we postulate that true statements will be more complex than false statements in terms of all indicators.

Length.

The length of a text may also indicate its complexity, and we used three variables to measure it: the number of sentences, the number of characters used in the statement, and the number of tokens. According to Manning et al. [17], a token is an instance of a sequence of characters in some particular document that are grouped together as a functional semantic unit for processing. Tokens, usually corresponding to words, are the result of splitting an input text into pieces and sometimes discarding certain characters, such as punctuation. These measures have been used in research on lying many times and in several different languages. We assume that in the case of transcribed statements, true statements will be longer than false statements. In the case of written statements, the opposite will be true: false statements will be longer. The differences will be visible for all measures.
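
As an illustration of the three length measures, the following is a minimal Python sketch, not the study's pipeline. It assumes a naive sentence splitter (a boundary after ".", "!" or "?" followed by whitespace), approximates tokens as runs of word characters with punctuation discarded, in the spirit of the token definition above, and counts characters including spaces, which the text does not specify.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Tokens approximated as runs of Unicode word characters; punctuation is dropped.
TOKEN = re.compile(r"\w+")


def length_measures(text: str) -> dict:
    """Return sentence, character, and token counts for one statement."""
    stripped = text.strip()
    sentences = [s for s in SENTENCE_BOUNDARY.split(stripped) if s]
    tokens = TOKEN.findall(stripped)
    return {
        "sentences": len(sentences),
        "characters": len(stripped),  # assumption: spaces and punctuation included
        "tokens": len(tokens),
    }


# "Vaccines are safe. They protect all of us!"
print(length_measures("Szczepienia są bezpieczne. Chronią nas wszystkich!"))
```

The regular expressions keep the sketch self-contained; abbreviations such as "np." would be split incorrectly, so a proper tokenizer (e.g., the one in spaCy-PL) is the safer choice for real data.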

Concreteness.

To inspect differences in the concreteness of true and false statements, we decided to use the Linguistic Category Model (LCM) proposed by Semin and Fiedler [18]. This model is based on dividing words into categories reflecting their abstractness level. The authors distinguished four categories, including three types of verbs and adjectives: Descriptive Action Verbs (DAVs), Interpretative Action Verbs (IAVs), State Verbs (SVs), and Adjectives (ADJs). The most abstract category in LCM is adjectives (ADJ), such as “good” or “smart”, which are used to describe highly conceptual dispositions and personality traits. Adjectives enable generalizations across situations, objects, or specific behavioral events.

The most abstract type of verbs are state verbs (SVs), which are mainly used to describe emotional or mental states (e.g., “love”, “think”). They lack a clear beginning and end and make no direct references to a specific behavioral episode or situation, although they do refer to a particular object. The second type of verb is the interpretative action verb (IAV). IAVs describe a general class of behaviors without identifying the specific behavior they refer to in a given context (e.g., “help”). The last and most specific type of verbs are descriptive action verbs (DAV). DAVs are verbs that describe a single and observable event defined by at least one physically invariant feature and with a clear beginning and end (e.g., “call”).

According to Semin and Fiedler [18], differences in the level of language abstractness can be observed even when people describe the same events. The authors claim that the LCM makes it possible to evaluate the degree to which people reveal their abstract or concrete thoughts in language use. LCM has so far been used to study the abstractness of deceptive statements. Promising results were obtained by Louwerse et al. [19], who used this tool to predict whether an email contains fraudulent information.

Again, we believe that this tool potentially provides the most accurate estimate of concreteness, as it covers a much larger number of words than the individual word classes in LIWC or other general dictionaries previously used for this purpose. We assume that true statements, whether written or transcribed, will be more abstract than false ones. The key for us will be the overall abstractness score; we have no directional hypotheses about the occurrence of particular categories of verbs and adjectives.

Emotional approach

Sentiment and negations.

In order to study the use of emotionally charged words by liars, we also decided to choose a different method than LIWC and used sentiment analysis. Sentiment analysis is a technique that uses natural language processing to evaluate a text’s emotional emphasis. The algorithms used in sentiment analysis allow for categorizing expressions/text parts as neutral, positive, or negative. Sentiment analysis is often used in research on reviews, opinions and attitudes, and often consists of counting word occurrences with a positive or negative meaning, such as “beautiful” or “horrible.”

Machine learning and deep learning are most often used for this purpose and achieve high accuracy. Unfortunately, they are usually limited to a specific topic or type of text (domain-dependent). Therefore, in our research, we decided to use a method based on an open-domain sentiment dictionary. Unlike the domain-dependent type, it is more universal and does not categorize entire texts. However, it does allow for an accurate count of positive and negative words appearing in the text.

Sentiment analysis allows us to categorize many more words than is possible with LIWC, making our measurement more accurate. In contrast, we measured the prevalence of negation as in other studies: using a list of words (such as "no" or "never") and counting their occurrences in the statements. In both written and transcribed statements, we expect lies to contain more negations and more words denoting negative emotions, and fewer words denoting positive emotions.
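
Counting negations from a word list, normalized by statement length, can be sketched in a few lines of Python. The Polish negation list below is a hypothetical illustration, not the list used in the study, and the tokenizer is the same naive word-character pattern as elsewhere.

```python
import re

# Illustrative Polish negation words; a hypothetical list, not the one used in the study.
NEGATIONS = {"nie", "nigdy", "nic", "nikt", "żaden", "żadna", "żadne"}


def negation_rate(text: str) -> float:
    """Share of word tokens that appear on the negation list (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in NEGATIONS)
    return hits / len(tokens)


# "I have never eaten meat": "nigdy" and "nie" are both counted.
print(negation_rate("Nigdy nie jadłem mięsa"))  # 0.5
```

Exact matching misses inflected forms (e.g., "żadnego") and negation fused into words (e.g., "niebezpieczny"), so for a highly inflected language such as Polish a real list would need either all inflected forms or lemmatized input.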

Self-presentation approach

Part-of-speech and over-generalizations.

We tested whether liars distance themselves from their lies by using fewer first-person pronouns and more third-person pronouns. Since previous studies have found differences in other parts of speech (e.g., participles) and such studies are lacking in Polish, we decided to analyze the frequency of occurrence of many other parts of speech (see S1 File for a complete list). We decided to treat the analysis of parts of speech in an exploratory manner and only hypothesized about pronoun use. Namely, we assume that the differences in the frequency of using first and third-person pronouns shown in English will not occur in Polish, where the personal pronoun is often omitted and the person is included in the form of a verb. These same assumptions apply to written and transcribed statements.

Another way to distance oneself from lies can be to use over-generalizations. We measured them by comparing words from statements with lists of words explicitly created for this variable and by counting the frequency of their occurrence. We hypothesize that there will be more over-generalizations in false written and oral statements than in true statements.

Summary

To summarize, we aimed to check whether the differences between true and false statements found in English studies would also appear in Polish. We considered variables derived from the underlying models of deception, which have repeatedly been considered predictive of deception. However, our research used newer and more specialized methods than most previous studies. Our study is the first to use tools dedicated to measuring syntactic complexity and sentiment, and also the first to compare spoken and written statements, since we collected both written and transcribed true and false statements from the same individuals. Using these data, we built a classifier to automatically distinguish true from false statements.

Method

Participants

Four hundred participants aged between 18 and 60 took part in the study (F = 226; M_age = 30.58, SD = 9.63). Overall, 4.5 percent of the participants had completed primary school, 46.5 percent had a secondary education, and 49 percent had a higher education. All participants were native speakers of Polish. Participants were recruited using social media and internet advertising portals. Each participant received 100 PLN (approximately 25 EUR) for taking part in the study. Ethical approval to conduct the study was obtained from the Committee for Ethics of Scientific Research, Institute of Psychology, Polish Academy of Sciences. All subjects gave their written informed consent and were made aware that they had the right to withdraw from the study at any moment.

Procedure

In the first step, participants completed a questionnaire about their attitudes towards 12 topics that polarize public opinion. They marked their answers on a Likert scale ranging from -5 ("strongly disagree") through 0 ("have no opinion") to 5 ("strongly agree"). The experimenter selected the two topics on which each participant marked the most extreme answers (always at least ±4). Topics included various social, political, economic, and sports issues (for the full list, see Table 1).

Table 1.

List of topics:
(1) Vaccinations should/should not be compulsory
(2) Polish energy should be based mainly on coal/renewable and non-emission sources
(3) People should/should not eat meat
(4) Smartphones and social media positively/negatively affect interpersonal relationships
(5) Abortion should/should not be legal
(6) God exists/does not exist
(7) Robert Lewandowski is/is not the best Polish football player
(8) Jerzy Zięba's treatments are/are not effective and help people heal/can harm the sick
(9) Poland should/should not accept more immigrants than today
(10) GM food is/is not safe and useful, and we should/should not invest in these kinds of crops
(11) The political situation in Poland is going in the right/wrong direction
(12) In general, most people can/cannot be trusted
(13) Ewa Chodakowska is/is not the most effective personal trainer in Poland

https://doi.org/10.1371/journal.pone.0281179.t001

The participants were then asked to generate four statements. Two of them were focused on one topic and were expressed orally and recorded. The other two were written (typed) on an online form. One statement on a particular topic was consistent with each participant’s real position, while the other presented an opposing viewpoint. We gathered 1600 statements in total using this method. These included four from each participant: two real statements (one oral and one written) and two false ones (again one oral and one written).

Participants spoke for at least two minutes during their oral statements and were given at least five minutes to write their statements. For each statement, the participant's task was to present their position and justify it. Participants were encouraged to present arguments and verifiable facts as well as their subjective opinions, experiences, and feelings. Participants were told that other people would hear their recordings or read their statements and attempt to guess their true views, and that they should therefore be as persuasive and believable as possible in every statement. The order of the statements, with respect to both the form of communication and truthfulness, was randomized.

Dataset

We analyzed 1498 (760 written statements and 738 transcriptions) of the 1600 statements obtained in the study. In total, 103 statements were excluded from the analysis, including statements made by participants who did not understand the instructions (e.g., they answered truthfully twice or directly admitted to lying), statements that were too short, and written statements consisting only of sentence equivalents (verbless phrases). For some recordings, transcription was not possible, or speech was not recorded due to technical problems. The oral statements were transcribed with the automatic transcription service "Happy Scribe" (https://www.happyscribe.com). All transcriptions were checked and manually corrected. No modifications were made to the participants' written statements. Each transcript and each written statement was then saved in a separate text file.

Analysis techniques

Dependency parsing.

To compute syntactic complexity, we employed two measures that use dependency parsing. This approach to the computerized, automatic syntax analysis of sentences in natural language plays an important role in contemporary speech and language processing systems. In dependency parsing, the syntactic structure of a sentence is described in terms of binary grammatical relations that are defined for pairs of words in a sentence. Relations among the words are directed and labeled, leading from heads to dependents. An example of a dependency parse tree is depicted in Fig 1.

Fig 1. Example dependency tree.

https://doi.org/10.1371/journal.pone.0281179.g001

Mean Dependency Distance (MDD).

The first measure is mean dependency distance (MDD). It was proposed by Liu [15] as a metric for language comprehension difficulty. MDD uses dependency parsing information and the order of words in a sentence. The basic element of the formula is “dependency distance (DD)”, the word-level (surface) distance between the positions of dependent and head vertices (words).

Averaging the absolute DDs over all dependency pairs gives the MDD, as defined in Eq 1, where n is the number of dependency pairs in a sentence and DD_i is the dependency distance of the i-th pair:

$$\mathrm{MDD} = \frac{1}{n}\sum_{i=1}^{n} \lvert \mathrm{DD}_i \rvert \qquad (1)$$

DD can be positive or negative, because a dependent word can either precede or follow its head word. Therefore, MDD averages the absolute values of the dependency distances over all pairs.
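
The following Python sketch computes MDD for a single sentence from a list of head positions. The encoding (1-based head indices, 0 for the root) is our assumption for illustration; in the study, parses came from spaCy-PL, and how sentence-level values were aggregated per statement is not shown here. The usage line encodes the Fig 1 sentence, which, judging from the worked example in this section, reads "My neighbour has a black cat".

```python
def mean_dependency_distance(heads: list[int]) -> float:
    """Mean absolute linear distance between each word and its head.

    heads[k] holds the 1-based position of the head of the word at position k + 1;
    0 marks the root, which has no head and therefore contributes no pair.
    """
    distances = [abs(pos - head) for pos, head in enumerate(heads, start=1) if head != 0]
    if not distances:  # a one-word sentence has no dependency pairs; 0.0 by convention
        return 0.0
    return sum(distances) / len(distances)


# "My neighbour has a black cat": My->neighbour, neighbour->has,
# has = root, a->cat, black->cat, cat->has
heads = [2, 3, 0, 6, 6, 3]
print(mean_dependency_distance(heads))  # 1.6
```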

Mean Hierarchical Distance (MHD).

The second measure uses only dependency tree distances and does not use word position distances as in MDD. The idea behind the MHD [20] is to utilize the distances between each node and the root. Namely, we take the root of a syntactic tree as a reference point and compute the vertical distance between a node and the root, that is, the length of the path from the root to a given node along the dependency edges. This is defined as "hierarchical distance (HD)." The average of all hierarchical distances in a sentence is the mean hierarchical distance (MHD), as defined in Eq 2, where n is the number of non-root nodes in the sentence and HD_i is the hierarchical distance of the i-th node:

$$\mathrm{MHD} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{HD}_i \qquad (2)$$
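
A matching Python sketch for MHD climbs from each non-root word to the root and averages the number of edges traversed. As before, the head-list encoding (1-based positions, 0 for the root) is an assumption made for illustration, and the input is assumed to be a well-formed tree without cycles.

```python
def mean_hierarchical_distance(heads: list[int]) -> float:
    """Mean number of edges between each non-root word and the root.

    heads[k] holds the 1-based position of the head of the word at position k + 1;
    0 marks the root. The input is assumed to be a well-formed tree (no cycles).
    """
    def depth(pos: int) -> int:
        steps = 0
        while heads[pos - 1] != 0:  # climb one arrow towards the root
            pos = heads[pos - 1]
            steps += 1
        return steps

    depths = [depth(pos) for pos in range(1, len(heads) + 1) if heads[pos - 1] != 0]
    if not depths:  # only the root: no hierarchical distances to average
        return 0.0
    return sum(depths) / len(depths)


heads = [2, 3, 0, 6, 6, 3]  # "My neighbour has a black cat" (Fig 1)
print(mean_hierarchical_distance(heads))  # 1.6
```

Because the depth is recomputed for every word, the sketch is quadratic in the worst case; that is harmless for sentence-length inputs.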

Example of MDD and MHD.

A dependency parse tree for an input example sentence is presented in Fig 1. We stretched the syntactic structure vertically to better illustrate the concept of hierarchical distances.

Following Eqs 1 and 2, we can calculate the MDD and MHD of the example sentence in Fig 1.

Dependency distance is the surface (X-axis in Fig 1) distance between a word and its parent. The word "My" has the word "neighbour" as its dependency parent, and the distance between them, measured on the X-axis, is 1. The same holds for the words "neighbour" and "has", hence the distance is 1. The word "a" depends on "cat", and "black" lies between them, so the distance is 2. The distance from "black" to "cat" is 1. The word "cat" depends on "has", and two words ("a" and "black") lie between them, hence the distance is 3. The MDD of this sentence is (1 + 1 + 2 + 1 + 3)/5 = 1.6.

The Mean Hierarchical Distance takes into account the vertical distance between each word and the root of the sentence, which is the word "has." "My" is at distance 2 from "has", as we move up two arrows (through "neighbour"), while "neighbour" is at distance 1. "Cat" is also at distance 1. Both "a" and "black" are at distance 2 from "has" (through "cat"). Therefore, the MHD is (2 + 1 + 1 + 2 + 2)/5 = 1.6.

In this example, MDD and MHD happen to have the same value; for most sentences, the two measures differ. In our study, each sentence from the data set was parsed syntactically using the dependency parser from the Polish modules of the spaCy tool (https://github.com/ipipan/spacy-pl).

Gunning-Fog Index (FOG).

The Gunning Fog Index is a measure of readability. It estimates the years of formal education a person needs to understand a given text upon first reading. Computing this index begins with taking a sample passage of at least 100 words and counting the number of all words, complex words (those containing three or more syllables), and sentences. The formula is given in Eq 3:

$$\mathrm{FOG} = 0.4\left[\frac{\text{words}}{\text{sentences}} + 100\,\frac{\text{complex words}}{\text{words}}\right] \qquad (3)$$

We used the implementation from Textstat (https://pypi.org/project/textstat), a Python library to compute readability, complexity, and grade level statistics from text. Textstat supports the Polish language for the calculation of the Gunning Fog Index.
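
To make the formula concrete, here is a minimal Python sketch of Eq 3. It is not Textstat's code: syllables are approximated by counting runs of Polish vowel letters, a rough heuristic whose counts may differ from Textstat's, sentence and word splitting use naive regular expressions, and the 100-word minimum sample is not enforced.

```python
import re

# Runs of Polish vowel letters; each run is counted as one syllable (rough heuristic).
VOWEL_RUN = re.compile(r"[aąeęioóuy]+", re.IGNORECASE)


def count_syllables(word: str) -> int:
    return max(1, len(VOWEL_RUN.findall(word)))


def gunning_fog(text: str) -> float:
    """FOG = 0.4 * (words / sentences + 100 * complex_words / words)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


text = "Szczepienia są bezpieczne. Chronią nas wszystkich!"
print(round(gunning_fog(text), 2))  # 14.53
```

In the usage example, the six words form two sentences, and two words ("Szczepienia", "bezpieczne") have three vowel runs, so FOG = 0.4 × (3 + 100 × 2/6) ≈ 14.53. A short text like this yields an inflated score, which is one reason the original procedure asks for a sample of at least 100 words.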

Linguistic Category Model (LCM).

In our study, we used the Polish LCM dictionary (LCM-PL; [21]), which contains the 6,000 most frequent Polish verbs. For each statement, we summed the number of tokens (individual occurrences of a linguistic unit) of each word type (DAVs, IAVs, SVs, ADJs). Following the LCM coding rules, only adjectives that modify subject nouns are taken into account; all other adjectives are skipped. These adjectives were identified using the dependency parser from spaCy-PL. We calculated the level of abstraction according to the weighted summation formula recommended by the LCM's authors (DAV + IAV * 2 + SV * 3 + ADJ * 4). We refer to this formula as the general LCM score.
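
The weighted summation can be sketched in Python as follows. The four-entry verb dictionary is hypothetical, chosen to mirror the English examples of each category given earlier; it is not an excerpt from LCM-PL. The sketch also assumes that verbs arrive already lemmatized and that subject-modifying adjectives have been counted separately, for example with a dependency parser.

```python
from collections import Counter

# Hypothetical mini-dictionary of verb lemmas; the study used LCM-PL [21].
LCM_CATEGORIES = {
    "zadzwonić": "DAV",  # "to call"
    "pomóc": "IAV",      # "to help"
    "kochać": "SV",      # "to love"
    "myśleć": "SV",      # "to think"
}
# Weights of the general LCM score: DAV + IAV * 2 + SV * 3 + ADJ * 4.
WEIGHTS = {"DAV": 1, "IAV": 2, "SV": 3, "ADJ": 4}


def lcm_score(verb_lemmas: list[str], subject_adjectives: int = 0) -> int:
    """Weighted LCM sum for one statement.

    verb_lemmas: lemmatized verbs from the statement; verbs missing from the
    dictionary are ignored.
    subject_adjectives: number of adjectives that modify subject nouns.
    """
    counts = Counter(LCM_CATEGORIES[v] for v in verb_lemmas if v in LCM_CATEGORIES)
    counts["ADJ"] += subject_adjectives
    return sum(WEIGHTS[category] * n for category, n in counts.items())


# SV (3) + IAV (2) + DAV (1) + one subject adjective (4); "iść" is not in the dictionary.
print(lcm_score(["myśleć", "pomóc", "zadzwonić", "iść"], subject_adjectives=1))  # 10
```

The score is an unnormalized sum, so it grows with statement length. This is why, as described in the Results, counts for abstractness variables were divided by the number of tokens before modeling.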

Sentiment.

We used a dictionary of 5421 positive and negative words obtained from two sources. The dictionary was created manually and represents the sentiment of the most frequent sense of each word, so it requires no word sense disambiguation. The first source was plWordNet Emo, which contains 32 thousand lexical units manually annotated with sentiment labels at the level of word senses [22]. Its sentiment labels were re-assigned and verified to represent the most frequent sense only. The second source is a manually labeled dictionary of 1774 negative and 1493 positive words, taken from a rule-based sentiment analyzer [23].
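
Dictionary-based sentiment counting reduces to set lookups. The Python sketch below uses a tiny hypothetical lexicon in place of the 5421-word dictionary, and it assumes the input has already been lemmatized. This matters for Polish, where an adjective such as "bezpieczny" appears in many inflected forms. Counts are divided by the number of tokens, as described in the Results.

```python
# Tiny hypothetical lexicon of lemmas; the study's dictionary has 5421 entries.
POSITIVE = {"dobry", "piękny", "bezpieczny"}
NEGATIVE = {"zły", "okropny", "groźny"}


def sentiment_rates(lemmas: list[str]) -> dict:
    """Positive and negative word counts divided by the number of tokens."""
    if not lemmas:
        return {"positive": 0.0, "negative": 0.0}
    n = len(lemmas)
    positive = sum(1 for lemma in lemmas if lemma in POSITIVE)
    negative = sum(1 for lemma in lemmas if lemma in NEGATIVE)
    return {"positive": positive / n, "negative": negative / n}


# Lemmas of "Szczepionka jest bezpieczna, a choroba groźna."
# ("The vaccine is safe, and the disease dangerous.")
lemmas = ["szczepionka", "być", "bezpieczny", "a", "choroba", "groźny"]
print(sentiment_rates(lemmas))
```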

Part-of-speech.

To obtain part-of-speech occurrences, we again used spaCy-PL, which internally uses a bi-LSTM neural network to label the tokens [24]. We used the more detailed part-of-speech tags available from the Morfeusz library [25].
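
The following Python sketch shows how normalized part-of-speech frequencies could be computed from tagged tokens. The (token, tag) pairs are hand-written in the style of Morfeusz/NKJP positional tags, where the first colon-separated field is the grammatical class and verbs carry a person attribute ("pri" for first person). They are illustrative, may not match the tagger's output exactly, and do not show how spaCy-PL exposes the tags. The second function illustrates why pronoun counts alone may miss self-reference in Polish: "Myślę" marks the first person only through verb morphology.

```python
from collections import Counter

# Illustrative (token, tag) pairs in the style of Morfeusz/NKJP tags; punctuation omitted.
# "Myślę, że on ma rację." ("I think he is right.")
tagged = [
    ("Myślę", "fin:sg:pri:imperf"),
    ("że", "comp"),
    ("on", "ppron3:sg:nom:m1:ter:akc:npraep"),
    ("ma", "fin:sg:ter:imperf"),
    ("rację", "subst:sg:acc:f"),
]


def class_rates(tagged_tokens):
    """Frequency of each grammatical class divided by the number of tokens."""
    if not tagged_tokens:
        return {}
    n = len(tagged_tokens)
    counts = Counter(tag.split(":")[0] for _, tag in tagged_tokens)
    return {cls: k / n for cls, k in counts.items()}


def first_person_verb_rate(tagged_tokens):
    """Share of tokens that are finite verbs marked for first person ('pri')."""
    if not tagged_tokens:
        return 0.0
    hits = sum(
        1 for _, tag in tagged_tokens
        if tag.split(":")[0] == "fin" and "pri" in tag.split(":")
    )
    return hits / len(tagged_tokens)


print(class_rates(tagged))
print(first_person_verb_rate(tagged))  # 0.2
```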

Results

We applied hierarchical regression modeling to analyze differences between truthful and deceitful statements in the context of psycholinguistic quantitative variables related to complexity, length of speech, abstractness, sentiment, and part-of-speech (a full list of variables with descriptive statistics is included in S1 Table). Each model included the same set of three predictors: statement type (true vs. lie), statement form (written vs. transcription), and the interaction of those. The predictor related to the statement form (written vs. transcribed) and the interaction of both predictors acted as control predictors. The following sum contrasts were used: true (-0.5) vs. lie (0.5), and written (-0.5) vs. transcription (0.5).

To account for by-subject variability, the models included varying intercepts for subjects. Models were fitted to the data using the R package lme4 [26]. To correct for multiple comparisons, we applied Holm's correction. A summary of the models is included in S2 Table. Each dependent variable was log-transformed before visualisation and hierarchical regression modeling; before the transformation, 1 was added to all observations of each variable to avoid zero values. For variables related to abstractness, sentiment, and part-of-speech, statement length was taken into account by dividing the number of occurrences of a given variable by the number of tokens.
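
The models themselves were fitted in R with lme4. In lme4 formula syntax, the model described above presumably corresponds to something like `y ~ type * form + (1 | subject)`, although this is our reading and not a formula quoted from the study. The Python sketch below covers only the surrounding steps: the sum-contrast codes, length normalization followed by log(x + 1) (the order of these two steps is our assumption), and Holm's step-down adjustment, which follows the same logic as R's `p.adjust(method = "holm")`.

```python
import math
from typing import Optional

# Sum contrasts as described: true (-0.5) vs. lie (0.5), written (-0.5) vs. transcription (0.5).
TYPE_CODE = {"true": -0.5, "lie": 0.5}
FORM_CODE = {"written": -0.5, "transcription": 0.5}


def transform(count: float, n_tokens: Optional[int] = None) -> float:
    """Optionally divide by statement length, then apply log(x + 1)."""
    value = count / n_tokens if n_tokens else count
    return math.log1p(value)


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the smallest p-value is multiplied by m, the next by m - 1, and so on
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max  # adjusted values never decrease with rank
    return adjusted


print(TYPE_CODE["lie"] * FORM_CODE["transcription"])  # interaction code: 0.25
print(transform(12, n_tokens=240))
print(holm_adjust([0.01, 0.04, 0.03]))
```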

Complexity

MDD, MHD, and FOG.

For MDD and MHD, the main effect of statement type (truth/lie) was not statistically significant, meaning that true and false statements did not differ in terms of syntactic complexity. Oral statements were more complex than written statements (MDD: b = 0.13, p = 0.001; MHD: b = 0.12, p = 0.001). Further, true statements had higher FOG scores than false statements (b = -0.059; p = 0.004). FOG scores were also higher for transcriptions than for written statements (b = 0.09, p = 0.001). No interaction effects were observed for any of the variables.