A true denial or a false confession? Assessing veracity of suspects' statements using MASAM and SVA

Previous research on statement analysis has mainly concerned accounts by witnesses and plaintiffs. In our studies we examined true and false statements told by offenders. It was hypothesized that the SVA and MASAM techniques would enhance the ability to discriminate between offenders' true and false statements. Truthful and deceptive statements (confessions and denials) were collected from Swedish and Polish criminal case files. In Experiment 1, Swedish law students (N = 39) assessed the veracity of statements either after training in and use of MASAM or without any training, using their own judgement. In Experiment 2, Polish psychology students (N = 34) assessed veracity either after training in and use of MASAM or SVA, or without prior training, using their own judgement. The veracity assessments of participants who used MASAM and SVA were significantly more accurate than those of participants who relied on their own judgement. The results show that trained coders are much better at distinguishing between truths and lies than lay evaluators. There were significant differences between the total scores of truthful and false statements for both SVA and MASAM, and it can be concluded that both veracity assessment techniques are useful in assessing veracity. The content criteria most strongly associated with correct assessments were logical structure, contextual embedding, self-depreciation, volume of statement, contextual setting and descriptions of relations. The results are discussed in relation to statement analysis of offenders' accounts.


Introduction
Enhancing legal actors' ability to judge the veracity of suspects' statements correctly is of pivotal importance both during criminal investigations and in court proceedings. However, previous studies show that the overall accuracy of veracity judgments made by professional criminal investigators, prosecutors and judges, as well as by ordinary people, does not exceed 55% when no specialized deception detection tool is used [1,2,3]. Special techniques are therefore needed to enhance the accuracy of legal actors' veracity assessments. A reliable method for this purpose would not only make investigations and proceedings quicker and cheaper but also, more importantly, more efficient. Speech content analysis is one of the most promising approaches to distinguishing between lies and truths, and it has a long history dating back to around 900 BC [4]. The underlying assumptions in verbal lie detection are that truth-tellers exhibit coherence between statement and belief, whereas liars experience a discrepancy between the two [5]; that liars have to think harder and try harder than truth-tellers to make a convincing impression [4]; and that people talk differently about events they have experienced than about events they have only imagined or fabricated [3,4,5]. As a result of previous research, several speech content criteria-based techniques are available today. However, previous research on statement assessment has focused mainly on one of them, the Statement Validity Assessment (SVA) method. Although SVA was originally designed to determine the credibility of child witnesses' testimonies in trials for sexual offences, it is today the most widely used method worldwide, including for evaluating the veracity of adults' testimonies. More than 50 empirical studies of the method have been published to date, mainly concerning adult witnesses' and plaintiffs' accounts [2,5,6,7,8,9].
The Multivariable Adults' Statements Assessment Model (MASAM) was designed more recently as a tool for judging the credibility of adult witnesses' statements. Results of previous studies suggest that MASAM is a useful tool for discerning between memories of self-experienced real-life events and fabricated or fictitious accounts [10]. Previous studies of methods using verbal content criteria have focused mainly on adult witnesses' and plaintiffs' accounts. Only very few studies have explored the possibility of using statement analysis techniques to detect false confessions and denials [11]. Moreover, no reliable data on the accuracy of SVA assessments in real-life cases are currently available [4]. The aim of this study was therefore to examine whether SVA and/or MASAM can be used to differentiate between suspects' true and false accounts.

Statement Validity Assessment and Multivariable Adults' Statements Assessment Model
SVA is a comprehensive procedure for generating and testing hypotheses about the source of a given statement in nine diagnostic steps [12]. The core of the technique is an assessment of the presence of 19 Criteria-Based Content Analysis (CBCA) criteria in the transcribed interview [2,3,5,6,7,8]. Each criterion is assumed to occur more frequently in truthful than in deceptive accounts. According to the theory on which the selection of the 19 CBCA criteria is based, such criteria are likely to indicate genuine experiences because they are typically too difficult to fabricate [5,12]. A Validity Checklist has been developed to explore variables other than truthfulness that may influence the CBCA outcome and to consider alternative interpretations of that outcome [2,5,12,13].
More than 50 empirical studies and a few meta-analyses of SVA have been published to date, mainly with adult participants [2,3,4,5,6,7,8,14,15]. These studies demonstrate that SVA analyses can be useful for lie detection, since truth-tellers generally obtain significantly higher total scores than liars [16,17,18,19,20], and positive effect sizes have been observed for each of the credibility criteria [7,14]. The studies also show that SVA is significantly more accurate than veracity assessment without specialized tools [2,4,15], and that evaluators using the SVA criteria achieve higher hit rates than clinical psychologists when assessing the reliability of clients' accounts of physical and psychological symptoms in legal settings [9]. However, in one study adult truth-tellers obtained lower CBCA scores than liars [4,20]. In the still very few studies on suspects' accounts, only some individual CBCA criteria showed differences: self-depreciation and doubts about one's own testimony were more often present in deceptive accounts, and unexpected complications were more often present in true accounts [11]. Caso et al. [16] found that lies included more descriptions of conversations and subjective mental states, and Lee, Klaver and Hart [21] found that, contrary to SVA assumptions, spontaneous corrections occur more often in lies than in truths.
The theoretical framework of MASAM is the same as that of SVA, but the hypotheses underlying MASAM content analysis are complemented by four additional assumptions [10]. First, any statement, whether truthful or deceptive, can contain both true and false information. Moreover, if a witness intends to give an untruthful picture of actual events, certain differences appear in the form and content of the statement: the truthful parts will differ from the lies. Furthermore, it is necessary to consider how the statement was formed, that is, to analyze the circumstances of the event (what happened?), to establish the witness's characteristics (who gave the statement?) and to assess the course of the interview (how was the witness interviewed?). In accordance with MASAM assumptions, the high evidential value of a statement is supported by coherence between (1) the content and form of the statement, (2) the features of the object and event (e.g., complexity and observation conditions), (3) the observer's characteristics (e.g., the witness's prior experiences and age) and (4) the interviewing circumstances (e.g., how detailed the interview was and when the witness was interviewed). It should be noted that, contrary to CBCA, MASAM does not assume that particular elements must be present in a statement describing the witness's own experiences [22].
There are 21 MASAM content criteria, divided into three categories: general features, details and deposition. Each content criterion is assessed separately with respect to the features of the object and event (what happened?), the observer's characteristics (who is the suspect?) and the interviewing circumstances (how was the witness questioned?). The verbal cues are intended to assist the analysis of the respective areas related to statement formation and to guide attention to areas that require more detailed analysis.
Each verbal cue of MASAM is analysed with reference to what happened, who the suspect is and how the witness or suspect was interviewed. A decision algorithm provides raters with precise guidelines on how the results of content analysis with the MASAM criteria should be interpreted [10]. The algorithm was developed to enhance the accuracy of the method, since the first research, conducted in 2012 [22,23], showed that MASAM distinguished adults' truthful statements from lies with an average accuracy of 69.22% (72.00% for truthful statements and 54.17% for false statements) when raters used only their own judgment. After the MASAM decision algorithm was developed in 2015, accuracy reached almost 99.95% for experience-based statements and 90% for invented statements [10]. MASAM has not yet been tested on suspects' accounts. SVA and MASAM content criteria are presented in Table 1.

The experiments
The overall purpose of the experiments was to examine whether, and to what extent, SVA and MASAM can be used to improve the accuracy of veracity assessments of suspects' statements. In line with previous findings on the potential of both MASAM and SVA to distinguish between true and false witness statements, it was hypothesized that:
Hypothesis 1) Veracity assessments using MASAM and SVA will be significantly more accurate than veracity assessments not based on either technique.
Hypothesis 2) True suspect statements will receive significantly higher overall scores than false suspect statements with both MASAM and SVA.

Experiment 1
Participants
The participants were 39 law students (21 women, 18 men) from Uppsala University, Sweden, who were taking a course in Law and Psychology in which statement analysis was part of the curriculum. Participation was completely voluntary. Participants were informed that the aim of the research was to study content analysis methods, and they received no compensation or remuneration for taking part. Their ages ranged from 23 to 27 years (M = 24.80 years, SD = 1.27). Two thirds of the participants (n = 26) received training in MASAM, whereas the other third (n = 13) did not. There were no significant differences between trained and untrained raters in age, gender, education or other sample characteristics.

Training
The participants who received training took part in a three-day course on statement analysis, 20 hours in total. The course covered general themes such as definitions of truths and lies, the theoretical assumptions of content analysis techniques, and practical exercises using the MASAM method for statement analysis. The trainer explained and illustrated each MASAM criterion in depth and discussed them with the students, who were encouraged to ask questions and take an active part in the training. Completing the training was a precondition for analyzing the suspects' statements included in the study. The participants who did not receive training were encouraged to assess the statements using their own judgement. They did not attend any lectures, either on lie detection in general or on specific methods of statement analysis.

Design
The experiment had a 2 (analysis method: MASAM vs. own judgement) × 2 (statement veracity: true vs. false) × 2 (suspect attitude: confession vs. denial) mixed design. Analysis method was a between-subjects factor, whereas statement veracity and suspect attitude were within-subjects factors. However, since real suspect statements were used, their veracity could not be manipulated. We discuss the implications of this in the Limitations section.

Material
The material used for statement analysis consisted of 89 transcripts of suspect interrogations recorded in real-life criminal cases. The transcripts were obtained from District Courts and Appellate Courts that were asked to provide criminal case files. The case files were carefully reviewed to select appropriate cases in the following four categories: 1) true denials (N = 17), 2) false denials (N = 173), 3) true confessions (N = 170) and 4) false confessions (N = 8). Well-known cases, for instance of false confessions, could not be used, since participants' answers would most probably have been affected by their knowledge of the outcome of the real case. Instead, the inclusion criterion for true statements was that there was overwhelming evidence, independent of the suspect's statement, that corroborated it, and for false statements that there was overwhelming evidence, independent of the suspect's statement, that refuted it. This criterion seemed reasonable given that, on the one hand, ground truth is usually unobtainable in criminal cases and, on the other hand, correct classification of the transcripts was crucial for the validity of the study. Factors such as the type of crime and the suspect's gender, age and ethnicity varied across the transcripts.

Procedure
All participants received eight interrogation transcripts to assess. They were instructed to read the transcripts carefully and not to communicate with one another, and they had one week to submit their analyses. The participants who used MASAM rated each statement on the individual MASAM criteria, from which an overall score was calculated. At the end of their assessment, they also indicated whether they thought the suspect's statement was true or false. The participants who did not use MASAM were simply asked to indicate whether they thought the suspect's statement was true or false.

Judgment accuracy rates
Analysis of accuracy rates for identifying truthful and untruthful accounts indicated that training significantly affected accuracy. Untrained participants correctly classified 67 of 113 suspects' accounts (59.29%), including 31 of 58 true statements (53.45%) and 36 of 55 false statements (65.45%). Uppsala University coders using MASAM reached a total accuracy rate of 70.03% (250 of 357); truthful accounts were correctly classified in 77.53% of the cases (144 of 183) and false statements in 61.22% of the cases (106 of 174). Veracity assessment with MASAM was significantly more accurate than classification without training (χ² = 5.98, p < .02), but only for truthful statements (χ² = 12.54, p < .01), not for false statements (χ² = .317, p = .57).
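The overall accuracy figures above are pooled over all judgments rather than averaged over the truth and lie rates, which matters when the numbers of true and false statements differ. A minimal Python sketch, using only the untrained-group counts reported above (31 of 58 truths, 36 of 55 lies), shows the difference; the helper name `accuracy_pct` is hypothetical.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correct veracity judgments, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)

# Untrained Uppsala raters, counts as reported in the text
true_correct, true_total = 31, 58
false_correct, false_total = 36, 55

truth_rate = accuracy_pct(true_correct, true_total)    # 53.45
lie_rate = accuracy_pct(false_correct, false_total)    # 65.45
# Pooled accuracy: all correct judgments divided by all judgments
pooled = accuracy_pct(true_correct + false_correct, true_total + false_total)  # 59.29
# Unweighted mean of the two rates, which is NOT the figure reported in the text
unweighted = round((truth_rate + lie_rate) / 2, 2)     # 59.45

print(truth_rate, lie_rate, pooled, unweighted)
```

Pooling weights each rate by the number of statements behind it, so the overall figure sits slightly closer to the rate based on more statements.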

Differences in MASAM scores between true and false statements
Coders were asked to rate 63 MASAM criteria on a 6-point scale (a criterion that was not present in the statement received a score of 1, and a strongly present criterion received a score of 6); MASAM scores could therefore range from 63 to 378. According to MASAM assumptions, the higher the overall score, the higher the probability that the statement is experience-based rather than fabricated. To test our prediction that overall MASAM scores would be higher for truthful suspects' accounts, we conducted a 2 (veracity: truthful, false) × 2 (suspect's attitude: denial, confession) mixed ANOVA. The veracity × suspect's attitude interaction was not significant. Tukey's HSD test showed that false denials differed significantly from true denials (p < .04) and true confessions (p < .001), but false confessions were not rated significantly differently from false denials (p = .99), true denials (p = .47) or true confessions (p = .61).

MASAM individual criteria
To assess the effectiveness of the individual MASAM criteria and test the significance of differences between truthful and false accounts, one-way ANOVAs were carried out across the individual criteria. The analysis of variance gave statistically significant results for MASAM, F = 2.56, p < .001, η² = .21. This significant effect of veracity indicates that differences between truths and lies emerged for at least some individual criteria, although the chosen predictor explains only a small part of the variance. To further explore differences between experience-based and fabricated suspects' accounts, we tested the significance of differences for the individual MASAM criteria, which are presented in Table 2.
As can be seen in Table 2, there were significant differences between truths and lies assessed by the Uppsala University law students for 58 of the 63 MASAM criteria. Effects were small for most individual criteria; medium effects were observed only for readiness to depose rated with reference to objects and events (p < .01, d = .53) and to the suspect (p < .01, d = .63), and for readiness to search, identify and reproduce memory traces rated with reference to objects and events (p < .01, d = .55) and to the suspect's characteristics (p < .01, d = .56).
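Effect sizes for the individual criteria are reported as Cohen's d and labelled small or medium. The sketch below computes d for two independent groups with the pooled standard deviation and applies Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). This is an illustration under assumptions: the text does not state which variant of d was used, and the example ratings are invented for demonstration, not data from this study.

```python
import math
from statistics import mean, variance

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d for two independent samples, standardized by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def label_effect(d: float) -> str:
    """Cohen's rough benchmarks applied to the absolute value of d."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

# Hypothetical 1-6 ratings of a single criterion (illustrative only)
true_ratings = [4, 5, 3, 4, 5, 4]
false_ratings = [3, 4, 3, 3, 4, 3]
d = cohens_d(true_ratings, false_ratings)
print(round(d, 2), label_effect(d))
```

Under these benchmarks the reported values between .53 and .63 fall in the medium band, and the remaining significant criteria with d below .5 count as small.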

Experiment 2
Participants
The participants were 34 psychology students (30 women, 4 men) from the University of Silesia in Katowice, Poland, who were taking a course in forensic psychology in which statement analysis was part of the curriculum. Their ages ranged from 21 to 33 years (M = 22.92 years, SD = 2.13). Participation was completely voluntary. Participants were informed that the aim of the research was to study the validity and accuracy of content analysis methods. They received a consent form before the study and a debriefing form after it, and they received no compensation, remuneration or course credit for taking part. At the beginning of the study, participants declared how many content analyses they were willing to conduct and were randomly assigned to one of the groups. There were no significant differences in sample characteristics between the trained and untrained groups. The group sizes were set so that similar numbers of ratings could be collected from untrained raters, participants using MASAM and participants using SVA.

Training
Most of the participants (N = 24) received training in SVA and MASAM; the rest (N = 10) assessed veracity without prior training and without using any content analysis technique. The participants who received training took part in a four-day course on statement analysis, 26 hours in total. The course covered general themes such as definitions of truths and lies, the theoretical assumptions of content analysis techniques, and practical exercises using the SVA and MASAM methods for statement analysis. The trainer explained and illustrated each SVA and MASAM criterion in depth and discussed them with the students, who were encouraged to ask questions and take an active part in the training. Completing the training was a precondition for analyzing the suspects' statements included in the study.

Design
The experiment had a 3 (analysis method: SVA vs. MASAM vs. own judgement) × 2 (statement veracity: true vs. false) × 2 (suspect attitude: confession vs. denial) mixed design. Analysis method was a between-subjects factor, whereas statement veracity and suspect attitude were within-subjects factors. However, since real suspect statements were used, their veracity could not be manipulated. We discuss the implications of this in the Limitations section.

Material
The data were transcripts of suspects' interrogations conducted in 104 Polish criminal proceedings, made available by the District and Regional Courts of the Silesian voivodeship in response to the researchers' request. Individual files were chosen by court authorities and secretarial staff; the researchers had no influence on their number or quality. Only cases that ended in a legally binding judgment were included, and only if the statements given during the proceedings had been evaluated by courts of both instances and these evaluations formed the basis of the factual findings in the case. Information on the evidence collected during the proceedings, including demonstrative evidence, photographs, medical certificates and expert opinions, as well as the significance of each piece of evidence for the court's ruling and its coherence with the content of the offender's statement, was recorded in a taxonomic sheet. Statements recorded in protocols were transcribed, and raters received them in anonymized form. The offenses discussed in the interrogations ranged from theft to homicide. Of the suspects whose statements were analyzed, 43.97% were women and 56.03% men, aged from 17 to 74 years (M = 36, SD = 15). The analyzed content comprised the statements given by a suspect throughout the whole proceedings (from 1 to 7 interrogations, M = 2.51, SD = 1.19). Truthful statements were shorter than untruthful ones, and the difference in volume was statistically significant (F(1, 193) = 50.65, p < .001, η² = .0864). Psychology students from the University of Silesia rated confessions with SVA in 141 cases (40.40%) and denials in 208 cases (59.60%); 262 of these statements were true (75.07%), and the suspects were apprehended in 225 cases (64.47%). For MASAM ratings, 138 confessions (41.32%) and 196 denials (58.68%) were used, of which 257 were true (76.94%) and 77 were false (23.06%); the suspects were apprehended in 219 cases (65.57%).
Untrained participants analyzed 126 confessions (42%) and 174 denials (58%); 228 of the assessed statements were true (76%) and 72 were false (24%), and the suspects were apprehended in 102 cases (34%). Because we only had access to the files chosen and made available by the courts, it was not possible to balance the four categories of statements. As a result, true denials were analyzed 240 times (35.14%), false denials 164 times (24.01%) and true confessions 279 times (40.85%), and no false confessions were assessed in the study.
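Because the statement categories were unbalanced, the reported shares are simple proportions of the 683 SVA and MASAM ratings (349 + 334). A short Python check, using only the counts given above, recomputes those shares and confirms that the category totals agree with the per-method counts of denials, confessions and false statements; the variable names are illustrative.

```python
# Times each statement category was rated with SVA or MASAM (counts from the text)
ratings = {
    "true denial": 240,
    "false denial": 164,
    "true confession": 279,
    "false confession": 0,  # no false confessions were available
}

total = sum(ratings.values())
# Consistency checks against the per-method counts reported above
assert total == 349 + 334                                            # SVA + MASAM ratings
assert ratings["true denial"] + ratings["false denial"] == 208 + 196  # denials
assert ratings["true confession"] == 141 + 138                        # confessions
assert ratings["false denial"] == (349 - 262) + 77                    # false statements

shares = {category: round(100 * n / total, 2) for category, n in ratings.items()}
for category, pct in shares.items():
    print(f"{category:<16} {ratings[category]:>4} {pct:6.2f}%")
```

Since every confession in the sample was true, all false statements in Experiment 2 were false denials, which is why the false-statement count equals the false-denial count.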

Procedure
The MASAM and SVA questionnaires used to evaluate the statements included a table presenting the content criteria of the given method with a short description of each criterion's meaning. A 6-point scale was used to evaluate the level of coherence between the statement and each criterion: 1 indicated that the criterion was not present or fulfilled at all, and 6 indicated that it was present or fulfilled to a very high degree. Each of the 21 MASAM criteria was rated on this scale (where 1 signifies absolute inconsistency and 6 total coherence) in three areas: coherence with the features of objects and events, coherence with the suspect's characteristics, and coherence with the interview circumstances. Content analysis with SVA and MASAM covered all content criteria of the respective technique, that is, all 30 SVA criteria and all 21 MASAM criteria.
Raters evaluated each criterion individually and summed the points obtained for all criteria. The final part of the worksheet contained columns for judging the evidential value of the statement on the basis of the method (true/false). Alternatively, evaluators could assess the veracity of the statement independently of the content analysis results (true/false) and add remarks.
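The MASAM rating and summing step described above can be sketched as follows. This is a minimal illustration, assuming a worksheet stored as a mapping from (criterion, area) pairs to integer ratings. The placeholder names `criterion_1` to `criterion_21` and the functions `empty_worksheet` and `total_score` are hypothetical (the actual criteria are listed in Table 1), and the MASAM decision algorithm is not reproduced here.

```python
from itertools import product
from typing import Optional

# Hypothetical placeholder names; the actual 21 MASAM criteria are listed in Table 1
CRITERIA = [f"criterion_{i}" for i in range(1, 22)]
# The three areas in which each criterion is rated
AREAS = ("objects and events", "suspect's characteristics", "interview circumstances")
VALID_RATINGS = range(1, 7)  # 1 = absolute inconsistency, 6 = total coherence

Worksheet = dict[tuple[str, str], Optional[int]]

def empty_worksheet() -> Worksheet:
    """Create one blank cell per (criterion, area) pair: 21 x 3 = 63 cells."""
    return {cell: None for cell in product(CRITERIA, AREAS)}

def total_score(sheet: Worksheet) -> int:
    """Check that every cell holds an integer rating from 1 to 6, then sum them."""
    for cell, rating in sheet.items():
        if not isinstance(rating, int) or rating not in VALID_RATINGS:
            raise ValueError(f"missing or out-of-range rating for {cell}: {rating!r}")
    return sum(sheet.values())

# Usage: a rater who gives every cell a 4
sheet = empty_worksheet()
for cell in sheet:
    sheet[cell] = 4
print(len(sheet), total_score(sheet))  # 63 252
```

Keying cells by (criterion, area) keeps the three ratings of a criterion separate, matching the practice in the results of reporting, for example, readiness to depose separately for objects and events, the suspect and the interview.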

Judgment accuracy rates
Analysis of accuracy rates for identifying truthful and untruthful accounts indicated that both training and the content analysis technique significantly affected accuracy. Untrained participants correctly classified 89 of 300 suspects' accounts (29.67%), including 74 of 228 true statements (32.45%) and 15 of 72 false statements (20.83%). SVA ratings were accurate in 60.17% of the cases (210 of 349); the technique was more accurate in detecting lies (68.96% correct classifications, 60 of 87) than truths (57.25%, 150 of 262). The average accuracy of the MASAM-trained raters from the University of Silesia was 70.66% (236 of 334): 75.48% for truths (194 of 257) and 54.55% for lies (42 of 77). The overall accuracy of MASAM veracity assessment was significantly higher than that of both SVA ratings (χ² = 8.28, p < .005) and judgments without training (χ² = 106.12, p < .001), and SVA ratings were significantly more accurate than ratings made without a content analysis method (χ² = 60.31, p < .001). Although MASAM assessments of true statements were significantly more accurate than SVA assessments (χ² = 19.25, p < .001), there was no significant difference between MASAM and SVA for false statements (χ² = 3.585, p = .0583).
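The χ² statistics above compare the proportions of correct and incorrect classifications between two sets of ratings. A minimal Python sketch of Pearson's χ² for a 2 × 2 table, assuming no continuity correction (the text does not say whether one was applied), reproduces the reported MASAM versus SVA value from the counts above. The p-value uses the fact that, for one degree of freedom, the upper tail probability equals erfc(sqrt(χ²/2)). The helper name `pearson_chi2_2x2` is hypothetical.

```python
import math

def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (df = 1, no continuity correction) for the table
        [[a, b],    group 1: correct, incorrect
         [c, d]]    group 2: correct, incorrect
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("all marginal totals must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Overall accuracy, University of Silesia raters (counts from the text)
masam_correct, masam_total = 236, 334
sva_correct, sva_total = 210, 349

chi2 = pearson_chi2_2x2(
    masam_correct, masam_total - masam_correct,
    sva_correct, sva_total - sva_correct,
)
# For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), p_value < 0.005)  # 8.28 True
```

Applying a Yates continuity correction would give a somewhat smaller statistic, so the choice of correction should be reported alongside the value.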

Differences in SVA and MASAM scores between true and false statements
SVA and MASAM overall ratings. To test our prediction that overall SVA and MASAM scores would be higher for truthful suspects' accounts, we examined overall means. Coders were asked to rate 30 SVA criteria and 63 MASAM criteria on a 6-point scale (a criterion that was not present in the statement received a score of 1, and a strongly present criterion received a score of 6); SVA scores could therefore range from 30 to 180 and MASAM scores from 63 to 378. According to SVA and MASAM assumptions, the higher the overall score, the higher the probability that the statement is experience-based rather than fabricated.
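Because every rated item receives a score from 1 to 6, the possible range of a total score follows directly from the number of items. A short Python sketch reproducing the ranges stated above, with 30 SVA items and 21 MASAM criteria each rated in three areas (63 items); the function name `score_range` is illustrative.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # 1 = criterion absent, 6 = strongly present

def score_range(n_items: int) -> tuple[int, int]:
    """Lowest and highest possible total when each item is rated from 1 to 6."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_items * SCALE_MIN, n_items * SCALE_MAX

sva_items = 30
masam_items = 21 * 3  # 21 criteria x 3 reference areas

print("SVA:", score_range(sva_items))      # (30, 180)
print("MASAM:", score_range(masam_items))  # (63, 378)
```

Because the two ranges differ, raw SVA and MASAM totals are not directly comparable; the comparisons in the text are therefore made within each method.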
We conducted 2 (veracity: truthful, false) × 2 (suspect's attitude: confession, denial) mixed ANOVAs on the overall scores (Fig 1A and 1B).
SVA and MASAM individual criteria. To assess the effectiveness of the individual SVA and MASAM criteria and test the significance of differences between truthful and false accounts, one-way ANOVAs were carried out across the individual criteria. The analysis of variance gave statistically significant results for SVA (F = 3.06, p < .001, η² = .23) and MASAM (F = 2.56, p < .001, η² = .21). This significant effect of veracity indicates that differences between truths and lies emerged for at least some individual criteria, although the chosen predictor explains only a small part of the variance. To further explore differences between experience-based and fabricated suspects' accounts, we tested the significance of differences for the individual SVA and MASAM criteria, which are presented in Table 3.
As can be seen in Table 3, only seven of the thirty SVA criteria were more often present in truthful accounts: logical structure (p < .01, d = .38), contextual embedding (p < .01, d = .46), self-depreciation (p < .01, d = .69), pardoning the victim (p < .01, d = .34), details characteristic of the offense (p < .01, d = .32), appropriateness of knowledge (p < .05, d = .25) and consistency with the laws of nature (p < .05, d = .27). Reproduction of conversation, in contrast, was more frequent in false statements (p < .01, d = .41). However, the magnitudes of Cohen's d indicate only small to medium effects. Table 4 shows differences between truthful and untruthful statements rated with MASAM by the psychology students of the University of Silesia.
Significant but small effects of veracity were found for eleven criteria: internal coherence rated with reference to objects and events (p < .01, d = .35) and suspect (p < .05, d = .27), volume of statement rated with reference to objects and events (p < .05, d = .28) and interview (p < .01, d = .44), contextual setting and external associations rated with reference to objects and events (p < .05, d = .30), descriptions of relations rated with reference to objects and events (p < .01, d = .34), readiness to depose (p < .05, d = .33/.29/.28), readiness to search, identify and reproduce memory traces rated with reference to objects and events (p < .01, d = .41) and interview (p < .01, d = .35) and complement rated with reference to interview (p < .05, d = .26).

Discussion
The main purpose of the present study was to examine whether SVA and/or MASAM could be useful in discriminating between offenders' true and false statements. With only a few exceptions, previous research has not explored the applicability of these techniques to statements by offenders [11]. Most previous studies used an incomplete SVA procedure [5], and the validity of MASAM had not been tested on statements given by suspects [10].
The total accuracy rates obtained using SVA and MASAM were far higher than those obtained without a specific assessment system. It is perhaps surprising that, after similar training, law students were more successful than psychology students in distinguishing between truthful and false statements. This unexpected result may be explained by differences in how interviews with suspects are recorded: in the Polish protocols only a summary of the suspect's answers is saved, whereas the Swedish procedure produces a more detailed record that accurately reflects the suspect's answers. However, accuracy rates of approximately 70% obtained on the basis of SVA and MASAM ratings, although in line with previous research [2,3], are not sufficient for individual assessments.
The average total accuracy in our study is similar to results reported in research on witnesses' statements and suspects' accounts [2,3,24,25,26,27,28]. The differences in accuracy rates between lie detection without content analysis and by SVA/MASAM coders support the idea that trained coders are better at distinguishing between truths and lies than lay evaluators. The results also suggest that SVA is better than MASAM at detecting lies specifically, although this difference was not statistically significant, whereas MASAM achieved higher overall accuracy.
The basic assumption of SVA and MASAM is that a testimony derived from memory of an actual experience differs in content and quality from a statement based on fabrication. It was hypothesized that SVA and MASAM scores would be higher for truthful statements than for false ones. The overall pattern of results supports the validity of the SVA and MASAM techniques for detecting the truthfulness of statements. In line with the Undeutsch hypothesis, the total scores of truthful and false statements differed significantly, and it can be concluded that both veracity assessment techniques are useful in assessing veracity. Only a few SVA criteria differentiated between true and false explanations. Seven SVA criteria distinguished truthful from false statements and functioned as we expected: logical structure, contextual embedding, self-depreciation, pardoning the victim, details characteristic of the offense, appropriateness of language and knowledge, and consistency with the laws of nature; we also found more reproduction of conversation in lies than in truths. These findings are somewhat in line with previous research [6,7,8]. However, conducting an SVA requires more information than just the statement. Credibility assessment is assumed to be the result of a comprehensive idiographic approach, and psychological experts using this technique should have access to all files containing the results of investigations by the police, the prosecutor and the court [5]. The accuracy and reliability of SVA could also be improved by taking case-specific conditions into account [5].
The vast majority of MASAM criteria differentiated between true and false statements provided by the Swedish suspects, with medium and small effect sizes. Of the 63 MASAM criteria, only two failed to discriminate between memories of real, self-experienced events and false or invented accounts: memory losses (rated with reference to the suspect's characteristics) and source of statement (evaluated with respect to the interview). When statements from the Polish case files were analyzed, only 10 MASAM criteria discriminated significantly, with medium or small effect sizes. Research has demonstrated that MASAM scores are related to external factors, such as interview style and the method used to record the offender's statement. Both SVA and MASAM are usually applied to a detailed transcript or a video recording of an interview, which may explain their apparent limitations when used in the manner adopted here.

Limitations
Establishing ground truth is essential when carrying out deception research [2]. In the present study, ground truth was not objectively established. Although the sample was large and the court records of each case were thoroughly analyzed to determine whether the suspect's statement was truthful or false, we cannot rule out that suspects did not tell the truth exactly as it happened, or that even when confessing they gave a sweetened version of the crime. In addition, previous research has shown that discriminative efficacy was higher in field studies on sexual offences and intimate partner violence [6], and that situational variables such as the complexity of the event, the time interval between the event in question and the interview, and the interview technique may affect the outcome of veracity assessment [5]. Therefore, research is needed on the differences between true and false accounts given by offenders describing various categories of crime.

Implications and future directions
Given the established error rate of approximately 30% and the low effectiveness of the individual criteria in discriminating between self-experienced and invented memories, SVA and MASAM evaluators cannot present the accuracy of their assessments as being beyond reasonable doubt, which is the standard of proof in criminal courts. Therefore, SVA and MASAM assessments of suspects' accounts are not accurate enough to be presented as scientific evidence in criminal courts.
We found differences in the quality and quantity of some individual criteria that distinguished between truthful and deceptive statements. Further research could provide more evidence on the usefulness of individual SVA and MASAM criteria with respect to the individual characteristics of both true and fabricated statements [2]. Further exploration of the conditions under which low-quality content is found in truthful statements could improve the accuracy with which a scientifically based technique for statement analysis differentiates between offenders' true and false accounts. Specifically, the impact of personal and situational variables on content quality needs to be further explored and estimated. Moreover, previous studies indicate that there are qualitative and quantitative differences between true and false confessions [29]. Future research should further explore the differences between true and false confessions and denials.
To standardise content analysis, it is necessary to establish rules for using particular tools to analyse verbal cues, and evaluators should be given clear guidelines for interpreting the results. Authors of psychological content analysis techniques have not yet developed reliable rules for determining when a content analysis should lead to a statement being recognised as experience-based and when it should be classified as false, that is, based on invention or fantasy.