Identifying single-item faked responses in personality tests: A new TF-IDF-based method

doi:10.1371/journal.pone.0272970

Fig 1.

Code Snippet 1.

TF-IDF for single-item faking-detection.

Table 1.

Number of participants and age and education of participants in the three groups.

Table 2.

Mean, SD and faking probability of the questions in each honest and dishonest questionnaire in different datasets.

Table 3.

Unchanged and increased responses by group.

Fig 2.

Correlation matrices.

Correlation matrices between honest and dishonest responses in the CC, JIS, and JIHO datasets (from left to right). Correlation between responses in the honest and dishonest conditions is virtually nonexistent, which makes distinguishing honest from dishonest responses a difficult task.

Table 4.

KL divergence for CC dataset.

Table 5.

KL divergence for JIS dataset.

Table 6.

KL divergence for JIHO dataset.

Fig 3.

Average of TF-IDF response values over all 10 items for honest and dishonest responses.

Datasets from left to right: CC, JIS, JIHO. The odds of honest to dishonest for TF-IDF > 3 are 1/3.8 for CC (for every honest response with TF-IDF > 3, there are 3.8 fakers), 1/6.7 for JIS, and 1/4.5 for JIHO. Converting these odds to probabilities gives the following probabilities of dishonesty for TF-IDF > 3: 79.2% for CC, 87% for JIS, and 82% for JIHO.
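
The odds-to-probability conversion in this caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `odds_to_probability` and the dictionary layout are assumptions made here, not part of the published code. It takes the number of fakers per honest response (k) and returns k / (1 + k).

```python
def odds_to_probability(fakers_per_honest: float) -> float:
    """Convert odds of 1 honest : k fakers into the probability of dishonesty.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    return fakers_per_honest / (1.0 + fakers_per_honest)


# Fakers per honest response for TF-IDF > 3, as stated in the caption.
odds = {"CC": 3.8, "JIS": 6.7, "JIHO": 4.5}

probabilities = {name: odds_to_probability(k) for name, k in odds.items()}

for name, p in probabilities.items():
    print(f"{name}: {p:.1%}")
# CC: 79.2%
# JIS: 87.0%
# JIHO: 81.8%
```

The JIHO value of 81.8% rounds to the 82% reported in the caption.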

Fig 4.

Probability density function.

Probability density function (PDF) of the number of dishonest responses per questionnaire (red line) and per-item accuracy of the TF-IDF model on the faking-detection task for each dataset: CC (left panel), JIS (central panel), and JIHO (right panel). The most frequent number of dishonest responses is 7 out of 10. Each histogram bar reports the average accuracy in identifying faked responses.

Table 7.

Performance evaluation.

Fig 5.

TF-IDF response values for participant 5 (dishonest responses) compared to the distributions of TF-IDF values of honest respondents in the CC dataset.

The red dots indicate the items in which the participant faked. In this specific case, only Q6 received the same response in the honest and dishonest conditions. The algorithm correctly flagged seven of the nine dishonest responses and missed two. The total accuracy is 8/10.

Fig 6.

TF-IDF response values for participant 48 (dishonest responses) compared to the distributions of TF-IDF values of honest respondents in the JIS dataset.

The red dots indicate the items in which the participant faked. In this specific case, Q4 and Q6 received the same response in the honest and dishonest conditions (i.e., the subject did not fake). The TF-IDF algorithm correctly spotted six of the eight dishonest responses. The total accuracy is 8/10.

Fig 7.

TF-IDF response values for participant 4 (dishonest responses) compared to the distributions of TF-IDF values of honest respondents in the JIHO dataset.

Here we report a case where the precision is below the reported average. The red dots indicate the items in which the participant faked. In this specific case, Q1, Q2, Q5, and Q6 received the same response in the honest and dishonest conditions (i.e., the subject did not fake). Of the six dishonest responses, two were correctly identified. Q4, Q7, Q9, and Q10 were not reported as faked. The total accuracy is 6/10.
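
The total accuracies quoted for the three example participants (Figs 5 to 7) can be reproduced from the counts given in the captions. The sketch below is a hedged illustration: the function `per_item_accuracy` and its parameters are hypothetical names, and it assumes that every unchanged (honest) item was correctly left unflagged. The captions do not state this explicitly, but the reported totals are consistent with it.

```python
def per_item_accuracy(n_items: int, n_faked: int, n_faked_detected: int,
                      n_unchanged_flagged: int = 0) -> float:
    """Fraction of items classified correctly for one participant.

    Hypothetical helper. Assumes by default that no unchanged item
    was wrongly flagged as faked (n_unchanged_flagged = 0).
    """
    correct_on_faked = n_faked_detected
    correct_on_unchanged = (n_items - n_faked) - n_unchanged_flagged
    return (correct_on_faked + correct_on_unchanged) / n_items


# Counts taken from the captions: (items, faked items, faked items detected).
cases = {
    "CC, participant 5": (10, 9, 7),
    "JIS, participant 48": (10, 8, 6),
    "JIHO, participant 4": (10, 6, 2),
}

results = {name: per_item_accuracy(*counts) for name, counts in cases.items()}

for name, acc in results.items():
    print(f"{name}: {acc:.0%}")
# CC, participant 5: 80%
# JIS, participant 48: 80%
# JIHO, participant 4: 60%
```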
