A lexical decision task for rapid estimation of crystalized vocabulary knowledge in Thai

Graham Pluck; Alexis Sirisomboonwong; Smriti Sitani; Carl Piaf; Suphasiree Chantavarin

doi:10.1371/journal.pone.0348126

Abstract

A useful distinction within cognitive and brain sciences is that between fluid and crystallized ability. Although fluid ability is widely studied, crystalized ability, which draws on acquired, declarative semantic knowledge and the mental lexicon, has been less well studied. Partly this is due to the culture and language specificity of assessment methods. We developed and assessed the psychometric properties of a simple 42-item lexical decision task that could be used with Thai speakers to assess the breadth of their crystalized vocabulary knowledge. A large sample of responses from 662 Thai-speaking participants, collected online, was used to refine the scale through exploratory factor analysis, and establish its internal consistency. A smaller sample of 90 participants was interviewed to establish validity of the task as a measure of Thai vocabulary. Large positive correlations (i.e., > .3) were found between the Thai Lexical Decision Task and measures of verbal fluency, particularly in the first 30 seconds of responding, and with other measures of Thai language skill. Temporal stability of the scale was assessed in a subsample of 27 participants. This confirmed that the Thai Lexical Decision Task has little obvious practice effect and excellent test-retest reliability. We also observed the expected positive associations with age, educational level, and self-reported proficiency in the language, supporting the ability of the task to measure acquired, crystalized knowledge. This new task could be used to estimate cognitive ability in Thai adults and shows potential as a measure of premorbid cognitive function for use in neuropsychological assessments.

Citation: Pluck G, Sirisomboonwong A, Sitani S, Piaf C, Chantavarin S (2026) A lexical decision task for rapid estimation of crystalized vocabulary knowledge in Thai. PLoS One 21(5): e0348126. https://doi.org/10.1371/journal.pone.0348126

Editor: Yiu-Kei Tsang, Hong Kong Baptist University, HONG KONG

Received: March 31, 2025; Accepted: April 9, 2026; Published: May 4, 2026

Copyright: © 2026 Pluck et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The Thai Lexical Decision Task is free to use in research without cost, and can be downloaded from: https://gpluck.co.uk/Tests/ The data files can be found on this repository: https://doi.org/10.23668/psycharchives.16526.

Funding: This project is supported by Ratchadaphiseksomphot Fund, Chulalongkorn University (Cognition, Audition and Language (CoALa) Research Group).

Competing interests: The authors declare that no competing interests exist.

Introduction

A basic distinction, widely accepted in the various cognitive and neural sciences, is between active, ‘fluid’ cognitive-control ability, and ‘crystallized’ ability using stored knowledge, particularly related to semantic memory. The most well-known version is Cattell’s distinction between fluid and crystallized intelligence [1], but it is an integral component of many other approaches. One notable example is Baddeley’s multi-component working memory model [2], and another comes from intelligence testing, in which the distinction between ‘performance’ and ‘verbal’ IQ is equivalent to fluid and crystalized [3]. The fluid/crystalized contrast is useful, as the two factors have very different features, particularly in their relationships with formal education [4], developmental trajectories [5], and neurological disorders [6,7]. These are due to the learnt nature of crystalized ability, which increases with age and educational experience, and is quite resilient in the presence of normal and pathological cognitive ageing [5]. Nevertheless, few standardized tests of crystallized ability are available, even in English, as most psychometric studies have focused on fluid abilities such as non-verbal intelligence, working memory, and attention.

Fluid ability is characterized by active processing for immediate behavior, such as spatial navigation and decision making; it is very closely associated with complex problem solving [8] and executive functions [9] and may be independent of language skill [10]. In contrast, crystalized ability is closely linked with declarative semantic knowledge, vocabulary, and language skill [11]. However, crystallized ability is more than stored knowledge; it is also linked to intelligence and includes flexible thinking, particularly creativity [12]. It is likely that executive ‘fluid’ and semantic ‘crystalized’ control both contribute to performance in a range of cognitive tasks, such as cognitive estimation [13] and verbal fluency [14], but at the neural level, they are dissociable [15].

Crystalized ability encompasses much about declarative memory, particularly semantic memory, including various dimensions of general knowledge [16]. Declarative semantic memory also includes the mental lexicon, the set of links between conceptual meaning, and, at its most basic form, the phonemes that indicate those meanings [17]. The different aspects of factual knowledge and the lexicon are very closely associated [11].

Methods of assessment of crystalized ability

Assessment of a person’s lexicon via their vocabulary knowledge is the most common method of measuring their crystalized ability. This is because within linguistic cultures, there is a common consensus over the meanings of a finite number of words. By measuring knowledge of a sample of words, one gains an idea of the breadth of an individual’s mental lexicon.

One very common method that psychologists use to achieve this is through asking for definitions of words. An example would be a participant being asked “what does the word ‘nationality’ mean?”, and the researcher/clinician scoring the response for accuracy in terms of commonly understood meanings of the word. Such tests are included in many intelligence test batteries, such as the Wechsler family of tests [3]. These tests, though, as acts of communication between the participant and researcher/clinician, also likely assess non-lexical processes, such as fluency, mentalizing skill, and social communication adeptness. A method that reduces the social and subjective aspects of performance is to use multiple-choice assessments of vocabulary, based on matching words to phrases that are synonyms [18].

An alternative is to use lexical decision tasks. In such tasks, real words are presented with pseudowords (i.e., words that appear to be legitimate words in the language but are not actually used in the language). One method, widely-used for language assessment in educational contexts, is to present a mixed array of words and pseudowords, and the test-taker must choose which they know, and which they do not. Such recognition yes/no tests have been widely-used for student placement within a language education context, because, despite their simplicity, achievement scores correlate very highly with scores on more comprehensive and lengthy language-skill assessments [19]. They also appear to have favorable psychometric properties when compared to multiple-choice vocabulary assessment methods [20]. The success of those lexical-decision studies led to the development of LexTALE, a simple lexical-decision based method for assessing vocabulary acquisition in a range of second-language contexts [21]. In addition to the English version of LexTALE, there are versions available for several different languages including French [22], German, and Dutch [21]. Although designed for use in second-language assessment, these tests often do not have ceiling effects in first-language speakers, suggesting they may have use in other contexts. Lexical decision is also one component of the DIALANG system, also used to assess competence of teenagers and adults learning a range of second languages [23].

The first such standardized test of this type for psychological research purposes may have been the German-language Mehrfachwahl-Wortschatz-Intelligenztest B (multiple-choice vocabulary intelligence test B; [24]). In that test, German words must be identified from sets composed of one real word plus four pseudowords. In the English language, the Spot-the-Word test [25,26], originally part of the commercially sold Speed and Capacity of Language Processing Test [27], is the most applied lexical decision task for cognitive assessments. In that test, pairs of words, such as ‘Puma - Laptess’, are shown to the participant and they must choose the real word from each pair. In this case ‘Puma’ would be the correct choice, as ‘Laptess’ is a pseudoword.

Lexical decision has several benefits as a testing method. On the part of the participant, it does not require articulation, and there is an option to guess, thus avoiding potential embarrassment from being asked to define words one does not understand. On the part of the researcher/clinician, scoring does not require listening or reading, and so can be relatively automated, allowing for online or group test administration. It is also objectively scored, eliminating any variation in scores due to imperfect interrater reliability.

A further benefit is that when lexical knowledge is to be used as a measure of crystalized ability, it is useful to minimize any fluid processing aspects of task performance. Lexical decision achieves this because it relies on recognition rather than recall. Recall requires some level of top-down executive control, shown by its sensitivity to cognitive load [28], while recognition has a substantial element of implicit memory, and is generally unaffected by cognitive load [29]. This is underlined by second language acquisition studies that show that vocabulary recognition is generally easier than recall [30], and is a better predictor of reading comprehension because performance of the task requires a narrower range of cognitive processes, offering a purer performance measure [31]. From a neuropsychological perspective, a reason for this dissociation is that lexical decision tasks can be performed without access to conceptual knowledge; to respond correctly it is enough to have a feeling that a word is real, that one has seen that word before, even if one is unable to locate the meaning of the word [25].

Nevertheless, current models of lexical decision task performance suggest that multiple sources of information are used to reach a decision, and this includes how ‘word-like’ the pseudowords seem [32]. It is therefore important to minimize such information, by providing only convincing pseudowords.

Availability of validated lexical decision tasks

Few standardized lexical decision tasks are available for clinical or research use in psychology. One reason for this is that they are, by their nature, language and culture specific. The Spot-the-Word Test was developed in the UK and uses British English words. Indeed, some words (e.g., ‘octaroon’), though listed in British dictionaries, are not listed in dictionaries from the USA, casting some doubt on its validity within that country. And of course, such tests are not valid for people who speak languages other than English. Consequently, localized versions need to be developed.

Lexical decision tasks very similar to the Spot-the-Word Test have been developed for use in several languages, including Spanish [33], and Brazilian Portuguese [34]. There is also a version in Swedish that uses a different format but is still a lexical-decision task [35]. However, no standardized test of crystalized ability exists for clinical or psychological research use in the Thai language. Development of such a task would assist future research in psychology and the behavioral sciences within the country. Here, we describe studies of the validity and reliability of a Thai Lexical Decision Task. The target population is adults who speak Thai as their main language. Study 1 reports the development of the task and the use of factor analysis with a large online sample to refine the scale and to establish its internal consistency. Study 2 aimed to establish the validity of the task by investigating performance correlations between the lexical decision task and measures of Thai verbal fluency. Finally, Study 3 aimed to assess the test-retest reliability of the scale in a subsample of participants.

Study 1: Task development and factor analysis with an online sample

Method

Aims, study design and hypotheses.

The aim of this study was to pilot an initial set of words and pseudowords to explore item difficulty and scale unidimensionality with a large sample of participants. An online study was conducted with an initial version of the Thai Lexical Decision Task. We also hypothesized that as a measure of acquired crystalized knowledge, scores on the task would be positively associated with age, education level, and self-rated language proficiency.

Participants.

A total of 992 participants were recruited online (i.e., completed at least the consent stage) from the Chulalongkorn University community. Although these were recruited through convenience sampling, this approach is useful as it allows recruitment from a set geographical area, and convenience samples can be appropriate when used to pilot assessments and generate material for further studies [36], as is being done in the current study. The inclusion criteria to participate were that an individual was aged 18 or above, a Thai national, and was a fluent native Thai speaker (exposed to Thai language before the age of 5). The recruitment period for the whole study reported in this paper was from March 13, 2023 to March 8, 2024.

The mean age of the whole sample was 32.6 years (SD = 7.1, range = 18–79), and most identified as female, 641/992 (65%). However, many cases were excluded during data quality processing, so the sample that was actually analyzed is described later.

Development of the initial version of the Thai Lexical Decision Task.

To develop a large set of target words for the Thai Lexical Decision Task, we initially selected 100 real Thai words. All words were selected to be concrete concepts that were imageable; the relevance of this feature is described later. Since participant performance is evaluated based on their accuracy rather than their response times, the task had to be designed to be moderately difficult. Thus, we purposely selected some words that had low frequency of use (e.g., ชลมารค [royal barge], ผอบ [casket], ปฏัก [goad stick]) so that these words would not be easily distinguishable from nonwords. We also included colloquial words that were relatively common and familiar in everyday conversation (e.g., จราจร [traffic], ศีรษะ [head], ปีศาจ [ghost]). In terms of etymology and register, most of the selected words were literary Thai lexemes of Sanskrit/Pali origin, as well as a few items of royal vocabulary, archaic poetic words, and formal words (see Table 2). These items were intentionally skewed towards orthographically complex Indic-derived words drawn primarily from non-colloquial registers in order to avoid ceiling effects and ensure sufficient variability in accuracy and response times across participants. The frequency of the selected words within the Thai language varied from < 1 to 70.2 per million (M = 5.9, SD = 13.7), estimated using the Thai web 2018 (thTenTen18) corpus from the Sketch Engine website (https://www.sketchengine.eu/). The word length ranged from 2–7 characters (M = 4.1, SD = 1.2), defined as the number of consonant and vowel characters excluding the vowels and tone marks located above or below those characters. The number of syllables ranged from 1–4 (M = 2.2, SD = 0.7). Research assistants looked up each word in search engines and dictionaries to verify that each of the words had meaning in the Thai language.
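The length definition above can be illustrated with a short sketch. It assumes that every Unicode nonspacing mark (category `Mn`) in a Thai string corresponds to a vowel or tone mark written above or below a base character. That is an approximation of the stated rule, not the authors' procedure: it also excludes marks such as thanthakhat (์), and it counts sara am (ำ) as one character. The function name `thai_length` is hypothetical.

```python
import unicodedata


def thai_length(word: str) -> int:
    """Count characters written on the baseline.

    Assumption: Thai vowels and tone marks placed above or below a
    consonant are encoded as nonspacing marks (Unicode category 'Mn'),
    so skipping 'Mn' characters approximates the definition in the text.
    """
    return sum(1 for ch in word if unicodedata.category(ch) != "Mn")


# Example words taken from the text
for w in ["ผอบ", "ปฏัก", "จราจร", "ศีรษะ"]:
    print(w, thai_length(w))
```

Under this assumption ผอบ has length 3 and ศีรษะ has length 4, because the above-line vowel in the latter is not counted.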

A list of 100 pseudowords was created to be paired with the real words. These were produced by a Thai doctoral-level psycholinguist, without the use of AI. The pseudowords were designed to look and sound like plausible Thai words without violating spelling conventions; all were pronounceable and orthotactically legal, but did not correspond to any attested Thai words. Many of the pseudowords were formed by combining syllables or orthographic segments from existing Thai words, e.g., จุติ and หรรษ์ are attested words in Thai, but their combination จุติหรรษ์ is not a word in Thai. Some of these were created using grapheme segments that resemble real morphemes in Thai but contain consonant or vowel variants, e.g., จรรญาพรม, ทัศภาติ, กรรโดษฐ์, ภริเมษฐา, ศยามาร. Others were short, monosyllabic pseudowords which were generated to preserve typical Thai syllable structure without directly recombining identifiable morphemes, e.g., เฌบง, กรวก, ขวัง, ตจุลย์, ครวด. The pseudowords contained a similar set of letters as the real words, containing high-frequency letters (e.g., ก, ช, ส) as well as lower-frequency letters (e.g., ณ, ฐ, ษ), combined in different ways. The length of the pseudowords ranged from 2–8 characters (M = 4.2, SD = 1.2), and the approximate number of syllables ranged from 1–4 (M = 2.0, SD = 0.8). Research assistants verified through online searches that the pseudowords did not have a conventional meaning in the Thai language.

The real words and pseudowords were randomly paired to create the 100 test items. Since the random pairing resulted in some very easy items (e.g., a high-frequency real word was paired with a pseudoword that contained low-frequency letters, thus providing a cue that it was a pseudoword), according to pilot data, we manually switched some pairings to avoid creating real word – pseudoword pairs that were too easy. For task performance, the variable of interest is accuracy, that is, whether the correct word in each pair is selected or not.

Procedure.

The procedure, including written consent taking, had approval of the local research ethics committee (Study Title No. 660010; Approved by the Research Ethics Review Committee at Chulalongkorn University on March 9, 2023) and the research was performed in compliance with all relevant laws and institutional guidelines. The initial version of the Lexical Decision Task was placed online using Qualtrics. The online task with 100 real word – pseudoword pairs was promoted on a social media account of the Faculty of Psychology at Chulalongkorn University, Bangkok. When interested individuals clicked on the link, they were taken to the Qualtrics site and the research was explained to them in text form, at which point they provided consent to continue by checking a box. Initial additional questions in the survey asked about the participants’ (a) age, (b) gender, (c) highest educational attainment, (d) exposure to Thai language before the age of 5 (options being ‘no exposure’, ‘Thai only’, and ‘Thai and other languages’), and (e) a five-point self-rating of proficiency in Thai language from ‘very low (1)’ to ‘very high (5)’.

Next, participants read the instructions specifying that they would see pairs of words comprising a real word and a pseudoword that does not have a meaning in Thai language, and that their task was to select the word in each pair that was a real word. An example item and correct response was provided. The main task was presented with pairs of real and pseudowords displayed in the standard default font in a legible 17 pt size, with all 100 items appearing on the same page on the screen. The locations of the real words and pseudowords were randomized so that the real words/pseudowords did not always appear consistently on either the left or right side. At the end of the task, participants were told how many they had answered correctly. This feedback element appeared to make the task challenging and the initial social media link was shared many times, resulting in the task being attempted 989 times over a two-day period. At this point the task was taken offline.

Results and discussion of the online study

Data processing.

The initial data set from Qualtrics contained 992 completed consents, and 989 participants proceeded to respond to questions. We performed data cleaning to improve the quality of the dataset by removing cases with incomplete data. We also removed cases completed while outside of Thailand, based on IP-addresses (as these may have included non-native Thai speakers), and subsequent attempts from the same IP address (as these may have indicated people completing the task more than once). Some attempts had been completed in an unfeasibly short period of time, a behavior associated with low data quality [37]. Other attempts had taken an unreasonably long time to complete and may indicate participants researching the words before answering, such as checking in dictionaries. Cut offs of < 180 seconds and > 900 seconds for task completion were used as reasonable criteria for removing temporally invalid records. Of the retained cases, the mean completion time was just under seven minutes (412 seconds), equivalent to spending roughly 4 seconds on each of the 100 choices. Using combined geographical filtering and IP address is an effective method for reducing fraudulent data produced by bots or multiple attempts by the same individual [38].
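The exclusion rules described above can be sketched as a simple filter. This is a minimal illustration, not the authors' processing script: the record fields (`ip`, `country`, `seconds`, `complete`) and the function name `clean` are hypothetical, and it assumes that any earlier record from the same IP address, whether or not it was itself retained, makes a later record a "subsequent attempt".

```python
from dataclasses import dataclass

MIN_SECONDS = 180  # faster attempts are treated as unfeasibly short
MAX_SECONDS = 900  # slower attempts may reflect looking words up


@dataclass
class Attempt:
    ip: str          # hypothetical field: IP address of the attempt
    country: str     # hypothetical field: country code derived from the IP
    seconds: float   # task completion time in seconds
    complete: bool   # True if every item was answered


def clean(attempts):
    """Return the attempts that pass all four exclusion rules."""
    seen_ips = set()
    kept = []
    for a in attempts:
        first_from_ip = a.ip not in seen_ips
        seen_ips.add(a.ip)
        if not a.complete:
            continue  # incomplete data
        if a.country != "TH":
            continue  # completed outside Thailand
        if not first_from_ip:
            continue  # repeat attempt from the same IP address
        if a.seconds < MIN_SECONDS or a.seconds > MAX_SECONDS:
            continue  # temporally invalid record
        kept.append(a)
    return kept


demo = [
    Attempt("1.1.1.1", "TH", 412, True),
    Attempt("1.1.1.1", "TH", 300, True),   # repeat IP
    Attempt("2.2.2.2", "US", 400, True),   # outside Thailand
    Attempt("3.3.3.3", "TH", 120, True),   # too fast
    Attempt("4.4.4.4", "TH", 950, True),   # too slow
    Attempt("5.5.5.5", "TH", 500, False),  # incomplete
    Attempt("6.6.6.6", "TH", 180, True),   # exactly at the lower cut-off
]
kept_ips = [a.ip for a in clean(demo)]
print(kept_ips)
```

Because the cut-offs are stated as < 180 and > 900 seconds, attempts lasting exactly 180 or 900 seconds are retained in this sketch.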

Demographics of the retained sample.

Of the retained sample of 662 participants, the mean age was 32.6 years (SD = 6.8). Two-thirds (66%) of the sample identified as female, 154/662 as male (23%) and the remainder (11%) identified as ‘other’. Self-reported education level was generally high, likely a consequence of the online task being promoted from a university social media account. Only 28/662 participants (4%) reported no formal education beyond high school. The majority, 353/662 (53%) reported undergraduate education, and 278/662 (42%) reported graduate level education. The sample, therefore, overrepresents the most highly educated section of Thai society. All participants reported exposure to Thai language before the age of five, but 232 (35%) of them indicated that they were exposed to both Thai and other languages during that period. Demographic variables are summarized in Table 1.

Table 1. Summary of demographic characteristics of the study participants.

https://doi.org/10.1371/journal.pone.0348126.t001

Analysis of individual items and factor analysis.

In the next stage of analysis, the accuracy of performance from the 662 participants was examined on an item-by-item basis. One item (#53) was found to have very low accuracy, and on checking, despite being included in the initial task, the ‘real’ word was not found in any Thai dictionaries. Consequently, responses to that item were removed from the data set. Several other items appeared to produce very little variance in response, with correct responses from the majority of participants. Insufficient item difficulty is a potential challenge to validity and should be addressed in test development [39]. We examined this in both the full sample of 662 participants, and the subsample of 28 participants with low education levels (high school or lower). Items were removed from the data set if accuracy in the full sample was > 97.5% (i.e., more than 645 of the 662 participants answered correctly). However, items were retained if scored as < 94% correct in the lower-education subsample (i.e., at least two of the 28 participants made an error). This allowed us to retain items that may be challenging to less educated participants. This resulted in the removal of 31 further items. This pruning was necessary as items with very low variance would not be psychometrically useful, in that low variance within items tends to reduce scale reliability [40], and also would not be appropriate for factor analyses. There were also 6 items that were answered at < 50% accuracy and were removed at this stage. As the chance guessing rate is 50%, they may indicate items in which the pseudoword was perceived as more word-like than the real word. As such, they reflect the interpretation of the pseudoword more than the target word and lack face validity as a test of vocabulary. Together, this totaled removal of 38 of the original items, leaving 62 items with a reasonable level of variance.
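The item-level pruning rules can be expressed compactly in code. The following is a sketch under stated assumptions rather than the authors' analysis script: the function name `prune_items` and the dictionary inputs (item id mapped to proportion correct) are hypothetical, while the thresholds are those given in the text.

```python
def prune_items(full_acc, low_edu_acc, invalid=()):
    """Apply the three pruning rules to per-item accuracy.

    full_acc:    item id -> proportion correct in the full sample
    low_edu_acc: item id -> proportion correct in the lower-education subsample
    invalid:     item ids dropped for other reasons (e.g., target word
                 not found in dictionaries)
    """
    kept = []
    for item, acc in full_acc.items():
        if item in invalid:
            continue
        if acc < 0.50:
            continue  # below the 50% chance rate
        too_easy = acc > 0.975
        hard_for_low_edu = low_edu_acc[item] < 0.94
        if too_easy and not hard_for_low_edu:
            continue  # near-ceiling item with too little variance
        kept.append(item)
    return kept


# Toy data: item 2 is near ceiling overall but retained because
# two of 28 lower-education participants made an error.
full = {1: 0.99, 2: 0.99, 3: 0.45, 4: 0.80, 53: 0.30}
low = {1: 1.00, 2: 26 / 28, 3: 0.50, 4: 0.70, 53: 0.20}
kept_items = prune_items(full, low, invalid={53})
print(kept_items)

# Item counts reported in the text
remaining = 100 - (1 + 31 + 6)
print(remaining)
```

The final lines reproduce the reported bookkeeping: one invalid item, 31 near-ceiling items, and 6 below-chance items removed from 100 leaves 62.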

Next, the factor structure of the data set from the 662 participants on the 62 lexical-decision items was examined. Because the responses to individual items by individual participants have Boolean values expressing whether they were answered correctly or not (1 = true, 0 = false), Pearson correlations cannot be used in exploratory factor analysis. Instead, polychoric correlations are recommended [41]. The use of polychoric correlations for the Boolean data of observed variables assumes that there is a continuous and normally distributed scale underlying each of the observed variables. This assumption is reasonable in the sense that the vocabulary knowledge of individuals is widely conceptualized as existing on a continuum (e.g., [42]). Polychoric correlation coefficients range from −1 to 1, in which values closer to 0 indicate weaker association and values closer to −1 or 1 indicate stronger negative or positive association, respectively.
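For two dichotomous items, the polychoric correlation reduces to the tetrachoric correlation, which is normally estimated by maximum likelihood under a bivariate normal model. As a rough illustration only, the sketch below uses the classical cosine-pi approximation based on the odds ratio of the 2 × 2 table. This is a swapped-in approximation, not the estimation method used in the study, and both the function name and the 0.5 continuity correction for empty cells are assumptions.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Approximate tetrachoric correlation from a 2 x 2 table.

    a = both items correct, b = only item 1 correct,
    c = only item 2 correct, d = both items incorrect.
    Uses r ~ cos(pi / (1 + sqrt(odds ratio))).
    """
    if min(a, b, c, d) == 0:
        # Assumed continuity correction to avoid division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


r_pos = tetrachoric_cos_pi(40, 10, 10, 40)   # responses mostly agree
r_zero = tetrachoric_cos_pi(25, 25, 25, 25)  # no association
r_neg = tetrachoric_cos_pi(10, 40, 40, 10)   # responses mostly disagree
print(round(r_pos, 3), round(r_zero, 3), round(r_neg, 3))
```

The three toy tables give a positive, a near-zero, and a negative coefficient, which shows how the sign follows the direction of agreement between the two items.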

Next, a 62 × 62 matrix of the polychoric correlation values was submitted to exploratory factor analysis. The estimation method used was the iterated principal factor method with the prior communality for each variable set to the squared multiple correlation with all other variables. In addition, the number of factors to be extracted was set to one because all the items are intended to measure only the breadth of the mental lexicon. The scree plot of eigenvalues confirmed the specification of a single factor. Finally, it is thought that loadings < .3 do not meet the minimal level for inclusion within a factor [43]. With that threshold, 20 additional items were removed from the scale. Consequently, the final Thai Lexical Decision Task had 42 remaining items. Those items, along with their factor loadings, are shown in Table 2. Twenty of these 42 words were relatively high-frequency words (defined as 1 word per million and above; M = 4.82, SD = 6.04, range = 1.01–23.3), and the average number of characters in this set was 3.35 (SD = 0.88, range = 2–5 characters). The remaining words were relatively low-frequency words (defined as lower than 1 word per million; M = 0.19, SD = 0.27, range = 0–0.99), and the average number of characters in this set was 4.64 (SD = 1.09, range 3–7 characters).
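The estimation procedure described above can be sketched for the single-factor case. This is a minimal illustration on a toy correlation matrix, not the software used in the study: all function names are hypothetical, the leading eigenvector is found by power iteration (which assumes the largest eigenvalue of the reduced matrix is positive and dominant), and in practice the input would be the 62 × 62 polychoric matrix.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leading_eigenpair(m, iters=100):
    """Power iteration for the dominant eigenvalue and unit eigenvector."""
    n = len(m)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    value = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return value, v


def one_factor_paf(r, tol=1e-8, max_iter=500):
    """Iterated principal factor extraction of a single factor."""
    n = len(r)
    inv = invert(r)
    # Prior communalities: squared multiple correlations
    comm = [1.0 - 1.0 / inv[i][i] for i in range(n)]
    loadings = [0.0] * n
    for _ in range(max_iter):
        # Reduced matrix: communalities replace the unit diagonal
        reduced = [[comm[i] if i == j else r[i][j] for j in range(n)] for i in range(n)]
        value, vec = leading_eigenpair(reduced)
        loadings = [value ** 0.5 * x for x in vec]
        new_comm = [l * l for l in loadings]
        if max(abs(a - b) for a, b in zip(new_comm, comm)) < tol:
            break
        comm = new_comm
    return loadings


# Toy matrix generated from known loadings on one factor
true = [0.8, 0.7, 0.6, 0.5, 0.4]
r = [[1.0 if i == j else true[i] * true[j] for j in range(5)] for i in range(5)]
loadings = one_factor_paf(r)
retained = [i for i, l in enumerate(loadings) if abs(l) >= 0.3]
print([round(l, 3) for l in loadings], retained)
```

On this toy matrix the procedure recovers the generating loadings, and the final line applies the .3 loading threshold mentioned in the text to decide which items to retain.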

Table 2. Items included in the final version of the Thai Lexical Decision Task based on analyses of the online study (n = 662).

https://doi.org/10.1371/journal.pone.0348126.t002

Final version of the Thai Lexical Decision Task.

Those 42 items were used to calculate a Thai Lexical Decision Task total score by summation. Psychometric properties of that total score are shown in Table 3. The internal consistency of the whole scale was estimated with Kuder-Richardson 20 (KR-20), which is equivalent to Cronbach’s alpha but is used with dichotomous scores [44]. This was found to indicate ‘good’ internal consistency. Skewness and kurtosis are not included in the table as it also contains data from a separate, smaller sample (described in Study 2), and ways to interpret non-normality vary by sample sizes [45]. However, in the current (online) sample the skewness value of −2.24 suggests substantial negative skewing of the data distribution, which is also shown by the central tendency (mean = 38, median = 39) being close to the maximum (42). This likely reflects the high education level of the sample. The distribution was also highly leptokurtic, with a kurtosis value of 10.1, which would also indicate non-normal distribution, probably also linked to the highly skewed education level of the sample. Given the highly non-normal distribution, non-parametric correlations were used to examine associations with demographic factors. Gender was dichotomized (male or female). Thai Lexical Decision Task total scores had qualitatively ‘typical’ positive associations with education level and self-reported Thai proficiency, and a qualitatively ‘typical’ to ‘large’ positive association with age. Qualitative descriptions of correlation coefficients used in this paper are based on a standardized interpretation [46,47]: r > .1 = ‘small’, r > .2 = ‘typical’, and r > .3 = ‘large’. The scatterplots for those associations are shown in Fig 1. There was a very small, but significant, zero-order correlation of task performance with gender, indicating slightly better performance by female participants.

Of the 20 words that we classified as high frequency, the mean recognition score was 18.3 (SD = 2.0, 92% correct), which is very similar to the score for the 22 low-frequency words, which had a mean recognition score of 20.01 (SD = 2.2, 91% correct). That small difference was not statistically significant (t(661) = 1.90, p = .058 (two-sided), d = 0.074).
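The total score and the KR-20 coefficient mentioned above can be computed directly from a participants × items matrix of 0/1 responses. The sketch below is illustrative only: the function name `kr20` and the toy data are hypothetical, and it assumes the population variance of total scores is used, which keeps the formula consistent with the item terms p(1 − p).

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for dichotomous item scores.

    responses: list of per-participant lists of 0/1 item scores.
    KR-20 = k / (k - 1) * (1 - sum(p_j * (1 - p_j)) / var(total)).
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]  # total score by summation
    total_var = pvariance(totals)             # assumed: population variance
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Toy data: four participants, three items of increasing difficulty
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = kr20(toy)
print(alpha)
```

For the toy data, the item terms sum to 0.625 and the variance of the total scores is 1.25, giving KR-20 = 1.5 × (1 − 0.5) = 0.75.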

Table 3. Psychometric properties and correlates of the Thai Lexical Decision Task (42 items) score distributions from the online and interview samples.

https://doi.org/10.1371/journal.pone.0348126.t003

Fig 1. Scatterplots of scores on the Thai Lexical Decision Task with demographic variables.

https://doi.org/10.1371/journal.pone.0348126.g001

Summary of the online study.

To summarize, the findings from the online study were used to prune an original set of 100 items to a set likely to perform well as a single scale of crystalized vocabulary knowledge. Although the sample used was large, it also contained an overrepresentation of highly educated participants. Nevertheless, we were able to select a set of 42 items which were challenging to many participants and loaded onto a single factor, assumed to represent vocabulary knowledge. One question raised is whether this scale will perform well when applied to a more educationally diverse sample; that issue is dealt with in the next studies. Although the devised 42-item scale has obvious face validity as a measure of vocabulary knowledge in Thai, its psychometric validity remains to be demonstrated formally. Thus, a further question addressed in the next study is whether scores on the Thai Lexical Decision Task are associated with performance on more-established vocabulary assessments.