Predicting risk of dyslexia with an online gamified test

Luz Rello; Ricardo Baeza-Yates; Abdullah Ali; Jeffrey P. Bigham; Miquel Serra

doi:10.1371/journal.pone.0241687

Abstract

Dyslexia is a specific learning disorder related to school failure. Detection is both crucial and challenging, especially in languages with transparent orthographies, such as Spanish. To make detecting dyslexia easier, we designed an online gamified test and a predictive machine learning model. In a study with more than 3,600 participants, our model correctly detected over 80% of the participants with dyslexia. To check the robustness of the method, we tested it on a new data set with over 1,300 participants, using age-customized tests in a different environment (a tablet instead of a desktop computer), and reached a recall of over 78% for the class with dyslexia for children 12 years old or older. Our work shows that dyslexia can be screened using a machine learning approach. An online screening tool in Spanish based on our methods has already been used by more than 200,000 people.

Citation: Rello L, Baeza-Yates R, Ali A, Bigham JP, Serra M (2020) Predicting risk of dyslexia with an online gamified test. PLoS ONE 15(12): e0241687. https://doi.org/10.1371/journal.pone.0241687

Editor: Athanassios Protopapas, University of Oslo, NORWAY

Received: June 7, 2019; Accepted: October 20, 2020; Published: December 2, 2020

Copyright: © 2020 Rello et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data sets reported in this work are archived and freely accessible at https://doi.org/10.34740/kaggle/dsv/1617514.

Funding: Financial support was provided by a grant from the US Department of Education NIDRR (grant number H133A130057, J.B., https://www.ed.gov/); and a grant from the National Science Foundation (grant number IIS-1618784, J.B. and L.R., https://www.nsf.gov/).

Competing interests: The overall test design with a different classifier is protected by the following patent: Luz Rello and Miguel Ballesteros, Data Processing System to Detect Neurodevelopmental-Specific Learning Disorders. U.S. Patent Application 15/493,060, filed April 20, 2017. Assignee: Carnegie Mellon University (Pittsburgh, PA, US). The method has a patent filed (15/493,060) and we have made our data available in Kaggle. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

More than 10% of the world population has a specific learning disability of neurobiological origin called dyslexia. According to the International Dyslexia Association, “dyslexia is characterized by difficulties with accurate and/or fluent word recognition and by poor spelling and decoding abilities. These difficulties typically result from a deficit in the phonological component of language that is often unexpected in relation to other cognitive abilities and the provision of effective classroom instruction” [1]. If someone knows they have dyslexia, they can learn coping strategies to overcome its negative effects [2, 3]. However, when people with dyslexia are not provided with appropriate support, they often fail in school: 35% drop out of school, and it is estimated that less than 2% of people with dyslexia will complete a four-year college degree [4].

Detecting dyslexia is especially difficult in languages with transparent orthographies, such as Spanish. In such languages, the correspondence of grapheme (letter) and phoneme (sound) is more consistent than in languages with deep orthographies, such as English, where people with dyslexia struggle more in learning how to read [5, 6] and thus dyslexia is easier to detect. Therefore, dyslexia is called a “hidden disability” due to the difficulty of its diagnosis in languages with transparent orthographies because manifestations of dyslexia are less severe [5, 6]. As a result, Spanish speakers primarily learn that they might have dyslexia through school failure, which is often too late for effective intervention. Current methods of diagnosis and screening require professionals to collect performance measures related to reading and writing via a lengthy in-person test [7–9], measuring, e.g., reading speed (words per minute), reading errors, writing errors, reading words, pseudo-word reading, reading fluency or text comprehension.

While machine learning techniques are broadly used in medical diagnosis [10], in the case of dyslexia they have only been used in combination with eye-tracking measures [11, 12]. The aim of this study is to determine whether people with and without dyslexia can be screened by using machine learning on interaction measures collected while they answer gamified linguistic questions in an online test, which is easier to administer.

Materials and method

Method

We designed 32 linguistic exercises appropriate for inclusion into a web-based gamified test and conducted a study with 3,644 participants (392 with professional dyslexia diagnosis). Using a within-subject experimental design, we collected numerous performance measures during test completion.

Content design

We designed the gamified exercises using two methods. First, some exercises were based on an empirical analysis of a corpus of errors written in Spanish by people with dyslexia [13], because errors reflect specific difficulties that comprise dyslexia [14, 15]: we annotated the mistakes with general linguistic characteristics as well as with phonetic and visual information [13]. We then analyzed the mistakes and extracted statistical patterns to later use in the creation of the test questions. Examples of these patterns are found in the most frequent linguistic and visual features shared by the errors, which are phonetically and visually motivated. For instance, the most frequent errors involve letters in which the one-to-one correspondence between graphemes (letters) and phonemes (sounds) is not maintained, such as <b, v>, <g, j>, <c, z>, <c, s>, <r>, and the letter <h>, which, in most cases, does not have a phonetic realization in Spanish. Another example of this phonological motivation is that mistakes involving vowel substitutions take place between phonemes that share one or two phonetic features, with lip rounding being the feature most frequently involved in errors (<a, e>). On the other hand, the visual motivation is demonstrated in that 46.91% of the error letters occur with mirror letters, i.e., <p> and <q> or <n> and <u> [13].

Second, we designed test exercises to target specific cognitive processes, different types of knowledge, and difficulties entailed in reading [16–18]. Each exercise addresses three or more of the dyslexia-related indicators shown in Table 1, which are different types of Language Skills, Working Memory, Executive Functions and Perceptual Processes.

Table 1. Cognitive indicators used in the creation of test exercises.

https://doi.org/10.1371/journal.pone.0241687.t001

The language of the exercises was reviewed by five speech therapists from Spain, Chile and Argentina, to guarantee that the Spanish variant presented in the exercise was neutral. To ensure that the pronunciation was performed correctly, the voices in the exercises were recorded by a professional voice actress. Likewise, to ensure that the difficulty level was appropriate, each question was reviewed by the speech therapists. See Fig 1 for an example of the exercises layout.

Fig 1. Examples of four test questions: Find ‘d’ among ‘b’, ‘p’, and ‘q’ (top left); build a correct word (‘nadie’, ‘nobody’) by substituting one letter (top right); re-order the letters to write a correct word (‘siete’, ‘seven’) (bottom left); and find the word ‘boda’ (‘wedding’) (bottom right).

The instructions of the game were given via prerecorded voice prompts.

https://doi.org/10.1371/journal.pone.0241687.g001

Questions 1-21 (Q1-Q21) involved auditory and visual discrimination and categorization of different linguistic elements (phonemes or sounds, graphemes or letters, syllables, words, and pseudo-words). As the level increases, the elements are harder to distinguish, because they are phonetically and orthographically more similar. The questions of the test were presented in increasing order of complexity and were intended for children seven years old or older.

Previous work [14, 17, 18] has shown that people with dyslexia have difficulty recognizing their own reading and spelling errors, including insertion, deletion, substitution or transposition of letters and syllables, as well as detecting syntactic and semantic errors in sentences, that is, errors in the structure or in the meaning of the sentence. Hence, exercises 22-29 (Q22-Q29) focus on correcting words and sentences by fixing the types of errors found in texts written by people with dyslexia. For instance, the user is asked to re-order the letters in the common error *‘seite’ to form the correct word ‘siete’ (‘seven’) (see Fig 1). These exercises target lexical knowledge, word identification, reading comprehension, and other linguistic skills such as phonological, syntactic and semantic awareness.

The final exercises (Q30-Q32) target sequential visual and auditory working memory by asking the player to write sequences of letters in a specific order, as well as words and pseudo-words. A more detailed explanation of each question can be found in the Data sets section.

The test was implemented in HTML5, CSS and JavaScript with a back-end PHP server and a MySQL database.

Participants

Children and adults with dyslexia were recruited through a public call to dyslexia centers and dyslexia associations; the inclusion criteria specified that participants should have a dyslexia diagnosis performed by a registered professional. Participants without dyslexia were recruited through schools and limited to children and adults who had never had language problems in school records. Determining accurate ground truth in dyslexia diagnosis is difficult precisely because many people are never diagnosed and we do not know the ground truth accuracy of the professional diagnoses. All the participants’ native language was Spanish.

The participants with dyslexia consisted of 392 people (45.2% female, 54.8% male). Their ages ranged from 7 to 17 (M = 10.90, SD = 2.58). The group of participants without dyslexia was composed of 3,252 people (49.7% female, 50.3% male), ages ranging from 7 to 17 (M = 10.44, SD = 2.46).

Dependent measures

To quantify task performance, we collected the following dependent measures for each question: (i) number of Clicks; (ii) number of correct answers (Hits); (iii) number of incorrect answers (Misses); (iv) Score defined as sum of Hits per set of exercises; (v) Accuracy, defined as the number of Hits divided by the number of Clicks; (vi) Missrate, defined as the number of Misses divided by the number of Clicks.

We later used these performance measures together with the demographic data as features of our prediction model’s data set, see section Data sets.
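
The derived measures above follow directly from the raw counts. The following is a minimal Python sketch, assuming a hypothetical per-question record that stores the three raw counts. The names QuestionLog, accuracy, missrate, and score are illustrative and are not part of the test's implementation, and returning 0 for a question with no clicks is an assumption made here, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class QuestionLog:
    """Raw counts for one question (hypothetical record layout)."""
    clicks: int
    hits: int
    misses: int


def accuracy(q: QuestionLog) -> float:
    # Accuracy = Hits / Clicks; 0.0 for a question with no clicks (assumption).
    return q.hits / q.clicks if q.clicks else 0.0


def missrate(q: QuestionLog) -> float:
    # Missrate = Misses / Clicks; same zero-click convention as above.
    return q.misses / q.clicks if q.clicks else 0.0


def score(questions: list[QuestionLog]) -> int:
    # Score = sum of Hits over a set of exercises.
    return sum(q.hits for q in questions)


logs = [QuestionLog(clicks=10, hits=7, misses=3),
        QuestionLog(clicks=0, hits=0, misses=0)]
print([round(accuracy(q), 2) for q in logs], score(logs))
```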

Compliance and ethics statements

Interested organizations responded to our public calls, and those that we verified met the participation requirements (age, native language, technical requirements, and dyslexia diagnosis for the experimental group) were included. Overall, 113 organizations from Argentina, Chile, Colombia, Spain, and the USA participated in the study: 3 universities, 60 schools including primary and secondary, 22 specialized centers that support people with dyslexia, and 18 non-profit organizations composed of 4 foundations and 14 associations of dyslexia in Hispanic countries. Most organizations included participants both with and without dyslexia. For each organization there were one or more supervisors who were trained to administer the study protocol.

Procedure

This study was approved by the Carnegie Mellon University Institutional Review Board (IRB). First, participants gave their written online consent. If the participant was a minor, we gathered consent from both the participant and their parents. Then, the participant (or the supervisor, if the participant was a minor) filled out a demographic questionnaire, including the date of their dyslexia diagnosis (if any). Next, they were given instructions on how to complete the tasks and then took the gamified test on a desktop computer for 15 minutes without interruption, since each item of the test has a fixed time. Supervisors could not help participants complete the test. For schools, parental consent was obtained in advance and the study was supervised by the school counselor or the therapist. All participants and supervisors were volunteers.

Data sets

We had 3,644 participants with an age range of 7 to 17 years old, where 392 (10.7%) had diagnosed dyslexia.

In Table 2 we show the characteristics of the overall data set as well as the characteristics of age-based subsets.

Table 2. Characteristics of the data sets (age range).

https://doi.org/10.1371/journal.pone.0241687.t002

The data for each participant consisted of a total of 196 features: features 1 to 4 are demographic features, while features 5 to 196 are performance features derived from the participant’s interaction while playing the 32 questions of the test (the 6 measures per question presented previously, that is, Clicks, Hits, Misses, Score, Accuracy, and Missrate). We describe them in detail below.

1. Gender of the participant, a binary feature with two values: Female and Male.
2. Native language of the participant, a binary feature with two values: Yes if their native language was Spanish and No if they were bilingual, with Spanish being one of the languages.
3. Language subject. This is a binary feature with two values: Yes when the participant had failed a language subject at school at least once and No when the participant had never failed that subject.
4. Age of the participant ranging from 7 to 17 years old.
5-28. These features correspond to questions from 1 to 4 (Q1-Q4). In these tasks the participant hears the name of a letter (e, g, b, and d) and maps it with the letter among distractors (orthographic and phonetically similar letters) within a time frame, using a Whac-A-Mole game interaction. These questions address prerequisites in reading acquisition: Alphabetic Awareness, Phonological Awareness and Visual discrimination and categorization.
29-58. Features targeting Phonological Awareness, Syllabic Awareness and Auditory Discrimination and Categorization. Here the players hear the pronunciation of a syllable (ba, gar, pla, bla or glis, corresponding to Q5, Q6, Q7, Q8 and Q9, respectively) and map it to its spelling, e.g. glis, where the distractors are glir, glin, gris, gril, grin, glim, gles, grel, glil and glen.
59-82. Features corresponding to a set of exercises (Q10-Q13) where participants map the pronunciation of a word to its spelling (e.g. boda), discriminating among other phonetically and orthographically similar words and/or pseudo-words, e.g. boba, boca, boga, bola, bota, baba, beba, deba, tuba, buba, suba, loba or coba. These features aim at Lexical Awareness, Auditory Working Memory, and Auditory Discrimination and Categorization.
83-106. These performance features (Q14-Q17) mainly target Visual Discrimination and Categorization, and Executive Functions, since the players undertake a visual search task, finding as many different letters as possible within a time frame, e.g. E/F, g/q, u/n, c/o, b/d, d/p, b/q, among others.
107-130. Features extracted from a set of exercises (Q18-Q21) where players listen to a pseudo-word and choose its spelling (e.g. pamata) among distractors (e.g. mapata, matapa, pamada, mapaba, madata, damata, pamama, and mamata). These features target Visual Working Memory, Sequential Auditory Working Memory, and Auditory Discrimination and Categorization.
131-142. These features mainly target Lexical, Phonological, and Orthographic Awareness; they are extracted from exercises where participants need to fill in the missing letter in a word, e.g. ha_e for hace (Q22, features 131-136), or delete the extra letter in a word, e.g. feiria for feria (Q23, features 137-142).
143-148. These performance features (Q24) mainly target Morphological and Semantic Awareness. They are collected from exercises where participants find a morphological error in a sentence, which results in a semantic error. For instance, in the sentence Voy a la pastelería a *comparar un pastel (‘I go to the bakery to *compare a cake’), the word comparar (‘to compare’) should be comprar (‘to buy’).
149-154. Features (Q25) related to Syntactic Awareness. Similarly to the previous set of exercises, participants need to find an error in a sentence, this time in a grammatical or function word, so that the syntactic meaning of the sentence changes, e.g. la (‘the’) instead of al (‘at’) in Está *la final de la sala (‘There is *the end of the room’).
155-160. A set of features (Q26) related to Phonological, Lexical, and Orthographic Awareness, since they are extracted from exercises where participants need to find an error in a word, e.g. *egemplo, and correct it (ejemplo, ‘example’) by choosing a letter among j, n, d, and b.
161-172. Here participants are asked to rearrange letters to spell a real word (Q27, features 161-166), e.g. ‘s’, ‘e’, ‘i’, ‘t’, ‘e’ to build siete (‘seven’), targeting Phonological, Lexical and Orthographic Awareness, or to rearrange syllables to spell a real word (Q28, features 167-172), e.g. ‘ra’, ‘do’, ‘mo’ to build morado (‘purple’), targeting Syllabic, Lexical and Orthographic Awareness.
173-178. These features (Q29) address Phonological, Lexical and Orthographic Awareness derived from exercises where players separate the words to make a meaningful sentence, e.g., Hoycumploveintidósaños to Hoy cumplo veintidós años (‘I’m twenty-two today’).
179-184. This set of features (Q30) targets Sequential Visual Working Memory, since they are gathered from exercises where players see a sequence of letters for 3 seconds (<i u a>, <p g d j>, <v h b z q>, and <M D J N P H>) and then write them, discriminating the targets from the distractors (Visual Discrimination and Categorization).
185-196. These features are derived from dictation tasks where participants listen to and write four words (Q31), e.g. principio (Lexical and Orthographic Awareness, and Auditory Working Memory), and four pseudo-words (Q32), e.g. danama (Sequential Auditory Working Memory and Phonological Awareness).

Since all the exercises involve attention, all the performance features [5-196] target the Executive Functions of Activation and Attention, and Sustained Attention. In addition, some of them (Q24-Q26) also target Simultaneous Attention, when the participant pays attention to several sources of incoming information at the same time, e.g., word recognition, distractor discrimination and error recognition.
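
The feature numbering in the list above is regular: 4 demographic features followed by 6 measures for each of the 32 questions. The sketch below reproduces that arithmetic in Python. The function name feature_range and the constants are illustrative, and the order of the six measures within a question is not specified here, so only the ranges are computed.

```python
N_DEMOGRAPHIC = 4   # gender, native language, language subject, age
N_MEASURES = 6      # Clicks, Hits, Misses, Score, Accuracy, Missrate
N_QUESTIONS = 32


def feature_range(question: int) -> tuple[int, int]:
    """Return the (first, last) 1-based feature numbers for a question."""
    if not 1 <= question <= N_QUESTIONS:
        raise ValueError("question must be between 1 and 32")
    first = N_DEMOGRAPHIC + N_MEASURES * (question - 1) + 1
    return first, first + N_MEASURES - 1


total = N_DEMOGRAPHIC + N_MEASURES * N_QUESTIONS
print(total, feature_range(1), feature_range(24), feature_range(32))
```

For example, Q24 maps to features 143-148 and Q29 to features 173-178, matching the list.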

Results

Predictive model

For the predictive model, we used Random Forests [19] due to their flexibility, non-linearity and good level of interpretability, as this technique is based on decision trees with bagging. We used the Weka 3.8.3 implementation with 200 trees and unlimited depth. Moreover, Random Forests are one of the most successful algorithms for many practical applications (such as genomics or agriculture, among others) due to their properties. For instance, Random Forests have been found to be consistent, to adapt to sparsity, not to over-fit, and to reduce the variance while not increasing the bias of the predictions [20].

For all cases we used weighted attributes to balance the dyslexia and non-dyslexia classes as a trivial classifier (that predicts that everyone does not have dyslexia) would have obtained an accuracy of 89.2%, since 89.2% of the participants do not have dyslexia.
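
The baseline figure can be checked from the class counts reported earlier (392 participants with dyslexia and 3,252 without). The Python sketch below also shows one possible balancing scheme, inverse class-frequency weights; this scheme is an assumption for illustration, since the exact weighting used in Weka is not detailed here.

```python
n_dyslexia = 392
n_control = 3252
n_total = n_dyslexia + n_control

# Accuracy of a trivial classifier that labels everyone as "no dyslexia".
baseline_accuracy = n_control / n_total

# One possible balancing scheme (assumption): weight each instance inversely
# to its class frequency so both classes carry the same total weight.
w_dyslexia = n_total / (2 * n_dyslexia)
w_control = n_total / (2 * n_control)

print(f"baseline accuracy: {baseline_accuracy:.1%}")
print(f"weight ratio dyslexia/control: {w_dyslexia / w_control:.2f}")
```

Under this scheme, each participant with dyslexia weighs between 8 and 9 times as much as a participant without dyslexia.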

For the evaluation we used a 10-fold cross validation. That is, we divide the data in 10 random groups (6 of size 364 and 4 of size 365) and then we use 9 of them for training and the last one for validation, repeating this 10 times changing the validation set. This is much better than a 10% single held-out subset, as we average 10 different partitions instead of just one.
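
The fold sizes quoted above follow from dividing 3,644 participants into 10 groups. The following is a minimal Python sketch of such a partition using only the standard library. The function name k_fold_indices and the seed are illustrative, and the split is a plain random one (no stratification by class), which is an assumption.

```python
import random


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle range(n) and split it into k folds whose sizes differ by at most 1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + 1 if i < extra else base
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = k_fold_indices(3644, 10)
sizes = [len(f) for f in folds]
print(sorted(sizes))

# Each fold serves once as the validation set; the other nine form the training set.
for i, validation in enumerate(folds):
    training = [idx for j, f in enumerate(folds) if j != i for idx in f]
    assert len(training) + len(validation) == 3644
```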

As our goal is to have high recall (sensitivity) for the dyslexia class, we choose the Random Forest voting decision threshold such that the weight of the false negatives is similar to the weight of the false positives. This implies that we give 8 to 9 times more importance to the error of not sending a child with dyslexia to the specialist than to the error of sending a child without dyslexia to the specialist. It also implies that the threshold will be much lower than 0.5, which is the default value. We discuss this issue in the next section. In Table 3 we give the confusion matrix of the best model for the overall data set.
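
To illustrate the threshold choice, the Python sketch below sweeps candidate thresholds over per-participant scores and keeps the one where the false negative rate and the false positive rate are closest. The scores and labels are toy values, not data from the study, and the function names error_rates and balanced_threshold are illustrative; it is a sketch of the idea, not the procedure used in Weka.

```python
def error_rates(scores, labels, threshold):
    """Return (false_negative_rate, false_positive_rate); label 1 = dyslexia."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # A participant is flagged when the score reaches the threshold.
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return fn / positives, fp / negatives


def balanced_threshold(scores, labels, candidates):
    """Pick the candidate threshold where the two error rates are closest."""
    def gap(t):
        fnr, fpr = error_rates(scores, labels, t)
        return abs(fnr - fpr)
    return min(candidates, key=gap)


# Toy scores (fraction of trees voting "dyslexia"); not data from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.5, 0.3, 0.1, 0.05, 0.1, 0.15, 0.2, 0.2, 0.22, 0.4, 0.6]
candidates = [i / 100 for i in range(5, 100, 5)]

best = balanced_threshold(scores, labels, candidates)
print(best, error_rates(scores, labels, best))
```

With an imbalanced toy set like this one, the balanced threshold falls below the default of 0.5.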

Table 3. Confusion matrix for the main predictive model.

https://doi.org/10.1371/journal.pone.0241687.t003

To understand how the complexity of the prediction task changed with younger children and less data, we partitioned the data set into 5 other subsets, some overlapping and some not. In Table 4 we give, for all data sets, the combined accuracy for both classes, the recall and precision for the dyslexia class, the ROC, and the threshold used for the Random Forest to obtain these results. In Fig 2 we show the accuracy, the ROC (i.e., the Receiver Operating Characteristic), and the predictive power (percentage of accuracy per 1,000 people), which shows that about 1,500 participants are enough to reach the accuracy plateau. We can also see that the best result is obtained for the 9-11 age range, which is probably the most homogeneous.

Fig 2. Accuracy, ROC, and predictive power for the different data sets.

https://doi.org/10.1371/journal.pone.0241687.g002

Table 4. Results for the different data sets.

https://doi.org/10.1371/journal.pone.0241687.t004

We also trained separate classifiers for the female participants only and for the male participants only, finding that, with a 5-fold evaluation (validation sets of 20%), the results were very similar. They are also shown in Table 4.

Deploying the model

In practice what is important is not the accuracy but two error rates: the rate of false negatives (the fraction of people with dyslexia predicted as not having dyslexia, which is the complement of the recall, or sensitivity, of the dyslexia class) and the rate of false positives (the fraction of people without dyslexia predicted to have dyslexia, which is the complement of the specificity). The impact of each type of error is different: missing a child who may have dyslexia is much worse than sending a child without dyslexia to a specialist.
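
For reference, the standard relations between these rates can be written out from the four cells of a confusion matrix. The Python sketch below uses hypothetical counts, not the values of Table 3, and the function name rates is illustrative.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Per-class rates for the dyslexia (positive) class from confusion counts."""
    return {
        "recall": tp / (tp + fn),                # sensitivity
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),   # equals 1 - recall
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for illustration only.
r = rates(tp=80, fn=20, fp=180, tn=720)
for name, value in r.items():
    print(f"{name}: {value:.3f}")
```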

If the whole population took the test and the rates of both errors were similar, false positives would be 9 times more frequent than false negatives, assuming a 10% prevalence. However, in practice, only people who have learning problems take the test, and we estimate that they are about 20% of the population [21]. In this case, not only the rates but also the numbers of false negatives and false positives would be similar. Hence, we decided to set the model threshold where both types of errors have rates as similar as possible (0.24 for the main predictive model). Our estimate that the people who take the test are about 20% of the population has proven realistic, as 51% of the people taking the test are predicted to be at risk of dyslexia, which implies a prevalence of 10.2% (0.51 × 0.20). In Table 5, we show the precision and recall results per class, while in Fig 3 we show the precision-recall graph for the dyslexia class, where the point for the threshold of 0.24 is shown with an X.
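
The arithmetic behind this argument can be sketched in a few lines of Python. The common error rate of 0.20 is a hypothetical value (it cancels out in the ratio), the function name fp_per_fn is illustrative, and the second scenario assumes that everyone with dyslexia is among the 20% of the population who take the test.

```python
def fp_per_fn(prevalence: float, error_rate: float) -> float:
    """Expected false positives per false negative when both error rates are equal."""
    false_negatives = prevalence * error_rate
    false_positives = (1.0 - prevalence) * error_rate
    return false_positives / false_negatives


# Whole population takes the test, 10% prevalence.
whole_population = fp_per_fn(prevalence=0.10, error_rate=0.20)

# Only 20% of the population takes the test; if all people with dyslexia are
# in that group (assumption), half of the test takers (0.10 / 0.20) have dyslexia.
test_takers = fp_per_fn(prevalence=0.10 / 0.20, error_rate=0.20)

# Prevalence implied by 51% of test takers being flagged as at risk.
implied_prevalence = 0.51 * 0.20

print(round(whole_population, 2), round(test_takers, 2), round(implied_prevalence, 3))
```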

Fig 3. Precision and recall curve for the dyslexia class, varying the model threshold.

https://doi.org/10.1371/journal.pone.0241687.g003

Table 5. Model precision and recall per class for a threshold of 0.24.

https://doi.org/10.1371/journal.pone.0241687.t005

Optimization

Although, as mentioned before, we designed our models to avoid over-fitting, we ran an extra experiment tuning two parameters of the Random Forest: the depth of the trees and mtry, i.e., the number of features randomly sampled at every split. Fig 4 plots the ROC as a function of the tree depth, from 5 to 100 levels, and of mtry, for four values between 5 and 14, where 8 is the default value in Weka and 14 is the default value in R (in both cases the default depends on the number of features). As we can see, beyond depth 20 the result no longer improves, and even using mtry 14 improves the ROC only marginally. In fact, a model using depth 20 and mtry 14 obtains a ROC of only 0.875, with an accuracy of 79.8% and both sensitivity and precision of 79.8% for a threshold of 0.245. This reaffirms that there is no over-fitting, as expected.
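
The two default values of mtry mentioned above can be reproduced from the number of features. The Python sketch below assumes the commonly documented defaults: int(log2(M) + 1) for Weka's RandomForest and floor(sqrt(M)) for classification in R's randomForest package, where M is the number of features. These formulas come from the tools' documentation as understood here, not from this study.

```python
import math

n_features = 196

# Weka RandomForest default (assumed from its documentation): int(log2(M) + 1).
weka_default = int(math.log2(n_features) + 1)

# R randomForest default for classification (assumed): floor(sqrt(M)).
r_default = math.isqrt(n_features)

print(weka_default, r_default)
```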

Fig 4. ROC in function of two Random Forest parameters for the main model.

https://doi.org/10.1371/journal.pone.0241687.g004

Discussion

In this section, we explore the impact of the different features and discuss the model’s limitations.

Feature analysis

To analyze which were the best features in our models, we used the standard information gain of decision trees. For example, for the main model (A1), the two most important features were gender and the performance in Spanish classes, which makes sense given that dyslexia is more salient in males and people with dyslexia often fail at school. The next 44 best features were related to some question in the test. However, as questions are atomic, we need to aggregate all the features per question. Table 6 gives their normalized importance (with the top one set to 100), where we also aggregated all the demographic features. Notice that all questions discriminate, and using only the top features does not give good results. For example, using the top 7 questions plus the demographic variables, the accuracy obtained is just 70.9%. In Table 7, we aggregate features by type, where we can see that successes are slightly better predictors than mistakes.
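
The per-question aggregation and normalization can be sketched as follows in Python. The feature naming scheme ("Q<number>_<Measure>"), the function name aggregate_importance, and the importance values are all hypothetical; they only illustrate summing feature importance per question and rescaling so that the top group is 100.

```python
from collections import defaultdict


def aggregate_importance(feature_gain: dict[str, float]) -> dict[str, float]:
    """Sum per-feature importance by question and rescale so the top group is 100."""
    totals: dict[str, float] = defaultdict(float)
    for name, gain in feature_gain.items():
        # Hypothetical naming: "Q<number>_<Measure>" for performance features;
        # anything else is treated as a demographic feature.
        group = name.split("_", 1)[0] if name.startswith("Q") else "Demographic"
        totals[group] += gain
    top = max(totals.values())
    return {group: 100.0 * total / top for group, total in totals.items()}


# Toy importance values, not results from the study.
toy = {"Gender": 3, "Age": 1, "Q1_Hits": 4, "Q1_Misses": 4, "Q2_Hits": 2}
print(aggregate_importance(toy))
```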

Table 6. Relative question importance based on feature analysis.

https://doi.org/10.1371/journal.pone.0241687.t006

Table 7. Relative importance by feature type aggregation.

https://doi.org/10.1371/journal.pone.0241687.t007

The most informative features were the set of features coming from the first set of nine questions (Q1-Q9). This is coherent with the design of the test, since we placed the questions that were most linguistically motivated at the beginning of the test. Questions 1 to 9 target the basic prerequisites for reading acquisition, such as Phonological Awareness, Alphabetic Awareness, Syllabic Awareness as well as Auditory and Visual discrimination and categorization. These features come from exercises where the participant was required to associate a letter name, a letter sound or a syllable with its corresponding spelling. This is consistent with previous literature on dyslexia that specifically focuses on the deficit in the phonological component of dyslexia [1, 5].

Regarding the aggregated performance measures (Table 7), all are highly predictive since all of them address manifestations of the participant’s performance. The feature Hits is the most predictive one, most likely because the people with dyslexia perform worse.

Customized test with new participants

After these first results, we analyzed how appropriate the level of the questions was for the participants’ age and how difficult the test’s user interface was. As a result, we adapted the test for different age ranges, since some questions were too difficult for the younger ages. Hence, in the revised test there are 3 age ranges: (i) from 7 to 8 years old (327 children) with 19 questions (118 features coming from Q1-Q12, Q14-Q17, Q22-Q23 and Q30); (ii) from 9 to 11 years old (567 children) with 27 questions (166 features coming from Q1-Q20, Q22-Q24, Q26-Q28 and Q30); and (iii) from 12 to 17 years old (498 children) with 31 questions (190 features coming from Q1-Q28 and Q30-Q32). We removed question 29 because the user interaction needed to solve the exercise (cutting the sentences into words) was understood differently among participants, i.e. some used clicks while others dragged the mouse across the words, leading to inconsistent features.
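
The question and feature counts of the three customized tests can be checked against the listed question ranges, using 4 demographic features plus 6 measures per question. The Python sketch below does this; the names TESTS and questions are illustrative.

```python
N_DEMOGRAPHIC = 4  # gender, native language, language subject, age
N_MEASURES = 6     # Clicks, Hits, Misses, Score, Accuracy, Missrate

# Inclusive question ranges per age group, copied from the text.
TESTS = {
    "7-8": [(1, 12), (14, 17), (22, 23), (30, 30)],
    "9-11": [(1, 20), (22, 24), (26, 28), (30, 30)],
    "12-17": [(1, 28), (30, 32)],
}


def questions(ranges):
    """Expand inclusive (first, last) ranges into a list of question numbers."""
    return [q for first, last in ranges for q in range(first, last + 1)]


for group, ranges in TESTS.items():
    qs = questions(ranges)
    print(group, len(qs), N_DEMOGRAPHIC + N_MEASURES * len(qs))
```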

To test the robustness of the method we collected a new data set using a different device, a tablet. This new data set was composed of 1,395 new participants, of whom 10.6% had diagnosed dyslexia, each taking one of the tests above depending on their age. We used the same procedure and inclusion criteria as in the main study; ten new schools participated, and the ages of the participants also ranged from 7 to 17. The participants without dyslexia consisted of 1,247 people (M = 10.75 years old, SD = 2.46), 50.04% female and 49.96% male. The group of participants with dyslexia was composed of 148 people (M = 9.61 years old, SD = 2.11), 51.4% female and 48.6% male.

Table 8 shows the data set characteristics and the results for the three different age groups. As we can see, for children 12 years old or older, we obtain over 78% sensitivity in spite of using a different device, fewer questions (features) and less training data. For children between 9 and 11 we obtain almost 76%, and for children aged 7 and 8 only 72%. We can increase the recall in the dyslexia class by decreasing the acceptance threshold of the model, with a trade-off similar to the one in Fig 3.

Table 8. Results for the tablet test.

https://doi.org/10.1371/journal.pone.0241687.t008

Limitations

Our machine learning model trained from human-computer interaction data is able to classify people as having dyslexia or not with high sensitivity, and using this type of data to screen dyslexia is novel. However, it indirectly considers measures that have previously been used in traditional diagnoses. Indeed, paper-based tests use reading and writing performance measures such as reading speed, spelling errors, and text comprehension [7–9], and the measures gathered with our online test indirectly capture such performance when the participant is exposed to the linguistic questions.

Even if dyslexia is well known to be associated with slow reading in transparent orthographies, we have not used direct indexes of reading speed because the screening test is focused on reading prerequisites, such as phonological awareness, and also, because of gamification reasons due to the online setting. This screener aims to be a first accessible step to later be complemented by other tests that already successfully address reading speed and comprehension in detail [7–9].

Nevertheless, the results of this online test should be taken as screening only and cannot serve as a diagnosis, for at least three reasons. First, our online screening test does not take into consideration other factors and tests that might be relevant for a comprehensive assessment leading to diagnosis.

Second, our test does not discriminate other conditions. It is increasingly recognized that dyslexia co-occurs with other disorders [22]. For instance, dyslexia is often co-morbid with dyscalculia [23] and attention deficit hyperactivity disorder (ADHD) [24]. Notably, 40% of the people with dyslexia have dyscalculia [25], and from 18 to 42% of the population with dyslexia also have ADHD [24]. Also, there are other language disorders, such as specific language impairment (SLI), that require professional assessment. These comorbidities make professional diagnosis a more challenging task, and, in practice, dyslexia is sometimes misdiagnosed as ADHD and vice versa [26, 27]. In our approach we took as ground truth the current dyslexia diagnosis assessed by a professional; however, that ground truth could vary depending on the professional assessment. Furthermore, other factors, such as fatigue and concentration, can play a role.

Finally, our test cannot report different degrees of dyslexia and does not consider the personal history of the user, which can also play a role in dyslexia diagnosis.

Conclusions

The approach presented in this article shows that dyslexia can be screened in a language with shallow orthography, such as Spanish, using machine learning in combination with measures derived from a 15-minute gamified online test. However, in practice the results of this approach should be taken as a screening test, never as a dyslexia diagnosis, since there are other factors, such as intelligence quotient and dyslexia comorbidities, that need professional oversight.

This approach to screening for dyslexia is easy to take on the Web, since it does not require special equipment. The preliminary results of this screener were published in [28], and it was later deployed as an open-access online tool that has already been used more than 200,000 times in Spanish-speaking countries. Since estimates of dyslexia prevalence are much higher than the actual diagnosed population, we believe this method has the potential to make a significant social impact. Similar methods could lead to earlier detection of dyslexia and prevent children from being diagnosed with dyslexia only after they fail in school.

Nevertheless, we need to carry out further experiments with the new tests for tablets, as well as collect larger data sets to build more accurate models.

Acknowledgments

We thank the volunteers that participated in this study, the voice actress Nikki García, and the speech therapists who reviewed the test, Alicia Bailey Garrido, Daniel Cubilla Bonnetier, Nancy Cushen-White, Ruth Rozensztejn, and Daniela Sánchez Alarcón.

We also thank the professionals from the educational centers who helped review some cases of the ground truth in the data set: Ángeles Álvarez-Cedrón Angela Biola Quintana Pérez, Mireia Centeno, Patricia Clemente, Pilar Del Valle Sanz, Esther Gamiz, Paloma García Rodríguez, José Manuel González Sanz, Cristina Martín, Miguel Ángel Matute, and Ana Olivares Valencia.

We thank the specialized centres for providing participants with diagnosed dyslexia: Academia Eklekticos, ADAH SLP Vigo, Aprèn+, Atenea Psicosalud & Psicoeducativo, CAMINS Logopèdia, Psicologia i Dificultats d’Aprenentatge, Centre Elisenda Currià, Centre Espais, Centre Neureduca, Claudia Squella, CREIX Centre d’Assessorament Psicopedagògic Barcelona, CREIX—Centro de Desarrollo Infantil Mallorca, Didàctica-Rubí, Educasapiens, Engracia Rodríguez-López Domingo, Gabinete Psicología Marta Pellejero Escobedo, Isabel Barros, Logopèdics Lleida, Novacadèmia from Barcelona, Sant Feliu de Codines and Caldes de Montbui, Tangram Barcelona, Uditta, UTAE (Unitat de Trastorn de l’Aprenentatge Escolar), and Valley Speech Language and Learning Center Texas.

We thank the following non-profit organizations for providing participants and spreading the call for participation: ADA Dislexia Aragón, Adixmur, Asociación ACNIDA, Asociación Catalana de la Dislexia, Associació de Dislèxia Lleida, COPOE (Confederación de Organizaciones de Psicopedagogía y Orientación de España), COPOE (Orientación y Educación Madrid), Disfam, Disfam Argentina, Dislexia & Dispraxia Argentina, Fundació Mirades Educatives, Fundación Educere, Fundación Marillac, Fundación Valsé, and Madrid con la Dislexia.

We are also very grateful to the schools and the universities that participated in the main study: CEIP Bisbe Climent, CEIP Foro Romano, CEIP Juan XXIII, CEIP Los Ángeles, CEIP Maestro Juan de Ávila, CEIP Ntra. Sra. de la Salud, CEIP Nuestra Señora de los Ángeles, CEIP San José de Calasanz Fraga, CEIP San José de Calasanz Getafe, CEPA Ignacio Zuloaga Helduen Heziketa Iraunkorra, CES Vega Media, Colegio Adventista Rigel, Colegio Alborada, Colegio Américo Vespucio, Colegio Areteia, Colegio Concertado Bilingüe Divina Providencia, Colegio de Fomento Las Tablas-Valverde, Colegio de las Hermanas de la Caridad de Santa Ana, Colegio Gimnasio los Pinares, Colegio Hijas de San José, Colegio La Milagrosa, Colegio Madre Paulina de Chiguayante, Colegio María Auxiliadora de Alicante, Colegio María Auxiliadora de Sepúlveda, Colegio María Auxiliadora de Sueca, Colegio María Auxiliadora de Terrasa, Colegio María Auxiliadora de Torrent, Colegio María Auxiliadora de Valencia, Colegio María Auxiliadora de Zaragoza, Colegio María Inmaculada de Concepción, Colegio María Moliner, Colegio Matilde Huici Navas, Colegio Miguel Servet, Colegio Nuestra Señora de la Soledad, Colegio Obispo Perelló, Colegio Rural Agrupado Tres Riberas, Colegio San Gabriel, Colegio Santa Ana Fraga, Colegio Santa Ana Zaragoza, Colegio Santa Dorotea, Colegio Santa María del Pilar Marianistas de Zaragoza, Colegio Virgen de la Peña, CPI Castroverde, Escola 4 Vents, Escola Comptes de Torregrossa, Escola Les Cometes, Escola Mare de Dèu del Priorat, Escola Pepa Colomer, Escola Sol Ixent, GSD Guadarrama, IES Azuer, IES Bajo Cinca, IES Ben Gabirol, IES Corona de Aragón en Zaragoza, IES do Camiño, IES Leonardo de Chabacier, IES Puerta del Andévalo, IES Ramón J. Sender, IES Rey Fernando VI, IES de Bocairent, University of Valencia, University Don Bosco, and University San Jorge.

Finally, we thank the schools who participated in the second user study: Centro Infanta Leonor, Colegio Sagrado Corazón, Colegio San Antonio, Colegio San Patricio, Colegio San Prudencio, Colegio Santo Domingo, Colegio Urkique, Colegio Vizcaya, Escolapios de Getafe, and Escuelas Bosque.

References

1. Lyon GR, Shaywitz SE, Shaywitz BA. A definition of dyslexia. Annals of Dyslexia. 2003;53(1):1–14.
2. Krafnick AJ, Flowers DL, Napoliello EM, Eden GF. Gray matter volume changes following reading intervention in dyslexic children. Neuroimage. 2011;57(3):733–741. pmid:21029785
3. Gabrieli JD. Dyslexia: a new synergy between education and cognitive neuroscience. Science. 2009;325(5938):280–283. pmid:19608907
4. Al-Lamki L. Dyslexia: Its impact on the Individual, Parents and Society. Sultan Qaboos University medical journal. 2012;12(3):269–72. pmid:23269947
5. Vellutino FR, Fletcher JM, Snowling MJ, Scanlon DM. Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry. 2004;45(1):2–40.
6. Brunswick N. Unimpaired reading development and dyslexia across different languages. In: McDougall S, de Mornay Davies P, editors. Reading and dyslexia in different orthographies. Hove: Psychology Press; 2010. p. 131–154.
7. Cuetos F, Rodríguez B, Ruano E, Arribas D. PROLEC-R. Batería de evaluación de los procesos lectores, revisada. Madrid: TEA; 2014.
8. Toro J, Cervera M. TALE: Test de Análisis de Lectoescritura (TALE: Literacy Analysis Test). Madrid: Visor; 1984.
9. Fawcett AJ, Nicolson RL. Test para la detección de la dislexia en niños (DST-J). Madrid: TEA; 2011.
10. Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in medicine. 2001;23(1):89–109.
11. Rello L, Ballesteros M. Detecting Readers with Dyslexia Using Machine Learning with Eye Tracking Measures. In: Proc. W4A’15. Florence, Italy; 2015. p. 121–128.
12. Benfatto MN, Seimyr GÖ, Ygge J, Pansell T, Rydberg A, Jacobson C. Screening for Dyslexia Using Eye Tracking during Reading. PloS one. 2016;11(12):e0165508. pmid:27936148
13. Rello L, Baeza-Yates R, Llisterri J. A Resource of Errors Written in Spanish by People with Dyslexia and its Linguistic, Phonetic and Visual Analysis. Language Resources and Evaluation. 2016;.
14. Afonso O, Suárez-Coalla P, Cuetos F. Spelling impairments in Spanish dyslexic adults. Frontiers in psychology. 2015;6:466.
15. Suárez-Coalla P, Ramos S, Álvarez-Cañizo M, Cuetos F. Orthographic learning in dyslexic Spanish children. Annals of dyslexia. 2014;64(2):166–181. pmid:25056668
16. Suárez-Coalla P, Cuetos F. Reading strategies in Spanish developmental dyslexics. Annals of dyslexia. 2012;62(2):71–81.
17. Davies R, Rodríguez-Ferreiro J, Suárez P, Cuetos F. Lexical and sub-lexical effects on accuracy, reaction time and response duration: impaired and typical word and pseudoword reading in a transparent orthography. Reading and Writing. 2013;26(5):721–738.
18. Suárez-Coalla P, Cuetos F. Reading difficulties in Spanish adults with dyslexia. Annals of dyslexia. 2015;65(1):33–51.
19. Breiman L. Random Forests. Machine Learning. 2001;45:5–32.
20. Cutler A, Cutler DR, Stevens JR. Random Forests. In: Zhang C, Ma Y, editors. Ensemble Machine Learning: Methods and Applications. Springer; 2012. p. 157–175.
21. International Dyslexia Association. Frequently Asked Questions; 2019.
22. Snowling MJ. Early identification and interventions for dyslexia: a contemporary view. Journal of Research in Special Educational Needs. 2013;13(1):7–14. pmid:26290655
23. Gross-Tsur V, Manor O, Shalev RS. Developmental dyscalculia: Prevalence and demographic features. Developmental Medicine & Child Neurology. 1996;38(1):25–33.
24. Pauc R. Comorbidity of dyslexia, dyspraxia, attention deficit disorder (ADD), attention deficit hyperactive disorder (ADHD), obsessive compulsive disorder (OCD) and Tourette’s syndrome in children: A prospective epidemiological study. Clinical chiropractic. 2005;8(4):189–198.
25. Wilson AJ, Andrewes SG, Struthers H, Rowe VM, Bogdanovic R, Waldie KE. Dyscalculia and dyslexia in adults: cognitive bases of comorbidity. Learning and Individual Differences. 2015;37:118–132.
26. Gilger JW, Pennington BF, DeFries JC. A twin study of the etiology of comorbidity: attention-deficit hyperactivity disorder and dyslexia. Journal of the American Academy of Child & Adolescent Psychiatry. 1992;31(2):343–348.
27. Tridas EQ. From ABC to ADHD: What parents should know about dyslexia and attention problems. International Dyslexia Association; 2007.
28. Rello L, Ballesteros M, Ali A, Serra M, Alarcón D, Bigham JP. Dytective: Diagnosing Risk of Dyslexia with a Game. In: Proc. Pervasive Health’16. Cancun, Mexico; 2016.

