
Learning to decipher time-compressed speech: Robust acquisition with a slight difficulty in generalization among young adults with developmental dyslexia

  • Yafit Gabay ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    ygabay@edu.haifa.ac.il

    Affiliations Department of Special Education, University of Haifa, Haifa, Israel, Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Department of Learning Disabilities, University of Haifa, Haifa, Israel

  • Avi Karni,

    Roles Conceptualization, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Department of Learning Disabilities, University of Haifa, Haifa, Israel, Sagol Department of Neurobiology, University of Haifa, Haifa, Israel

  • Karen Banai

    Roles Conceptualization, Methodology, Software, Supervision, Writing – original draft

    Affiliation Department of Communications Sciences and Disorders, University of Haifa, Haifa, Israel

Abstract

Learning to decipher acoustically distorted speech serves as a test case for the study of language-related skill acquisition in persons with developmental dyslexia (DD). Deciphering this type of input is rarely learned explicitly and does not yield conscious insights. Problems in implicit and procedural skill learning have been proposed as possible causes of DD. Here we examined the learning of time-compressed (accelerated) speech and its generalization to novel materials among young adults with DD compared to typical readers (TD). All participants completed a training session that involved judging the semantic plausibility of sentences, during which the level of time-compression was changed using an adaptive (staircase) procedure according to each participant’s performance. In the test phase, learning (test on same items) and generalization (test on new items and same items spoken by a new speaker) were assessed. Both groups showed robust gains after training. Moreover, after training, the initial disadvantage of the DD group was no longer significant. After training, both groups experienced relative difficulties in deciphering learned tokens spoken by a different voice, though participants with DD were less able to generalize the gains to deciphering new tokens. Thus, DD individuals benefited from repeated experience with time-compressed speech no less than typical readers, but their evolving skill was apparently more dependent on the specific characteristics of the tokens. Atypical generalization, which indicates that perceptual learning is contingent on lower-level features of the input though does not necessarily point to impaired learning potential per se, may explain some of the contradictory findings in published studies of speech perception in DD.

Introduction

Although usually transparent to listeners, speech perception is quite a challenging task. In particular, it requires mapping the acoustic input onto stable (pre-lexical/lexical) representations even though the speech signal itself is variable as a result of between-speaker differences, changes in speech rate [1] and environmental conditions [2]. Speech stimuli constitute a learning challenge for the perceptual system because accurate speech recognition requires generalization across the highly variable acoustic information that underlies the speech signal. Listeners are capable of overcoming these variations in speech through perceptual learning, according to which they align their perceptual system with new variations in the speech input [3]. Perceptual learning has been demonstrated across a variety of tasks in which the speech signal is noisy, distorted (e.g., noise vocoded, spectrally shifted or time-compressed speech) or otherwise unusual (e.g., unfamiliar dialects or accents) [4]. Previous studies suggest that adaptive training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous for learning and its generalization [5, 6].

Implicit and procedural learning in speech perception

Since speech is rarely learned explicitly and perceptual learning does not yield conscious insights that can be easily communicated, the perceptual learning of speech is a case of implicit learning of skills that are essential to human communication [7]. Implicit learning refers to situations in which learning occurs incidentally [8], and the knowledge gained through this process is believed to be implicit as participants find it difficult to conceptualize what has been learned [9]. Implicit and procedural learning has been related to the acquisition and formation of motor skills [10]. An accumulating body of evidence also implicates its involvement in language-related skills, including the acquisition of grammar, syntax, morphology and phonology [11–14]. Research closely related to the present study also implicates implicit learning in the perceptual learning of speech [15].

Implicit and procedural learning in developmental dyslexia

Developmental Dyslexia (DD) is one of the most common neurodevelopmental disorders, with prevalence rates estimated at 5%-10% [16]. Despite extensive research, the underlying biological and cognitive causes of DD remain unclear. DD has been thought to arise from phonological impairments [17]. Recent conceptualizations of dyslexia implicate domain-general procedural and/or implicit learning systems in its etiology [13, 18–22]. These views are based on increasing evidence for the role of non-declarative systems in language learning and development [11–15] and on the plethora of findings that individuals with dyslexia often demonstrate impairments on procedural and implicit learning tasks [21, 23–31]. Although there is evidence suggesting intact procedural learning in DD [32–34], a recent meta-analysis argues in favor of the possibility that compensatory declarative learning mechanisms may mask procedural learning deficits in DD [27].

Although perceptual learning has been examined previously in individuals with DD in both the visual [35, 36] and the auditory modalities [37–42], perceptual learning of speech stimuli has rarely been assessed. Compared with other stimuli, speech stimuli represent a different learning challenge for the perceptual system. Generalization, for example, may be particularly important for speech perception due to the highly variable nature of the acoustic information that underlies the speech signal. The goal of the current study was therefore to investigate the perceptual learning of distorted speech in people with DD. Such speech is often particularly challenging for people with dyslexia [43].

Many studies suggest that typically developing individuals can adapt to such speech quite rapidly, especially under favorable learning conditions [44]. Recent studies suggest that adaptive protocols that begin with easy tasks provide such conditions [5, 6]. Previous observations suggest that adaptive training conditions yield more perceptual learning and generalization than constant training conditions [5]. Thus, to provide a strict test of the hypothesis that learning may differ between DD and TD participants, we used an adaptive protocol in the current study. Three indices of learning were investigated. First we asked whether rapid baseline adaptation to time-compressed speech is affected by DD. Second, we compared the effects of adaptive training on the recognition of time-compressed speech between the two groups of readers. Third, we compared the ability to transfer the training-related gains to novel conditions, i.e., conditions not encountered in training, across the two groups of readers. Two types of transfer were studied: (1) transfer to stimuli that share the high-level features of the trained tokens, but differ in their low-level features, i.e., sentences identical to those presented during training but produced by a new unfamiliar speaker; (2) transfer to stimuli that share the low-level features of the trained stimuli but differ in their high-level features, i.e., novel sentences but uttered by the speaker encountered in the training phase.

Methods

Participants

Participants were 24 university students (undergraduates or graduate students), among them 12 dyslexics (5 female) and 12 typical readers (7 female). A similar sample size was sufficient to detect group differences on the same task between native and non-native listeners [45]. Participants were native Hebrew speakers with no history of neurological disorders, psychiatric disorders or attention deficits. In addition, participants were right handed, had normal or corrected-to-normal vision, and normal hearing (participants in the DD group were screened for normal hearing; participants in the control group declared they had no hearing impairment). The DD group was recruited from the Student Support Service at the University of Haifa, a center that provides support for students with learning disabilities. Dyslexia was diagnosed by the University of Haifa Learning Disabilities Diagnostic Center by means of the MATAL test. This test is designed to assess developmental disabilities (Dyslexia, Dysgraphia, Dyscalculia, and Attention Deficit Disorder) in adults who are native Hebrew speakers. The MATAL is a standardized test developed by the Israeli National Institute for Testing and the Israeli Council for Higher Education [46]. The test consists of 20 tests and 54 performance measures, and was validated and normed with a standardization sample of 508 participants. The MATAL has been used in many previous investigations for the assessment of dyslexia [47, 48]. The typical reading group (TD) consisted of participants with no history of learning disabilities. Both the DD and the TD groups performed a battery of cognitive and literacy tests similar to the battery used in the study by]. The ethics committee of the Faculty of Social Welfare and Health Sciences at the University of Haifa (199/12) approved all aspects of the study and written informed consent was obtained from all participants.

Cognitive and literacy measures

Intellectual ability.

Intelligence was assessed by means of two subtests from the Wechsler Intelligence test for adults [49]. One is the non-verbal block design task, in which participants are required to rearrange blocks with different color patterns according to a stimulus presented to them on a card. The other is the verbal similarities subtest, in which participants are required to indicate what two words in a pair have in common (e.g., what do dog and cat have in common? Both are animals).

Verbal working memory.

Verbal working memory was assessed by the Digit Span subtest from the Wechsler Adult Intelligence Scale [49]. In this test the examiner reads a list of digits to the examinee and the examinee is required to repeat the digits in that order (forward) or to state the digits in reverse order (backward). Task administration is stopped after failure to recall on two trials with a similar number of digits.

Reading skills.

Decoding, reading fluency, and reading comprehension tests were administered, as described in the following sections.

Two tests were used to assess decoding skills: One Minute Tests of Words [50] and of Non-words [51], which examine the number of words and non-words accurately read aloud within a time limitation of one minute. The first test included 168 non-vowelized words of an equal level of difficulty listed in columns. The second test was composed of 86 successively difficult vowelized non-words listed in columns. In both tests, measures of accuracy (number of correct words read per minute) and of speed (number of items read per minute) were collected.

The Oral Reading Test, obtained from the reading comprehension subset of the Israeli Psychometric Exam, was used to assess reading fluency. In this test, participants were required to read a text of 216 words aloud, as quickly and accurately as possible. The number of words read correctly per minute was calculated.

Reading-related skills.

Phonological awareness was assessed by the following tests: Phoneme Deletion, Segmentation and Parsing [52]. The phoneme deletion test consists of 25 non-words. In this test, the experimenter reads a word and a phoneme aloud and the participant is required to indicate how the word sounds after deletion of this phoneme. The segmentation test includes 16 non-words that are read to the examinee by the experimenter. The task is to segment the word into its basic phonological sounds as quickly as possible. The parsing test [53] contains 46 rows of words. Each row is composed of four words printed with no spaces between them. The participants’ task was to identify the words in each row by drawing a line to mark where the spaces should be. For all tests, both accuracy (number of correct letters/objects read per minute) and time (the time participants required to complete the task) were measured.

Naming skills were assessed through the RAN- Naming Speed Test [54] that consists of the following tests for naming objects and letters and for naming alternating objects and letters. In the letter naming test (RAN letters), five (non-final) Hebrew letters—ס, א, ד, ג, ל—were repeatedly presented in random order, with each letter repeated ten times. The participants were asked to read the 50 letters aloud as quickly and accurately as they could. The object naming test (RAN object) consists of pictures of five objects: flower, cat, book, watch and flag, where each object is repeated randomly 10 times. The participants were asked to name the 50 pictures aloud as accurately and quickly as they could. In both tasks, the accuracy rates and the time for naming the entire list were measured.

TD and DD listeners did not differ in intelligence (as measured by the block design subtest and by verbal ability scores measured by the similarities subtest) or chronological age. However, there were significant group differences with regard to reading, naming and phonological skills (see Table 1), confirming group assignments with respect to reading ability.

Table 1. Performance of the DD and TD groups on cognitive and literacy measures.

https://doi.org/10.1371/journal.pone.0205110.t001

Experimental procedure

Stimuli.

The stimuli and the procedure were similar to those used in our previous study [5]. A young male native speaker of Hebrew (the trained speaker) recorded and sampled the stimuli at 44 kHz using a standard microphone and PC soundcard and Audacity software. Additionally, several sentences designed to assess generalization to a new speaker were recorded by a second native Hebrew speaker. RMS levels of all sentences were normalized after recording and before compression. Stimuli were time-compressed using a WSOLA algorithm [55], which changes speech rate but preserves other qualities such as pitch and timbre.

The sentences included 120 simple active subject-verb-object (SVO) sentences in Hebrew taken from the study by Prior and Bentin [56]. Each sentence contained 5–6 words and had adjectives modifying both the subject and the object. The duration of the naturally spoken sentences ranged from 2.3–4.2 s (72–144 words/minute). This speech rate is similar to that of Israeli newscasters [57]. Sixty sentences were semantically plausible (true, e.g., “The municipal museum purchased the impressionist painting”), whereas the remaining sentences (false) contained a semantic violation that rendered them improbable (e.g., “The municipal museum ate the impressionist painting”). One hundred sentences (50 true) were used for training. Twenty of those sentences were presented in the pre-test and test phases to assess learning of the repeated tokens. Likewise, 20 of the trained sentences uttered by a different speaker were used to assess cross-speaker generalization. The remaining 20 sentences were used to assess generalization to untrained tokens.

Procedure.

Testing took place in a quiet room and participants were seated directly in front of a computer monitor during the entire experiment. Stimulus presentation and time compression manipulation were controlled by Matlab. Stimuli were presented binaurally using headphones (Sennheiser HD-215). The experiment consisted of three phases: a pre-test phase in which baseline performance was assessed, a training phase and a test phase. During the pre-test and test phases, participants were required to write down each of the presented sentences as accurately as they could. During the training phase, participants were required to press a key to indicate whether the sentences they heard were plausible or not.

The experiment was administered in one session of approximately one hour. Cognitive and literacy tests were administered to participants in a different session. During the session, participants completed the pre-test, the training and the test. The training phase consisted of 100 different time-compressed sentences. During training, listeners performed a semantic verification task on these sentences during five blocks, each containing 60 trials. After hearing each sentence, listeners were required to determine whether it was semantically improbable (false) or probable (true). Sentences were selected at random (without replacement) until all 100 sentences were presented, after which random selection began again. Visual feedback (smiling/sad face) was delivered to participants after each response. In the present study, an adaptive staircase training protocol was used. That is, training started with a compression level of 65% of the naturally spoken duration. After that, compression was adapted using a 2-down/1-up staircase procedure in 25 logarithmically equal steps to a maximal compression of 20% [58]. The considerations that led us to select the stimuli (compression rates) were very similar to considerations used in many perceptual learning studies (e.g., [59]). The idea was to start from typical levels of performance and try to push the participants’ performance as much as possible into conditions wherein untrained individuals would fail to correctly recognize the stimuli. Thus, the compression rates were chosen so as to provide experience with speech rates that range from easily recognizable up to high-speed speech stimuli that cannot be recognized by native listeners without specific training [60, 61].
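
The adaptive procedure described above can be illustrated with a short sketch. It is not the authors' Matlab code. It assumes that "25 logarithmically equal steps" means 25 equal-ratio steps spanning 65% to 20% of the natural duration (26 levels in total), that two consecutive correct responses make the next sentence one step more compressed, and that a single error makes it one step less compressed. The names compression_levels and TwoDownOneUp are hypothetical.

```python
START, FLOOR, N_STEPS = 0.65, 0.20, 25  # proportions of the natural sentence duration


def compression_levels(start=START, floor=FLOOR, n_steps=N_STEPS):
    """Return n_steps + 1 levels with a constant ratio between neighbours."""
    ratio = (floor / start) ** (1 / n_steps)
    return [start * ratio ** i for i in range(n_steps + 1)]


class TwoDownOneUp:
    """Assumed rule: two correct in a row -> harder; one error -> easier."""

    def __init__(self, levels):
        self.levels = levels
        self.index = 0    # start at the easiest level (65%)
        self.streak = 0   # consecutive correct responses at the current level

    @property
    def level(self):
        return self.levels[self.index]

    def update(self, correct):
        """Register one response and return the level for the next trial."""
        if correct:
            self.streak += 1
            if self.streak == 2:
                self.index = min(self.index + 1, len(self.levels) - 1)
                self.streak = 0
        else:
            self.index = max(self.index - 1, 0)
            self.streak = 0
        return self.level
```

A rule of this kind settles near the level at which two consecutive correct responses are as likely as not (p² = 0.5, so p ≈ 0.707), which is in line with the 71% correct verification thresholds used in the data analysis.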

Test and training tasks.

During the pretest phase, 20 sentences compressed to 30% of their naturally spoken duration were presented. During the test phase, blocks of 20 sentences compressed to 30% of their naturally spoken duration were presented. The participants’ task was to write the sentences down as accurately as they could. The test phase consisted of three different conditions of 20 trials each (repeated items, new items, repeated items presented by a different speaker). 1) In the repeated-items test, 20 sentences randomly selected from the training set were uttered by the same male speaker from the training phase. 2) In the new-items test, 20 new sentences with similar semantic structure to those in the training phase were uttered by the same speaker heard throughout the training phase. 3) In the test of repeated items presented by a different speaker, 20 sentences were selected from the training set but uttered by a different male speaker. The order of the three tests was counterbalanced across participants. No feedback was provided during either the pre-test or the test. See Fig 1 for an illustration of the design.
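
One way to counterbalance the order of the three test conditions is sketched below. The text does not specify the scheme that was used, so the full rotation through all six possible orders is an assumption, and the function name counterbalanced_orders is hypothetical.

```python
from itertools import permutations

CONDITIONS = ("repeated items", "new items", "new speaker")


def counterbalanced_orders(n_participants):
    """Cycle through all six orders so that each is used about equally often."""
    orders = list(permutations(CONDITIONS))
    return [orders[i % len(orders)] for i in range(n_participants)]
```

With 12 participants per group, this assignment would give each of the six orders to exactly two participants in each group.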

Fig 1. Procedure.

Participants performed a pre-test and five blocks of training (each containing sixty trials). After that, participants performed a test with three conditions. Test performance on same tokens is indicative of learning. Test performances with the new speaker and with new tokens are indicative of generalization.

https://doi.org/10.1371/journal.pone.0205110.g001

Results

Data analysis

Performance during the pre-test and test was quantified as the mean proportion of words correctly identified across all sentences in a given condition. Orthographic errors (e.g., homophones) were not counted as errors because the purpose was to assess whether listeners heard the sentences correctly and not to assess their writing skills. Incomplete/incorrect suffixes were considered errors because Hebrew is an inflected language and suffixes affect the meaning of the sentence (e.g., changing the timing of an event from past to future). The mean proportion of sentences correctly judged (verification) in each block was used to quantify performance during the training phase. To this end, mean verification thresholds were calculated based on the five final trials in each block.
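
The two measures described above can be sketched as follows. This is a simplification under stated assumptions, not the scoring procedure used in the study: words are matched by exact spelling, so the leniency toward homophones and other orthographic errors is not reproduced, and the block threshold is taken as the arithmetic mean of the compression levels on the last five trials. Both function names are hypothetical.

```python
def proportion_words_correct(target, response):
    """Share of target words found in the written response (exact match)."""
    target_words = target.split()
    remaining = response.split()
    hits = 0
    for word in target_words:
        if word in remaining:
            remaining.remove(word)  # each written word can be credited once
            hits += 1
    return hits / len(target_words)


def block_threshold(levels_in_block, n_final=5):
    """Mean compression level over the final n_final trials of a block."""
    final = levels_in_block[-n_final:]
    return sum(final) / len(final)
```

A condition score would then be the mean of proportion_words_correct over the 20 sentences of that condition.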

We first calculated participants’ level of performance during the pretest phase. Previous research has contended that for typical readers rapid learning can be observed even during the pretest phase [60]. We then estimated training-phase performance in the two groups by calculating the 71% correct verification thresholds for each listener (for details see [45]). Group differences after training were then evaluated. For this purpose, test performance was compared to pre-test performance on the repeated tokens as evidence for learning across groups. Finally, test performance on the trained items was compared to performance on new items and on items produced by a new speaker to test for generalization.

Rapid learning during the pre-test

Fig 2 shows the mean performance accuracy over the first and last five trials of the pretest phase. An analysis of variance was conducted, with group (TD vs. DD) as a between-subject factor, learning (first five trials vs. last five trials) as a within-subject factor and mean proportion of words correctly identified as the dependent variable. The main effect of group was significant, suggesting that DD participants were less able to decipher time-compressed speech compared to TD participants (F(1, 22) = 8.68, p<.01; ηp2 = .28). However, the main effect of learning was also significant, suggesting that recognition accuracy improved during the pretest (rapid learning, F(1, 22) = 122.58, p<.01; ηp2 = .28). There was no significant interaction of group by learning (F<1), suggesting that both groups improved to a similar extent during this phase.

Fig 2. Pretest performance (mean of five first trials vs. mean of five last trials) as a function of group (TD vs. DD).

Error bars show the 95% confidence interval of the mean.

https://doi.org/10.1371/journal.pone.0205110.g002

Learning during the training phase

Fig 3 depicts the performance of both the DD and the TD groups over the course of training. An ANOVA was conducted, with group (TD vs. DD) as a between-subject factor and the mean verification thresholds in each block of training (1–5) as a within-subjects factor. The main effect of group was significant (F (1, 22) = 8.7, p<.01; ηp2 = .27), indicating that TD listeners were generally able to correctly judge sentences that were more time-compressed compared to DD listeners. Nevertheless, the difference between the two groups was quite small and, as can be seen in Fig 3, a single training block sufficed to bring the level of performance in the DD group up to that of the initial performance of the TD participants. The main effect of block was significant as well, suggesting that both groups improved with practice (F (1, 22) = 122.6, p<.01; ηp2 = .84). The interaction of group by block was not significant (F<1), suggesting that despite the overall performance differences, the amount of learning was similar in the two groups.

Fig 3. Training-phase performance as a function of group (TD vs. DD).

Error bars show the 95% confidence interval of the mean.

https://doi.org/10.1371/journal.pone.0205110.g003

Training-induced learning

Fig 4 shows the performance of the two groups in the repeated-tokens condition on the pre-test and test. An analysis of variance (ANOVA) on mean proportion of words correctly identified with group (DD vs. TD) as a between-subjects factor and test (pre vs. post) as a within-subjects factor yielded a main effect of group (F (1, 22) = 8.3, p<.01; ηp2 = .27). There was a main effect of test, indicating an increase in performance accuracy by the end of training (F (1, 22) = 268.21, p<.01; ηp2 = .92). The interaction of group by phase was also significant (F (1, 22) = 6.63, p<.05; ηp2 = .23), reflecting the larger improvements in the DD group. TD listeners significantly outperformed DD listeners during the pre-test (F (1, 22) = 8.63, p<.01), but the test showed only a trend toward a between-group difference (F (1, 22) = 3.67, p = .065). Thus, training with time-compressed speech was helpful in reducing the performance differences between TD and DD readers.
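
For readers who want to relate the reported F statistics to the accompanying effect sizes, partial eta squared can be recovered from F and its degrees of freedom. The sketch below assumes the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical and the code is not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported above for the repeated-tokens analysis, all with df = (1, 22)
for label, f_value in [("group", 8.3), ("test", 268.21), ("group x phase", 6.63)]:
    print(label, round(partial_eta_squared(f_value, 1, 22), 2))
```

Run on these three F values, the sketch gives .27, .92 and .23, matching the effect sizes reported for this analysis.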

Fig 4. Pretest vs. test-phase performance in the repeated-tokens condition.

Error bars show the 95% confidence interval of the mean.

https://doi.org/10.1371/journal.pone.0205110.g004

Training-induced generalization

Fig 5 shows the performance of the two groups in the trained-token condition and in the two transfer conditions. The participants’ ability to generalize the gains they acquired during training to writing (reproducing) the time-compressed sentences was compared to the participants’ ability to reproduce novel time-compressed sentences (new tokens) and separately compared to their ability to reproduce the trained tokens recorded by a different speaker (new speaker).

Fig 5. Test-phase performance on the trained tokens, new tokens and new speaker conditions as a function of group (TD vs. DD).

Error bars show the 95% confidence interval of the mean.

https://doi.org/10.1371/journal.pone.0205110.g005

Generalization to new tokens.

An analysis of variance (ANOVA) was conducted on the mean proportion of words correctly identified, with group (DD vs. TD) as a between-subjects factor and token type (trained sentences vs. new sentences) as a within-subject factor. Group exhibited a significant main effect, reflecting the lower recognition accuracy in the DD group (F (1, 22) = 9.21, p<.01; ηp2 = .29), while token type exhibited only a marginal effect (F (1, 22) = 3.79, p = .064; ηp2 = .14), indicating that in general participants were marginally more accurate when tested with repeated tokens compared to new tokens. However, the group-by-token type interaction was significant (F (1, 22) = 6.88, p<.05; ηp2 = .23). Further analysis revealed that whereas TD listeners recognized repeated and newly encountered items similarly (F<1), participants with DD were significantly less accurate with newly encountered than with repeated tokens (F (1, 22) = 10.44, p<.01). Together these findings suggest that listeners with DD were less able to transfer their learning-related gains to tokens they had not encountered before.

Generalization to a new speaker.

An analysis of variance (ANOVA) on the mean proportion of words correctly identified was conducted, with group (DD vs. TD) as a between-subjects factor and token type (trained speaker vs. new speaker) as a within-subjects factor. There was a significant main effect of group, reflecting the less accurate performance of the DD group (F (1, 22) = 6.03, p<.05; ηp2 = .21). The main effect of token type (speaker voice) was also significant (F (1, 22) = 51.71, p<.01; ηp2 = .701), indicating that in general participants performed better during the test with the trained tokens compared to the same tokens uttered by a new speaker. The group-by-token type interaction was not significant (F (1, 22) = 2.85, p = .10; ηp2 = .11).

Relationship between task performance and individual reading ability

In addition to the group analyses reported above, we explored the relationships between test-phase performance in the three different conditions (learning and generalization to new tokens/new speaker) and reading ability as measured by four standardized tests. As shown in Table 2, generalization to new tokens was positively correlated with the estimates of reading ability. Furthermore, a negative correlation was observed between the generalization scores and the time required to name objects and digits. Lastly, a negative correlation was observed between generalization scores (in both the new token and the new speaker conditions) and time required for parsing printed words with no space between them. No correlations were observed between the generalization scores (in both the new token and the new speaker conditions) and intellectual abilities or working memory (digit span). Given that these correlations are consistent with the findings of the primary group analyses presented above (not surprisingly, as the groups were defined based on literacy abilities) and given the relatively small sample, we refrain from any further discussion and interpretation of these findings.

Table 2. Relationship between task performance and individual reading ability.

https://doi.org/10.1371/journal.pone.0205110.t002

Discussion

Impairments in implicit skill acquisition have been proposed to have a deleterious impact in DD [13, 18–21, 24, 62]. Perceptual learning of speech represents a case of procedural learning (i.e., skill learning: "how to" and "what to do" knowledge that is acquired implicitly and is difficult to verbalize explicitly [7]). The current results show that despite the initial advantage of typical readers over struggling readers in the ability to decipher time-compressed speech, both groups improved with practice, such that the magnitude of learning was similar in the two groups. Thus, given an identical training experience in deciphering time-compressed speech, young adults with DD were as adept in acquiring the specific skill as their typical reading peers. Moreover, compared to their pre-training baseline performance, both groups improved in deciphering tokens uttered in a new (untrained) speaker’s voice and much improved in their ability to decipher new time-compressed tokens. Nevertheless, listeners with DD were less able to transfer their learning-related gains to tokens that were not encountered during the training session. Both groups were hampered in deciphering the trained tokens delivered by a new speaker compared to their ability to decipher tokens presented in the familiar (trained) voice.

Baseline recognition of time-compressed speech was less accurate among DD participants than among TD participants. This relative disadvantage is consistent with previous findings reporting that the processing of time-compressed speech is deficient among impaired readers [63, 64]. Yet although on average the performance of DD readers during the pretest phase was below that of TD readers, a single session of adaptive training with time-compressed speech resulted in a reduction of group differences. Moreover, the DD group gained as much and even more than their peers from this practice. These results are consistent with previous studies indicating that given appropriate training conditions, individuals with DD can reveal their extant, intact, perceptual learning ability in both visual [35, 36] and auditory [37, 39–42, 65] domains. Thus, no less than their peers, people with DD retain the potential to benefit from practice, i.e., from repeated experience with stimuli that defy explicit awareness of what has been gained.

Yet despite relatively intact learning, the ability to generalize the gains attained in training to new tokens and a new speaker was relatively less robust in the DD group. In particular, after practice, unimpaired readers were as adept in recognizing new speech items as they were in recognizing tokens they had encountered a few times during training. This generalization was less effective in people with DD. Both groups were less accurate in deciphering the trained tokens when these were presented by a new speaker, but on average the typical readers performed better on this test. Thus, there were limits on generalization not only in the DD group but also in the TD group.

One should note that in many laboratory paradigms that address skill acquisition, such as category learning [66] and specifically artificial grammar learning [67], participants are examined on novel items (i.e., under generalization conditions) with structural elements similar to those governing the trained set of items or with random structures. A similar approach is assumed in some sequence learning paradigms, specifically the serial reaction time (SRT) task [8], wherein the major criterion for learning is the practice-dependent difference that evolves between performance of the repeated sequence and performance of a random or novel sequence [68]. These test conditions can be viewed as tests of the ability to generalize to a new (yet nevertheless specific) condition, rather than of skill acquisition per se (proficiency, "how to" knowledge). Thus, some of the inconsistencies in results concerning skill acquisition in DD may relate to the ambiguity of whether the condition used to assess ability and skill acquisition is a transfer test condition or whether it directly reflects intrinsic gains in the performance of the trained condition (for example, performance improvements relative to the initial, pre-training, level). This distinction is quite standard in studies of perceptual learning [59, 61].
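The distinction drawn above, between intrinsic gains on the trained condition and performance on a transfer condition, can be made concrete with a small sketch. Everything in it is an assumption introduced for illustration: the `Scores` fields, the function names, and the numbers (made-up proportions correct, not data from this study). It is not the analysis used here.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Proportion correct per condition (hypothetical, illustrative values)."""
    pretest: float      # naive performance before training
    trained: float      # post-training: trained tokens, trained voice
    new_tokens: float   # post-training: new tokens, trained voice
    new_speaker: float  # post-training: trained tokens, new voice


def acquisition_gain(s: Scores) -> float:
    """Intrinsic gain: trained condition relative to the pre-training level."""
    return s.trained - s.pretest


def transfer_cost(s: Scores, condition: str) -> float:
    """Drop in performance from the trained condition to a transfer condition."""
    return s.trained - getattr(s, condition)


# Made-up listener: a sizeable gain on the trained items, partial transfer.
listener = Scores(pretest=0.50, trained=0.80, new_tokens=0.70, new_speaker=0.60)

print(round(acquisition_gain(listener), 2))
print(round(transfer_cost(listener, "new_tokens"), 2))
print(round(transfer_cost(listener, "new_speaker"), 2))
```

Under this framing, two listeners can show the same acquisition gain while differing in transfer cost, which is why a paradigm that scores learning only on novel items may conflate the two.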

The current results suggest that after practice individuals with DD 1) improve no less and perhaps even more than their peers in the acquisition of skills related to the trained (specific, repeated) items; 2) may have equal difficulty (and thus do not differ from typical readers) in generalizing to some new conditions (deciphering trained items presented in a new speaker’s voice); but 3) have specific difficulties in different generalization conditions (deciphering new tokens uttered by the voice heard in training). Thus, the gains achieved by TD and DD groups under identical training conditions may be of similar magnitude relative to the specific training condition and items, yet the two groups may differ in terms of their respective ability to transfer these gains to (some) untrained conditions and items.

Using the logic underlying the analyses of transfer limitations in non-language-related perceptual learning paradigms (e.g., [7, 69]), the differential difficulty in deciphering novel speech items uttered by the trained voice versus generalizing performance gains to a new speaker (seen even in typical readers) can be considered as reflecting the processes and perhaps the level of stimulus representation affected by the training experience. One issue is the involvement of declarative memory processes in learning to decipher time-compressed speech. Participants may have formed memory representations for specific items (although the number of token repetitions was quite small, some target sentences were repeated up to four times in the training and pre-test list) that supported their recognition under adverse listening conditions [70]. In the new token condition, however, such memory processes are unlikely to explain the resulting deciphering skill, as participants encountered the sentences for the first time (as in the pretest phase). Nevertheless, post-training performance on these novel sentences was superior to naïve performance with time-compressed speech. On the other hand, both groups (TD and DD) had relative difficulty in deciphering the trained tokens when presented in a new speaker’s voice after the training session. This difficulty suggests a practice-dependent reliance on a representation wherein the fundamental frequency of the speaker’s voice is differentially represented. The feature dependency of skills (i.e., specificity of the acquired skill for physical attributes of the trained stimuli) is well recognized in perceptual learning and has been explored in multiple domains [7, 59, 71].

Experiments with speaker variability, speech rate and perceptual learning provide strong evidence for implicit memory for very fine perceptual details of speech (e.g., [71]). Listeners apparently encode specific attributes of the speaker’s voice and speaking rate into long-term memory [72]. Pisoni [71] has suggested that information that is not typically considered to be stored as part of the phonetic or lexical representation of words is nevertheless retained in long-term memory (for example, information about the speech rate or the speaker’s dialect). By this account, specific “episodes” (i.e., instances in which the stimulus was encountered in the input) as well as the operations used for perceptual analysis are encoded and form the foundation for feature-specific (e.g., speaker- or voice-specific) perceptual operations that are part of the procedural memory emerging from the episode (e.g., [73]). Thus, in line with Pisoni’s suggestion [71], the procedures or perceptual operations used to recognize specific, and importantly repeated, speech may generate procedural knowledge, i.e., a processing routine set and honed to accommodate novel and more demanding listening conditions (that have been repeatedly encountered), so that the perceptual analysis for novel words produced by familiar speakers can be carried out efficiently, without the repeated need for detailed analysis of the speaker’s voice [1]; for a similar notion in the visual and motor domains, see [74, 75]. Hence, repeated perceptual episodes (exemplars) with a given speaker’s voice tend to be stored in memory in a feature-specific manner rather than as a general routine [73, 76–78].

The reduced ability of people with DD to transfer their robust practice-related gains when new tokens were presented in the trained voice may indicate that the setting of a processing routine is more heavily weighted for specific items repeatedly encountered in training [78, 79]. The data are also compatible with the notion that people with DD establish processing routines that are more heavily weighted for low-level (feature-specific) representations in deciphering time-compressed speech (as suggested by [71]). This notion is in line with the finding that those in the DD group had more difficulty than their typical reading peers in deciphering familiar tokens presented in the voice of a new (unfamiliar) speaker. Thus, a parsimonious conjecture would be that dyslexics may tend to over-rely on lower-level representations of distorted, unfamiliar speech input. That is, dyslexic readers may be more prone than their normal reading peers to generate feature-specific (and item-specific) routines when provided with repeated experience with challenging input, or perhaps they rely more heavily on lower-level skills when faced with perceptually taxing conditions [80]. Thus, the current findings extend previous findings and underscore the notion that generalization problems can affect the ability of people with DD to acquire abstract knowledge from limited experience [29, 81]. Given this notion, different or modified training conditions may be needed when designing learning opportunities for atypical populations, to accommodate the different learning strategies of impaired readers. A similar notion has been suggested recently for other special populations as well [82, 83].

Speech perception problems have been documented in people with DD [43]. Most studies, however, assessed the end product of learning (discrimination between phonological categories) [43, 84] rather than the learning process itself [21]. The current results indicate that adaptive training may resolve some of the initial differences observed between TD and DD listeners when deciphering time-compressed speech. This suggests that DD individuals are capable of benefiting from repeated exposure. Yet even after the brief training experience, generalizing what had been learned to new items was more difficult for those with DD. Impaired generalization may have consequences for the ability of those with DD to form "abstract" knowledge. This could explain the difficulties people with DD may have in adjusting to, and generalizing across, some of the variability that characterizes phonological categories.

Supporting information

S1 File. Sample of sentences used in the study.

https://doi.org/10.1371/journal.pone.0205110.s001

(RAR)

Acknowledgments

This research was supported by a grant from the National Institute of Psychobiology in Israel to KB.

References

  1. Obleser J, Eisner F. Pre-lexical abstraction of speech in the auditory cortex. Trends in cognitive sciences. 2009;13(1):14–9. pmid:19070534
  2. Mattys SL, Davis MH, Bradlow AR, Scott SK. Speech recognition in adverse conditions: A review. Language and Cognitive Processes. 2012;27(7–8):953–78.
  3. Samuel AG, Kraljic T. Perceptual learning for speech. Attention, Perception, & Psychophysics. 2009;71(6):1207–18.
  4. Guediche S, Blumstein S, Fiez J, Holt LL. Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research. Frontiers in systems neuroscience. 2014;7:126. pmid:24427119
  5. Gabay Y, Karni A, Banai K. The perceptual learning of time-compressed speech: A comparison of training protocols with different levels of difficulty. PloS one. 2017;12(5):e0176488. pmid:28545039
  6. Svirsky MA, Talavage TM, Sinha S, Neuburger H, Azadpour M. Gradual adaptation to auditory frequency mismatch. Hearing research. 2015;322:163–70. pmid:25445816
  7. Fahle M. Perceptual learning: specificity versus generalization. Current opinion in neurobiology. 2005;15(2):154–60. pmid:15831396
  8. Nissen MJ, Bullemer P. Attentional requirements of learning: Evidence from performance measures. Cognitive psychology. 1987;19(1):1–32.
  9. Ashby FG, Alfonso-Reese LA, Waldron EM. A neuropsychological theory of multiple systems in category learning. Psychological review. 1998;105(3):442. pmid:9697427
  10. Doyon J, Penhune V, Ungerleider LG. Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning. Neuropsychologia. 2003;41(3):252–62. pmid:12457751
  11. Conway CM, Bauernschmidt A, Huang SS, Pisoni DB. Implicit statistical learning in language processing: Word predictability is the key. Cognition. 2010;114(3):356–71. pmid:19922909
  12. Conway CM, Pisoni DB. Neurocognitive basis of implicit learning of sequential structure and its relation to language processing. Annals of the New York Academy of Sciences. 2008;1145(1):113–31.
  13. Ullman MT. Contributions of memory circuits to language: The declarative/procedural model. Cognition. 2004;92(1–2):231–70. pmid:15037131
  14. Ullman MT. A neurocognitive perspective on language: The declarative/procedural model. Nature reviews neuroscience. 2001;2(10):717. pmid:11584309
  15. Guediche S, Holt LL, Laurent P, Lim S-J, Fiez JA. Evidence for cerebellar contributions to adaptive plasticity in speech perception. Cerebral Cortex. 2014;25(7):1867–77. pmid:24451660
  16. Shaywitz SE. Dyslexia. New England Journal of Medicine. 1998;338(5):307–12. pmid:9445412
  17. Snowling MJ. Dyslexia: Blackwell publishing; 2000.
  18. Krishnan S, Watkins KE, Bishop DV. Neurobiological basis of language learning difficulties. Trends in cognitive sciences. 2016;20(9):701–14. pmid:27422443
  19. Nicolson RI, Fawcett AJ. Procedural learning difficulties: reuniting the developmental disorders? TRENDS in Neurosciences. 2007;30(4):135–41. pmid:17328970
  20. Nicolson RI, Fawcett AJ. Dyslexia, dysgraphia, procedural learning and the cerebellum. Cortex. 2011;47(1):117–27. pmid:19818437
  21. Gabay Y, Holt LL. Incidental learning of sound categories is impaired in developmental dyslexia. Cortex. 2015;73:131–43. pmid:26409017
  22. Banai K, Ahissar M. Poor sensitivity to sound statistics impairs the acquisition of speech categories in dyslexia. Language, Cognition and Neuroscience. 2018;33(3):321–32.
  23. Gabay Y, Schiff R, Vakil E. Dissociation between the procedural learning of letter names and motor sequences in developmental dyslexia. Neuropsychologia. 2012;50(10):2435–41. pmid:22750119
  24. Gabay Y, Thiessen ED, Holt LL. Impaired statistical learning in developmental dyslexia. Journal of Speech, Language, and Hearing Research. 2015;58(3):934–45. pmid:25860795
  25. Gabay Y, Vakil E, Schiff R, Holt LL. Probabilistic category learning in developmental dyslexia: Evidence from feedback and paired-associate weather prediction tasks. Neuropsychology. 2015;29(6):844. pmid:25730732
  26. Hedenius M, Ullman MT, Alm P, Jennische M, Persson J. Enhanced recognition memory after incidental encoding in children with developmental dyslexia. PloS one. 2013;8(5):e63998. pmid:23717524
  27. Lum JA, Ullman MT, Conti-Ramsden G. Procedural learning is impaired in dyslexia: Evidence from a meta-analysis of serial reaction time studies. Research in developmental disabilities. 2013;34(10):3460–76. pmid:23920029
  28. Pavlidou EV, Louise Kelly M, Williams JM. Do children with developmental dyslexia have impairments in implicit learning? Dyslexia. 2010;16(2):143–61. pmid:20440744
  29. Pavlidou EV, Williams JM. Implicit learning and reading: Insights from typical children and children with developmental dyslexia using the artificial grammar learning (AGL) paradigm. Research in developmental disabilities. 2014;35(7):1457–72. pmid:24751907
  30. Gabay Y, Schiff R, Vakil E. Dissociation between online and offline learning in developmental dyslexia. Journal of clinical and experimental neuropsychology. 2012;34(3):279–88. pmid:22221291
  31. Gabay Y, Schiff R, Vakil E. Attentional requirements during acquisition and consolidation of a skill in normal readers and developmental dyslexics. Neuropsychology. 2012;26(6):744. pmid:23106118
  32. Inácio F, Faísca L, Forkstam C, Araújo S, Bramão I, Reis A, et al. Implicit sequence learning is preserved in dyslexic children. Annals of dyslexia. 2018;68(1):1–14. pmid:29616459
  33. Rüsseler J, Gerth I, Münte TF. Implicit learning is intact in adult developmental dyslexic readers: Evidence from the serial reaction time task and artificial grammar learning. Journal of Clinical and Experimental Neuropsychology. 2006;28(5):808–27. pmid:16723326
  34. Kelly SW, Griffiths S, Frith U. Evidence for implicit sequence learning in dyslexia. Dyslexia. 2002;8(1):43–52. pmid:11990224
  35. Franceschini S, Gori S, Ruffino M, Viola S, Molteni M, Facoetti A. Action video games make dyslexic children read better. Current Biology. 2013;23(6):462–6. pmid:23453956
  36. Gori S, Facoetti A. Perceptual learning as a possible new approach for remediation and prevention of developmental dyslexia. Vision research. 2014;99:78–87. pmid:24325850
  37. Agnew JA, Dorn C, Eden GF. Effect of intensive training on auditory processing and reading skills. Brain and Language. 2004;88(1):21–5. pmid:14698727
  38. Agus TR, Carrión-Castillo A, Pressnitzer D, Ramus F. Perceptual learning of acoustic noise by individuals with dyslexia. Journal of Speech, Language, and Hearing Research. 2014;57(3):1069–77. pmid:24167235
  39. Banai K, Ahissar M. Perceptual learning as a tool for boosting working memory among individuals with reading and learning disability. Learning & Perception. 2009;1(1):115–34.
  40. Daikhin L, Raviv O, Ahissar M. Auditory stimulus processing and task learning are adequate in dyslexia, but benefits from regularities are reduced. Journal of Speech, Language, and Hearing Research. 2017;60(2):471–9. pmid:28114605
  41. Kujala T, Karma K, Ceponiene R, Belitz S, Turkkila P, Tervaniemi M, et al. Plastic neural changes and reading improvement caused by audiovisual training in reading-impaired children. Proceedings of the National Academy of Sciences. 2001;98(18):10509–14.
  42. Magnan A, Ecalle J, Veuillet E, Collet L. The effects of an audio-visual training program in dyslexic children. Dyslexia. 2004;10(2):131–40. pmid:15180394
  43. Rosen S. Auditory processing in dyslexia and specific language impairment: Is there a deficit? What is its nature? Does it explain anything? Journal of phonetics. 2003;31(3–4):509–27.
  44. Guediche S, Fiez JA, Holt LL. Adaptive plasticity in speech perception: Effects of external information and internal predictions. Journal of Experimental Psychology: Human Perception and Performance. 2016;42(7):1048. pmid:26854531
  45. Banai K, Lavner Y. The effects of exposure and training on the perception of time-compressed speech in native versus nonnative listeners. The Journal of the Acoustical Society of America. 2016;140(3):1686–96. pmid:27914374
  46. Ben-Simon A, Inbar-Weiss N. MATAL test battery for the diagnosis of learning disabilities: User guide. Jerusalem: National Institute for testing and evaluation and the Council for Higher Education (Hebrew). 2012.
  47. Breznitz Z, Shaul S, Horowitz-Kraus T, Sela I, Nevat M, Karni A. Enhanced reading by training with imposed time constraint in typical and dyslexic adults. Nature communications. 2013;4:1486. pmid:23403586
  48. Shiran A, Breznitz Z. The effect of cognitive training on recall range and speed of information processing in the working memory of dyslexic and skilled readers. Journal of Neurolinguistics. 2011;24(5):524–37.
  49. Wechsler D. WAIS-III, Wechsler adult intelligence scale: Administration and scoring manual: Psychological Corporation; 1997.
  50. Shatil E. One-minute test for words (Unpublished test, University of Haifa). 1995.
  51. Shatil E. One-minute test for pseudowords. Unpublished test Haifa: University of Haifa. 1995.
  52. Breznitz Z, Misra M. Speed of processing of the visual–orthographic and auditory–phonological systems in adult dyslexics: The contribution of “asynchrony” to word recognition deficits. Brain and language. 2003;85(3):486–502. pmid:12744959
  53. Breznitz Z. Parsing test. Unpublished test University of Haifa, Israel. 1997.
  54. Breznitz Z. Speed of phonological and orthographic processing as factors in dyslexia: Electrophysiological evidence. Genetic, Social, and General Psychology Monographs. 2003;129(2):183. pmid:14606733
  55. Verhelst W, Roelands M, editors. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. Acoustics, Speech, and Signal Processing, 1993 ICASSP-93, 1993 IEEE International Conference on; 1993: IEEE.
  56. Prior A, Bentin S. Differential integration efforts of mandatory and optional sentence constituents. Psychophysiology. 2006;43(5):440–9. pmid:16965605
  57. Finkelstein M, Amir O. Speaking rate among professional radio newscasters: Hebrew speakers. Studies in Media and Communication. 2013;1(1):131–9.
  58. Levitt H. Transformed up-down methods in psychoacoustics. The Journal of the Acoustical society of America. 1971;49(2B):467–77.
  59. Karni A, Sagi D. Where practice makes perfect in texture discrimination: evidence for primary visual cortex plasticity. Proceedings of the National Academy of Sciences. 1991;88(11):4966–70.
  60. Banai K, Lavner Y. The effects of training length on the perceptual learning of time-compressed speech and its generalization. The Journal of the Acoustical Society of America. 2014;136(4):1908–17. pmid:25324090
  61. Banai K, Lavner Y. Perceptual learning of time-compressed speech: More than rapid adaptation. PloS one. 2012;7(10):e47099. pmid:23056592
  62. Banai K, Ahissar M. Poor sensitivity to sound statistics impairs the acquisition of speech categories in dyslexia. Language, Cognition and Neuroscience. 2017:1–12.
  63. Freeman BA, Beasley DS. Discrimination of time-altered sentential approximations and monosyllables by children with reading problems. Journal of Speech, Language, and Hearing Research. 1978;21(3):497–506.
  64. Watson M, Stewart M, Krause K, Rastatter M. Identification of time-compressed sentential stimuli by good vs poor readers. Perceptual and motor skills. 1990;71(1):107–14. pmid:2235248
  65. Agus TR, Carrión-Castillo A, Pressnitzer D, Ramus F. Perceptual learning of acoustic noise by dyslexic individuals. J Speech Lang Hearing Res. 2013;57:1069–77.
  66. Gabay Y, Dick FK, Zevin JD, Holt LL. Incidental auditory category learning. Journal of Experimental Psychology: Human Perception and Performance. 2015;41(4):1124. pmid:26010588
  67. Reber AS. Implicit learning of artificial grammars. Journal of verbal learning and verbal behavior. 1967;6(6):855–63.
  68. Reed J, Johnson P. Assessing implicit learning with indirect tests: Determining what is learned about sequence structure. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1994;20(3):585.
  69. Karni A. The acquisition of perceptual and motor skills: a memory system in the adult human cortex. Cognitive Brain Research. 1996;5(1–2):39–48. pmid:9049069
  70. Cohen NJ, Squire LR. Preserved learning and retention of pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science. 1980;210(4466):207–10. pmid:7414331
  71. Pisoni DB. Long-term memory in speech perception: Some new findings on talker variability, speaking rate and perceptual learning. Speech communication. 1993;13(1–2):109–25. pmid:21461185
  72. Heald S, Nusbaum HC. Speech perception as an active cognitive process. Frontiers in systems neuroscience. 2014;8:35. pmid:24672438
  73. Inspector M, Manor D, Amir N, Kushnir T, Karni A. A word by any other intonation: fMRI evidence for implicit memory traces for pitch contours of spoken words in adult brains. PloS one. 2013;8(12):e82042. pmid:24391713
  74. Karni A. The acquisition of perceptual and motor skills: a memory system in the adult human cortex. Cognitive Brain Research. 1996.
  75. Karni A, Meyer G, Rey-Hipolito C, Jezzard P, Adams MM, Turner R, et al. The acquisition of skilled motor performance: fast and slow experience-driven changes in primary motor cortex. Proceedings of the National Academy of Sciences. 1998;95(3):861–8.
  76. Jacoby LL, Brooks LR. Nonanalytic cognition: Memory, perception, and concept learning. Psychology of learning and motivation. 18: Elsevier; 1984. p. 1–47.
  77. Karni A, Bertini G. Learning perceptual skills: behavioral probes into adult cortical plasticity. Current opinion in neurobiology. 1997;7(4):530–5. pmid:9287202
  78. Seitz AR, Watanabe T. The phenomenon of task-irrelevant perceptual learning. Vision research. 2009;49(21):2604–10. pmid:19665471
  79. Ofen-Noy N, Dudai Y, Karni A. Skill learning in mirror reading: how repetition determines acquisition. Cognitive Brain Research. 2003;17(2):507–21. pmid:12880920
  80. Primor L, Pierce ME, Katzir T. Predicting reading comprehension of narrative and expository texts among Hebrew-speaking readers with and without a reading disability. Annals of Dyslexia. 2011;61(2):242–68. pmid:21993604
  81. Perrachione TK, Del Tufo SN, Gabrieli JD. Human voice recognition depends on language ability. Science. 2011;333(6042):595-. pmid:21798942
  82. Korman M, Dagan Y, Karni A. Nap it or leave it in the elderly: a nap after practice relaxes age-related limitations in procedural memory consolidation. Neuroscience letters. 2015;606:173–6. pmid:26348880
  83. Adi-Japha E, Fox O, Karni A. Atypical acquisition and atypical expression of memory consolidation gains in a motor skill in young female adults with ADHD. Research in Developmental Disabilities. 2011;32(3):1011–20. pmid:21349685
  84. Vandermosten M, Boets B, Luts H, Poelmans H, Golestani N, Wouters J, et al. Adults with dyslexia are impaired in categorizing speech and nonspeech sounds on the basis of temporal cues. Proceedings of the National Academy of Sciences. 2010;107(23):10389–94.