Learning to Pronounce First Words in Three Languages: An Investigation of Caregiver and Infant Behavior Using a Computational Model of an Infant

doi:10.1371/journal.pone.0110334

Figure 1.

Elija learns from babbling.

Panel A: Elija's (virtual) motor activity moves his vocal apparatus and he can explore the sensory consequences of this activity (1). This will sometimes result in the generation of acoustic output (2). The presence of acoustic output can be noticed by Elija (3a), as can other somato-sensory consequences of the vocal tract movement, such as touch arising from vocal tract closure (3b). The exploration can lead to the discovery of a motor pattern (4). Panel B: A discovered motor pattern is stored in motor memory (5).

Figure 2.

Tutored equivalence.

Elija learns to pronounce using caregiver responses, which reinforce some utterances and allow him to associate his motor patterns with adult L1 speech output. Panel A: Elija first recalls a motor pattern, e.g. motor pattern 3 (1), and uses it to make an utterance (2). The caregiver hears the sounds (3). Panel B: The caregiver may reformulate it using her L1 interpretation of Elija's sound production (4). Elija hears the caregiver's response (5). Aware that he is being imitated, Elija takes the caregiver's utterance as equivalent to the output from his motor pattern, which reinforces motor pattern 3 and associates it with the response (6). If a motor pattern is not responded to, it will be deselected and have no link to an auditory memory (e.g. motor pattern 2).

Figure 3.

Learning to pronounce a word using serial imitation of its component speech sounds.

Panel A: The caregiver says a word, in this case consisting of two distinct speech sounds (1). Elija hears the caregiver's utterance (2) and starts to process it (3). This involves performing an auditory matching to previously heard responses (4). Matching auditory memories are then activated in sequence (5,6). Panel B: The activated auditory memories in turn activate motor pattern 3 and motor pattern 1 in motor memory (7,8). They are then recalled in sequence (9) resulting in the generation of output speech (10), which constitutes Elija's imitation of the caregiver's utterance. Finally the caregiver hears and can evaluate Elija's response (11).
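
The matching and recall steps (4 to 9) can be illustrated with a minimal sketch. It assumes the heard word has already been split into segments, that matching uses a plain dynamic time warping (DTW) distance (the caption of Figure 5 states only that the recognizer is based on DTW), and that each auditory memory is linked to one motor pattern. The names `dtw_distance`, `imitate`, `auditory_memory` and `associations`, as well as the toy one-dimensional features, are hypothetical and not taken from the paper.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW distance between two feature sequences (frames x dims)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def imitate(segments, auditory_memory, associations):
    """Map each heard segment to the motor pattern linked to its best match."""
    motor_sequence = []
    for segment in segments:
        best = min(auditory_memory,
                   key=lambda name: dtw_distance(segment, auditory_memory[name]))
        motor_sequence.append(associations[best])
    return motor_sequence

# Toy data (assumption): two remembered caregiver responses and their links.
auditory_memory = {
    "response_a": np.array([[0.0], [0.0], [1.0]]),
    "response_b": np.array([[5.0], [5.0], [6.0]]),
}
associations = {"response_a": 3, "response_b": 1}

# A heard word consisting of two segments, each a (frames x dims) array.
segments = [np.array([[0.0], [0.1], [0.9], [1.0]]),
            np.array([[5.2], [6.1]])]
print(imitate(segments, auditory_memory, associations))
```

With these toy values the first segment is closest to `response_a` and the second to `response_b`, so motor patterns 3 and 1 are recalled in that order, mirroring the example in the caption.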

Figure 4.

Repetitive interaction loops in word learning.

The caregiver first says a word (1). Elija recognizes its component sounds in terms of sounds he has heard before (2). Using the associated motor patterns, he then generates speech output (3). The caregiver evaluates Elija's response and, if not satisfied, may say the word again, perhaps more clearly (4). Elija performs recognition again (5) and generates a different response (6). This process can continue (7–9), until (as in this case) the caregiver decides that performance is satisfactory. Alternatively, if the task is not productive, the caregiver can give up and try to teach Elija a new word.

Figure 5.

Elija's motor and perceptual systems.

Panel A: Elija's motor control system incorporates a Maeda articulatory speech synthesizer. A motor pattern is a sequence of articulatory targets for the synthesizer's control parameters. These are interpolated by a controller, which assumes that the articulator movements follow second-order critically damped trajectories. The resulting sequences of time-varying parameter vectors drive the synthesizer. This potentially generates acoustic output, which is played out via a loudspeaker. In addition, the effort in the production is estimated and any closure of the vocal tract is reported. Panel B: Elija's perceptual system. A USB microphone first digitizes the acoustic input. Autocorrelation analysis is applied directly to the waveform to estimate its fundamental frequency F0. An auditory filter bank provides pre-processing of the input. Further processing estimates signal salience, which is used by the reward mechanism. Pre-processed input can be recorded in auditory memory and also compared against past memories using a speech sound recognizer that is based on dynamic time warping (DTW).
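
Two of the operations named in this caption can be sketched briefly. The first function gives the standard closed-form response of a critically damped second-order system moving from rest at one target toward the next. The second estimates F0 by picking the autocorrelation peak within a plausible pitch range. The function names, the natural frequency `omega`, the 75 to 500 Hz search range and the test signal are all assumptions made for illustration; the caption does not give the parameter values or the exact procedure used in Elija.

```python
import numpy as np

def critically_damped_trajectory(start, target, omega, t):
    """Position of a critically damped 2nd-order system at times t.

    Starts at rest at `start` and approaches `target` without overshoot:
    x(t) = target + (start - target) * (1 + omega*t) * exp(-omega*t)
    """
    t = np.asarray(t, dtype=float)
    return target + (start - target) * (1.0 + omega * t) * np.exp(-omega * t)

def estimate_f0(signal, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz from the strongest autocorrelation peak.

    The search range (f0_min, f0_max) is an assumed setting.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    # Keep only non-negative lags of the full autocorrelation.
    acf = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Articulator parameter moving from 0.0 to 1.0 over half a second.
times = np.linspace(0.0, 0.5, 100)
trajectory = critically_damped_trajectory(0.0, 1.0, omega=50.0, t=times)

# Synthetic 200 Hz tone sampled at 16 kHz as a stand-in for microphone input.
sample_rate = 16000
tone = np.sin(2 * np.pi * 200.0 * np.arange(1600) / sample_rate)
f0 = estimate_f0(tone, sample_rate)
```

Critical damping is the fastest approach to a target that does not overshoot it, which is a plausible reason for modelling articulator movement this way.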

Figure 6.

Formation of associations between motor and auditory memories.

Elija generates an acoustic output by using a previously discovered motor pattern. After production, Elija records any potential response from the caregiver. If the caregiver responds, the auditory salience of this response will contribute to a reward signal. This will cause Elija to remember the speech input response, reinforce the motor pattern and also build an association between the two.
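
The bookkeeping described here can be outlined as follows. This is only a sketch under stated assumptions: the `MotorPattern` class, the `reinforce` function, the salience threshold and the additive strength update are hypothetical stand-ins, since the caption does not specify how the reward signal is computed or applied.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class MotorPattern:
    pattern_id: int
    strength: float = 0.0                               # accumulated reward (assumed form)
    responses: List[Any] = field(default_factory=list)  # linked auditory memories

def reinforce(pattern: MotorPattern, response: Optional[Any],
              salience: float, threshold: float = 0.5) -> bool:
    """Store the response and strengthen the pattern if the reward is high enough.

    `threshold` is an assumed parameter; returns True when an association is formed.
    """
    if response is None or salience < threshold:
        return False                        # no usable caregiver response
    pattern.strength += salience            # reinforce the motor pattern
    pattern.responses.append(response)      # remember the response and link it
    return True

pattern = MotorPattern(pattern_id=3)
reinforce(pattern, response="caregiver_response_features", salience=0.8)
```

A pattern that never receives a sufficiently salient response keeps an empty `responses` list, which corresponds to a motor pattern with no link to an auditory memory.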

Table 1.

Archiphoneme consolidations for English, German and French.

Figure 7.

Statistical analysis of the 6-caregiver multilingual response dataset.

A: Percentage of Elija's motor patterns responded to by each individual caregiver. B: Percentage of motor patterns responded to against the number of caregivers that responded to them. C: Distribution of vowel qualities plotted on the IPA vowel quadrilateral. The spread of the data shows that the vowel qualities in Elija's utterances as perceived and responded to by the caregivers covered a wide range. D: Distribution of the consonantal places of articulation. A wide range of perceived places of articulation were present in Elija's utterances.

Figure 8.

Caregiver response statistics.

Responses of different types made by caregivers to Elija's motor patterns are shown as a proportion of total responses. Panel A shows the overall proportions of reformulations (yellow bars), mimicked responses (green bars) and idiomatic responses (blue bars) for all individual subjects. Panel B shows the mean across all subjects with the exception of E3, who was treated as an outlier since he mimicked many more responses than the other caregivers.

Figure 9.

Relationship between English, German and French responses.

Summed caregiver response comparisons are shown in terms of their archiphoneme vowel and consonant components. One set of response sessions is represented on the LHS and another set on the RHS of each panel. The area of the yellow nodes represents occurrences of the given phonemic category. Red line width indicates incidence with the same interpretation across sessions; blue line width indicates incidence with a different interpretation across sessions. The 4 English response data sessions are always represented on the LHS and the 2 German and 2 French data sessions on the RHS of each respective panel. A: English/German vowel comparisons. B: English/German consonant comparisons. C: English/French vowel comparisons. D: English/French consonant comparisons. E: German/French vowel comparisons. F: German/French consonant comparisons.

Figure 10.

Relationships within English, German and French responses.

Results are plotted as in Fig. 9. A & B: Vowel and consonant comparisons for a single English speaker over four separate sessions. C & D: Vowel and consonant comparisons between four different English speakers. E & F: Vowel and consonant comparisons between two different German speakers. G & H: Vowel and consonant comparisons between two different French speakers.

Figure 11.

Comparison between caregiver responses.

The comparisons are made in terms of their archiphoneme vowel and consonant components. These values correspond to the red lines shown in Figs. 9 and 10. Panels A & B show vowel and consonant response comparisons respectively: similarity within the single English speaker is shown as the blue bar, similarity between different speakers of the same language is shown as green bars, and similarity across language groups is shown as yellow bars. The error bars show 95% confidence intervals.

Figure 12.

Examples of words learned by Elija.

Results are shown for two subjects per language: E1 & E2 (English), F1 & F2 (French) and G1 & G2 (German). The left column specifies the target word, and the middle column is the phonemic transcription of the caregiver's final target production. The right column is the phonemic transcription of the caregiver's reformulations corresponding to Elija's imitations.

Figure 13.

Individual subject word comparisons for English, French and German.

Comparisons between archiphoneme representations of caregiver target words and Elija's imitations. Individual speakers are shown in the six panels E1 & E2, F1 & F2 and G1 & G2 respectively. The caregiver target word transcriptions, converted to archiphoneme categories, are shown on the LHS of each diagram. Elija's imitations were labeled in terms of the archiphonemes of the component responses from which they were constructed. These are shown on the RHS of each diagram.
