The Development of Audio-Visual Integration for Temporal Judgements

Adults combine information from different sensory modalities to estimate object properties such as size or location. This process is optimal in that (i) sensory information is weighted according to relative reliability, so that more reliable estimates have more influence on the combined estimate, and (ii) the combined estimate is more reliable than the component uni-modal estimates. Previous studies suggest that optimal sensory integration does not emerge until around 10 years of age; younger children rely on a single modality or combine information using inappropriate sensory weights. Children aged 4–11 and adults completed a simple audio-visual task in which they reported either the number of beeps or the number of flashes in uni-modal and bi-modal conditions. In bi-modal trials, beeps and flashes differed in number by 0, 1 or 2. Mutual interactions between the sensory signals were evident at all ages: the reported number of flashes was influenced by the number of simultaneously presented beeps, and vice versa. Furthermore, at all ages, the relative strength of these interactions was predicted by the relative reliabilities of the two modalities; in other words, all observers weighted the signals appropriately. The degree of cross-modal interaction decreased with age. The youngest observers could not ignore the task-irrelevant modality: they fully combined vision and audition, such that they perceived equal numbers of flashes and beeps for bi-modal stimuli. Older observers showed much smaller effects of the task-irrelevant modality. Do these interactions reflect optimal integration? Full or partial cross-modal integration predicts improved reliability in bi-modal conditions, whereas switching between modalities reduces reliability.
Model comparison suggests that older observers employed partial integration, whereas younger observers (up to around 8 years) did not integrate but instead followed a sub-optimal switching strategy, responding according to either visual or auditory information on each trial.


Causal Inference model
In addition to the Partial Integration model, the Causal Inference model has been proposed to describe behaviour in situations where two sensory signals are not fully integrated into a single multisensory estimate. The model is described in detail, with full equations, elsewhere [1,2], so it is described only briefly here, as applied to the current experimental paradigm. To evaluate the Causal Inference model, the noise distributions, posterior, and response distributions are modelled by simulating a large number of experimental trials for each stimulus condition. Accordingly, uni-modal noise distributions were generated for each condition of the current experiment by simulating 10,000 trials. For each simulated trial, noisy visual and auditory sensory estimates ($x_V$ and $x_A$) were generated by sampling from normal distributions centred on the true number of flashes and beeps, with standard deviations $\sigma_V$ and $\sigma_A$, respectively. Because the number of events can only take non-negative integer values, the noise-corrupted sensory estimates were rounded to the nearest integer, and simulated trials resulting in negative estimates were discarded. The resulting noise distributions are shown in Fig S1a.
Fig. S1. The Causal Inference model. (a) Visual (green) and auditory (red) noise distributions for an example condition with 1 flash and 3 beeps, generated via 10,000 simulated trials using, for illustration, the average fitted parameters across all observers. The prior over the number of events is also shown (dashed black). (b) Visual (green) and auditory (red) estimates given separate causes. The probability distribution of common-cause estimates is shown by the dashed brown line. (c, d, e) Response distributions for the estimated number of flashes (green) or beeps (red), given each of the 3 decision rules: (c) model averaging, (d) model selection, and (e) probability matching.
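The trial-simulation step above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the helper name `simulate_noise_distribution`, the fixed seeds, and the example standard deviations are assumptions chosen for the example, not fitted values.

```python
import random
from collections import Counter


def simulate_noise_distribution(true_count, sigma, n_trials=10_000, seed=None):
    """Return {estimate: probability} for one modality in one condition.

    Hypothetical helper: each simulated trial draws a Gaussian sample centred
    on the true number of events, rounds it to the nearest integer, and
    discards the trial if the rounded estimate is negative.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        estimate = round(rng.gauss(true_count, sigma))
        if estimate < 0:
            continue  # a negative number of events is impossible: drop the trial
        counts[estimate] += 1
    kept = sum(counts.values())
    return {k: counts[k] / kept for k in sorted(counts)}


if __name__ == "__main__":
    # Example condition from Fig S1a: 1 flash and 3 beeps.
    # The sigma values are illustrative placeholders, not fitted parameters.
    visual = simulate_noise_distribution(1, sigma=0.8, seed=1)
    auditory = simulate_noise_distribution(3, sigma=0.5, seed=2)
    print("visual:", visual)
    print("auditory:", auditory)
```

Because trials with negative estimates are dropped rather than clipped to zero, the distribution for small true counts is renormalised over the remaining trials instead of piling extra mass at zero.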
As in the Partial Integration and Switching models, the Causal Inference model includes a prior probability distribution over the number of events, defined by its mean $\mu_P$ and standard deviation $\sigma_P$ (dashed black line in Fig S1a). However, rather than employing a coupling prior to determine the degree of sensory integration, the Causal Inference model considers two discrete causal structures. Under the hypothesis that the visual and auditory estimates had a common cause (C=1), the sensory signals are fully integrated, such that the estimated number of flashes ($\hat{s}_{V,C=1}$) equals the estimated number of beeps ($\hat{s}_{A,C=1}$). This common-cause estimate is a weighted average of $x_V$, $x_A$ and $\mu_P$, with weights $1/\sigma_V^2$, $1/\sigma_A^2$ and $1/\sigma_P^2$, respectively:
$$\hat{s}_{V,C=1} = \hat{s}_{A,C=1} = \frac{x_V/\sigma_V^2 + x_A/\sigma_A^2 + \mu_P/\sigma_P^2}{1/\sigma_V^2 + 1/\sigma_A^2 + 1/\sigma_P^2}.$$
The probability distribution of common-cause estimates, produced by 10,000 simulated trials, is shown by the dashed brown line in Fig S1b. Under the second hypothesis, the sensory estimates ($x_V$ and $x_A$) arose from separate, independent causes (C=2). In this case, the estimated number of flashes ($\hat{s}_{V,C=2}$) can differ from the estimated number of beeps ($\hat{s}_{A,C=2}$); these estimates are weighted averages of $x_V$ and $\mu_P$, or of $x_A$ and $\mu_P$, respectively:
$$\hat{s}_{V,C=2} = \frac{x_V/\sigma_V^2 + \mu_P/\sigma_P^2}{1/\sigma_V^2 + 1/\sigma_P^2}, \qquad \hat{s}_{A,C=2} = \frac{x_A/\sigma_A^2 + \mu_P/\sigma_P^2}{1/\sigma_A^2 + 1/\sigma_P^2}.$$
Distributions of these separate-cause estimates are also shown in Fig S1b. The response produced on each trial depends on the posterior probability of a common cause (vs. separate causes) and on the decision rule. The posterior probability of a common cause is calculated, via Bayes' rule, from the conditional probability of the sensory estimates given a common cause and the prior probability of a common cause, $p_{common}$, or $p(C=1)$:
$$p(C=1 \mid x_V, x_A) = \frac{p(x_V, x_A \mid C=1)\, p_{common}}{p(x_V, x_A \mid C=1)\, p_{common} + p(x_V, x_A \mid C=2)\,(1 - p_{common})}.$$
The posterior probability of separate causes is calculated in an analogous fashion, where the prior probability of separate causes, $p(C=2)$, equals $1 - p_{common}$.
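The estimates and posterior defined above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes continuous Gaussian likelihoods, using the closed forms commonly used for this model, and so ignores the rounding of $x_V$ and $x_A$ to integers described earlier. The function names are hypothetical.

```python
import math


def fused_estimate(x_v, x_a, sigma_v, sigma_a, mu_p, sigma_p):
    """Common-cause estimate: reliability-weighted average of x_V, x_A and mu_P."""
    w_v, w_a, w_p = 1 / sigma_v**2, 1 / sigma_a**2, 1 / sigma_p**2
    return (w_v * x_v + w_a * x_a + w_p * mu_p) / (w_v + w_a + w_p)


def segregated_estimate(x, sigma, mu_p, sigma_p):
    """Separate-causes estimate for one modality: weighted average of x and mu_P."""
    w, w_p = 1 / sigma**2, 1 / sigma_p**2
    return (w * x + w_p * mu_p) / (w + w_p)


def posterior_common(x_v, x_a, sigma_v, sigma_a, mu_p, sigma_p, p_common):
    """p(C=1 | x_V, x_A) via Bayes' rule.

    ASSUMPTION: continuous Gaussian likelihoods (no integer rounding).
    """
    vv, va, vp = sigma_v**2, sigma_a**2, sigma_p**2
    # p(x_V, x_A | C=1): one source s ~ N(mu_P, sigma_P^2) generates both signals
    d1 = vv * va + vv * vp + va * vp
    like_c1 = math.exp(
        -0.5 * ((x_v - x_a) ** 2 * vp + (x_v - mu_p) ** 2 * va + (x_a - mu_p) ** 2 * vv) / d1
    ) / (2 * math.pi * math.sqrt(d1))
    # p(x_V, x_A | C=2): each signal has its own independent source
    like_c2 = math.exp(
        -0.5 * ((x_v - mu_p) ** 2 / (vv + vp) + (x_a - mu_p) ** 2 / (va + vp))
    ) / (2 * math.pi * math.sqrt((vv + vp) * (va + vp)))
    num = like_c1 * p_common
    return num / (num + like_c2 * (1 - p_common))


if __name__ == "__main__":
    # Illustrative values only: 1 flash sensed, 3 beeps sensed.
    args = dict(sigma_v=0.8, sigma_a=0.5, mu_p=2.0, sigma_p=1.5)
    print(posterior_common(1, 3, p_common=0.5, **args))
```

For very discrepant signals both likelihoods can underflow to zero in floating point; a more robust version would work with log-likelihoods.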
Three different decision rules (described below) have been proposed to determine how a response is generated on each trial. The corresponding response distributions, from simulated trials, are shown in Fig S1(c-e).
(i) Model averaging. This is the optimal decision rule if the goal is to minimise the mean squared error. The response is a weighted average of the estimated number of flashes (or beeps) given a common cause and given separate causes, with weights given by the posterior probabilities of these two causal states, i.e.
$$\hat{s}_V = p(C=1 \mid x_V, x_A)\,\hat{s}_{V,C=1} + p(C=2 \mid x_V, x_A)\,\hat{s}_{V,C=2},$$
with a similar computation determining the auditory response, $\hat{s}_A$.
(ii) Model selection. On each trial, the response reflects either the estimate given a common cause or the estimate given separate causes. The selection is determined by the posterior probabilities of the two causal states: if a common cause is more likely, given the sensory estimates, then the common-cause estimate is selected, i.e.
$$\hat{s}_V = \begin{cases} \hat{s}_{V,C=1} & \text{if } p(C=1 \mid x_V, x_A) > 0.5 \\ \hat{s}_{V,C=2} & \text{otherwise,} \end{cases}$$
and similarly for $\hat{s}_A$.
(iii) Probability matching. As in model selection, on each trial the response reflects either the common-cause estimate or the separate-causes estimate. However, under probability matching this selection process is stochastic (as in the Switching models described above). The probability of selecting the common-cause estimate is given by the posterior probability of a common cause; the probability of selecting the separate-causes estimate is given by the posterior probability of separate causes. This can be implemented by sampling a value $\zeta$ from a uniform distribution on the range 0 to 1, then
$$\hat{s}_V = \begin{cases} \hat{s}_{V,C=1} & \text{if } p(C=1 \mid x_V, x_A) > \zeta \\ \hat{s}_{V,C=2} & \text{otherwise.} \end{cases}$$
The Causal Inference model, as applied to the current experimental paradigm, has 5 free parameters: (i) visual variability, (ii) auditory variability, (iii) the prior probability of a common cause, (iv) the mean of the prior over the number of events, and (v) the standard deviation of this prior. For each of the three decision-rule variants, and separately for each participant, the parameters that maximised the joint likelihood of the participant's data across all uni-modal and bi-modal conditions were found using Matlab's fminsearch. To reduce the risk of converging on a local optimum, 288 search iterations were performed, with initial values sampled from the multidimensional space of plausible parameters. The best-fitting parameters are given in Table S1, which also shows the number of participants whose data were best described by each of the three decision-rule variants of the model.
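The three decision rules described above can be sketched as a single function that maps the posterior probability of a common cause, together with the two candidate estimates, onto one response. This is an illustrative sketch under the definitions given here; the function name and the string labels for the rules are hypothetical.

```python
import random


def causal_inference_response(post_c1, s_common, s_separate, rule, rng=random):
    """Return one response given p(C=1 | x_V, x_A) and the two estimates.

    rule is one of the hypothetical labels "averaging", "selection", "matching".
    """
    if rule == "averaging":
        # Model averaging: posterior-weighted mean of the two estimates.
        return post_c1 * s_common + (1 - post_c1) * s_separate
    if rule == "selection":
        # Model selection: deterministic choice of the more probable structure.
        return s_common if post_c1 > 0.5 else s_separate
    if rule == "matching":
        # Probability matching: choose the common-cause estimate with
        # probability post_c1, by comparing it with a uniform sample zeta.
        zeta = rng.random()
        return s_common if post_c1 > zeta else s_separate
    raise ValueError(f"unknown decision rule: {rule!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    for rule in ("averaging", "selection", "matching"):
        print(rule, causal_inference_response(0.3, 2.0, 1.0, rule, rng))
```

Only model averaging can produce a response lying between the two estimates; selection and matching always return one of them, which is why their response distributions in Fig S1 differ in shape.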
Of the three Causal Inference variants, the majority of participants were best fit by the Probability Matching decision rule. This is in broad agreement with previous work in adults [2]. The decision rule did not vary substantially or systematically as a function of age: within the three variants of the Causal Inference model, Probability Matching provided the best fit to the majority of participants in each age group.

Causal Inference compared to Partial Integration and Switching Models
Because the Partial Integration, Switching, and Causal Inference models are matched in complexity (each with 5 free parameters), we can compare how well they account for participants' data by directly comparing the likelihood of the data given each model. Fig S2a shows the proportion of participants best fit by each of the six models considered thus far, as a function of age group. Comparing, for each participant, the log likelihood of the data given the best-fitting of the original three models (Partial Integration, Focal Switching, Modality Switching) with that given the best-fitting variant of the Causal Inference model shows that the Causal Inference models provide a significantly worse fit to the data (t(75) = -6.8, p = 2×10^-9). The Causal Inference models provide a worse fit than the original three models for 63 of the 76 observers. This is illustrated in Fig S2b, which shows, for each observer, the log likelihood of the best-fitting of the original three models against the log likelihood of the best-fitting Causal Inference model.
Fig. S2 (panels c–f). (c) Results of logarithmic coding: the proportion of participants best fit by each of the three models, as a function of age, when uncertainty is Gaussian in log space. (d) For the Partial Integration model, (e) the Focal Switching model and (f) the Modality Switching model, the BIC is shown for each of the 8 sub-types, averaged across all 76 observers. The first four model sub-types (grey) are the same as the last four (black), other than the inclusion of a prior over the number of events in the latter. Error bars give ±1 SE across observers. For all three types of model, sub-type 5 is optimal (lowest BIC), as indicated by the star symbols.
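The per-observer model comparison described above amounts to a paired t-test on log-likelihood differences. The sketch below shows that computation with the Python standard library; the function name and the example numbers are hypothetical, not the study's data. The standard library has no Student-t distribution, so the p-value is omitted; SciPy's `scipy.stats.ttest_rel` returns the same t statistic together with a p-value.

```python
import math
import statistics


def paired_comparison(loglik_ci, loglik_orig):
    """Paired t statistic on per-observer log-likelihood differences.

    Returns (t, degrees of freedom, number of observers fit worse by the
    Causal Inference model). Negative t means Causal Inference fits worse.
    """
    if len(loglik_ci) != len(loglik_orig):
        raise ValueError("need one log likelihood per observer for each model")
    diffs = [a - b for a, b in zip(loglik_ci, loglik_orig)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = statistics.mean(diffs) / se
    n_worse = sum(d < 0 for d in diffs)
    return t, n - 1, n_worse


if __name__ == "__main__":
    # Hypothetical log likelihoods for four observers (not real data).
    ci = [-10.0, -12.0, -9.0, -15.0]
    orig = [-9.0, -10.0, -9.5, -11.0]
    print(paired_comparison(ci, orig))
```

With 76 observers the test has 75 degrees of freedom, matching the t(75) reported above.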
Within each of the three model types, the BIC varied significantly across sub-types (one-factor repeated-measures ANOVAs: F(7, 525) = 23.6, p < 10^-27; F(7, 525) = 32.2, p < 10^-36; F(7, 525) = 30.9, p < 10^-35 for the Partial Integration, Focal Switching and Modality Switching models, respectively), and sub-type 5 performed significantly better than all others (all p < 10^-5, paired t-tests with Bonferroni correction for 7 comparisons), with the exception of sub-type 6, which was not significantly worse than sub-type 5 within either the Focal or the Modality Switching model.
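The degrees of freedom reported above follow from a one-factor repeated-measures design with 76 observers and 8 sub-types: 8 - 1 = 7 for the factor and 7 × 75 = 525 for the error term. The sketch below implements the textbook repeated-measures ANOVA and a Bonferroni adjustment in plain Python; the function names are hypothetical, and the software actually used for these analyses is not stated here.

```python
def rm_anova_oneway(data):
    """One-factor repeated-measures ANOVA.

    data[i][j] is the score (e.g. BIC) of subject i under condition j.
    Returns (F, df_conditions, df_error).
    """
    n = len(data)      # subjects (observers)
    k = len(data[0])   # conditions (model sub-types)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Residual = condition x subject interaction, the error term for this design.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m comparisons (capped at 1)."""
    return min(1.0, p * m)


if __name__ == "__main__":
    # Toy data: 3 subjects x 3 conditions.
    print(rm_anova_oneway([[1, 2, 4], [2, 4, 5], [3, 3, 6]]))
```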

Logarithmic coding of number
Previous work, e.g. [3], has suggested that number may be coded logarithmically, such that the perceptual difference between successive integers becomes increasingly small. For the current paradigm, this would imply that sensory likelihoods should be positively skewed. To determine whether our participants estimate number in this way, the Partial Integration, Switching and Causal Inference models were re-evaluated with skewed likelihoods and prior distributions, such that p(log(frequency)) was Gaussian. Overall, this change made little difference to the general pattern of results: younger participants were best modelled by the Switching models and older observers by the Partial Integration model (Fig S2c). Compared with the linear versions of the models, logarithmic coding produced a slightly, but not significantly, worse fit to observers' data in terms of the average log likelihood (t(75) = -1.49, p > 0.05). Logarithmic coding of number may become more apparent when the number of events increases beyond the subitizing range.
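The claim that log coding produces positively skewed likelihoods can be checked numerically. The sketch below builds a discrete distribution over event counts that is Gaussian either in the count itself or in its logarithm, and compares their skewness. It illustrates only the shape of the distributions; the exact parameterisation of the re-evaluated models is not given here, and restricting counts to n ≥ 1 (because log 0 is undefined) is an assumption of this sketch. Function names are hypothetical.

```python
import math


def count_distribution(center, sigma, max_count=10, log_space=False):
    """Discrete distribution over counts 1..max_count.

    log_space=False: Gaussian in the count itself.
    log_space=True:  Gaussian in log(count), i.e. skewed towards larger counts.
    ASSUMPTION: counts start at 1 because log(0) is undefined.
    """
    counts = range(1, max_count + 1)
    if log_space:
        z = [(math.log(n) - math.log(center)) / sigma for n in counts]
    else:
        z = [(n - center) / sigma for n in counts]
    weights = [math.exp(-0.5 * zi * zi) for zi in z]
    total = sum(weights)
    return {n: w / total for n, w in zip(counts, weights)}


def skewness(dist):
    """Standardised third central moment of a discrete distribution."""
    mean = sum(n * p for n, p in dist.items())
    var = sum((n - mean) ** 2 * p for n, p in dist.items())
    third = sum((n - mean) ** 3 * p for n, p in dist.items())
    return third / var**1.5


if __name__ == "__main__":
    print("linear:", skewness(count_distribution(5, 1.0)))
    print("log:   ", skewness(count_distribution(2, 0.5, log_space=True)))
```

In log space, counts 1 and 4 are equally far from a centre of 2, so they receive equal probability, while the long tail above the centre pushes the mean above the mode.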

Models of varying complexity
We also compared models with and without a prior over the number of events. For each type of model (Partial Integration (PI), Focal Switching (FS) and Modality Switching (MS)), we evaluated 8 model sub-types of varying complexity. For each sub-type, the parameter values that maximised the likelihood of each observer's data were found (Matlab: fminsearch). Multiple iterations of the search procedure were completed for each observer and sub-type, with initial parameter values sampled uniformly from the multi-dimensional space of plausible values.
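The multi-start search procedure described above can be sketched as follows. MATLAB's fminsearch uses the Nelder-Mead simplex method; to keep the sketch dependency-free it substitutes a simple compass (pattern) search as the local optimiser, which is a different algorithm but shares the key property that motivates multiple starts: it converges to whichever local minimum is nearest its starting point. The function names and the toy objective are hypothetical; in practice the objective would be the negative log likelihood of an observer's data.

```python
import random


def compass_search(f, x0, step=0.5, tol=1e-6, max_iter=100_000):
    """Derivative-free local minimiser (compass / pattern search).

    Stand-in for a local optimiser such as fminsearch; it is NOT Nelder-Mead.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2  # no move helped: refine the step size
    return x, fx


def multistart_minimise(f, bounds, n_starts, seed=0):
    """Run the local search from n_starts points drawn uniformly within
    bounds and keep the best result (lowest objective value)."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = compass_search(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


if __name__ == "__main__":
    # Toy objective with local minima near x = -2 and x = +2;
    # the global minimum is the one near x = -2.
    def toy(x):
        return (x[0] ** 2 - 4) ** 2 + x[0]

    print(multistart_minimise(toy, bounds=[(-4.0, 4.0)], n_starts=20))
```

A single run started at x = 3 stops in the poorer minimum near x = 2; sampling many starting points makes it likely that at least one lands in the basin of the global minimum.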
Model sub-types 1–4 did not include a prior over the number of events.
(i) Sub-type 1 (3 free parameters): As in the optimal models, noise distributions are unbiased, with reliability characterised by a separate variance parameter for vision ($\sigma_V$) and audition ($\sigma_A$). The third parameter gives the width of the coupling prior ($\sigma_C$, model PI) or the sampling probability ($p_F$ or $p_V$, models FS and MS).
(ii) Sub-type 2 (7 free parameters): Likelihoods are unbiased, but reliability varies as a function of both the number of events and the modality ($\sigma_{V,i}$, $\sigma_{A,i}$, $i = 1, 2, 3$). The seventh parameter gives the width of the coupling prior or the sampling probability ($\sigma_C$, $p_F$ or $p_V$).
(iii) Sub-type 3 (9 free parameters): Noise distributions can be biased, with means $\mu_{V,i}$ and $\mu_{A,i}$ ($i = 1, 2, 3$). Reliability varies as a function of modality ($\sigma_V$, $\sigma_A$) but is fixed across 1, 2 or 3 events. The ninth parameter gives the width of the coupling prior or the sampling probability ($\sigma_C$, $p_F$ or $p_V$).
(iv) Sub-type 4 (13 free parameters): Noise distributions can be biased, and reliability varies as a function of the number of events and the modality ($\mu_{V,i}$, $\mu_{A,i}$, $\sigma_{V,i}$, $\sigma_{A,i}$, $i = 1, 2, 3$). The final parameter describes the coupling prior or the sampling probability ($\sigma_C$, $p_F$ or $p_V$).
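Sub-types of different complexity are compared by BIC, which penalises each free parameter by the log of the number of data points. The sketch below computes BIC and picks the best sub-type for one observer. Parameter counts for sub-types 1–4 come from the list above; the counts for sub-types 5–8 are an assumption, inferred from the 5-parameter count quoted for the models with a prior, namely that the prior adds two parameters ($\mu_P$, $\sigma_P$). The number of trials per observer is not given in this section, so `n_trials` is a hypothetical input.

```python
import math

# Free parameters per sub-type. Sub-types 1-4 as listed above.
# ASSUMPTION: sub-types 5-8 add two prior parameters (mu_P, sigma_P).
N_PARAMS = {1: 3, 2: 7, 3: 9, 4: 13}
N_PARAMS.update({s + 4: k + 2 for s, k in N_PARAMS.items()})


def bic(neg_log_lik, n_params, n_trials):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_trials) + 2.0 * neg_log_lik


def best_subtype(neg_log_liks, n_trials):
    """Return the sub-type with the lowest BIC for one observer.

    neg_log_liks maps sub-type -> minimised negative log likelihood.
    """
    return min(neg_log_liks, key=lambda s: bic(neg_log_liks[s], N_PARAMS[s], n_trials))


if __name__ == "__main__":
    # Hypothetical fits for one observer (not real data).
    fits = {1: 80.0, 4: 70.0, 5: 60.0}
    print(best_subtype(fits, n_trials=200))
```

With equal likelihoods the simpler sub-type always wins; a more complex sub-type is preferred only when its likelihood gain outweighs the extra $\ln(n)$ penalty per added parameter.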