Fig. 21.1
Example of the ‘artificial scene’ stimuli used. See text for details
This chapter focuses on sensitivity to change-related transients. While such sensitivity has been shown to underlie change detection in vision (Yantis and Jonides 1984; O’Regan et al. 1999), in audition the role of change-evoked transients is generally dismissed on the grounds that natural sounds are inherently characterised by energy fluctuations that would mask any genuine change-related transients. To address this issue, we designed our stimuli such that scene elements contain numerous onsets and offsets, mimicking the fluctuation properties of complex sounds. Physically, the appearance or disappearance of a component is associated with a ‘local transient’ – an abrupt change in stimulus power within a certain frequency band, resulting in a small number of neighbouring cells sharply changing their firing pattern, while the statistics of the activity in the rest of the array are unaltered. A mechanism able to pick out such local transients from among the numerous noninformative transients characterising scene elements might support change detection, since it could indicate the time, the frequency region and the nature of the change (appearance vs. disappearance).
Computationally, detection of item appearance should be comparatively easy, as it is associated with appearance of energy within a frequency band that was previously silent. Disappearance, on the other hand, is not easily distinguished from the many offsets that occur due to ongoing modulation. Disappearance detection requires a smarter, ‘second-order transient’ detection mechanism, capable of acquiring the statistical rules of the ongoing sound and signalling when those rules are violated, e.g. when an expected tone pip fails to arrive. The existence of such ‘smart’ offset detection mechanisms (albeit in the context of a single sequence rather than several concurrent sources), operating automatically irrespective of listeners’ attentional focus, has been demonstrated in several recent human brain imaging studies (Chait et al. 2008; Yamashiro et al. 2009; Fujioka et al. 2009), and it has been hypothesised that they might play a role in scene change detection.
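As a purely illustrative toy, and not a model proposed in the studies cited above, the idea of a ‘second-order transient’ detector can be sketched for a single regular stream: learn the typical inter-onset interval, then signal when an expected tone pip fails to arrive. The function name, the use of the median interval and the tolerance value are all assumptions made for this sketch.

```python
import statistics

def detect_omission(onsets_s, now_s, tolerance=0.25):
    """Return the expected time of a missed pip, or None if nothing is overdue.

    onsets_s  : onset times (s) of the pips heard so far in ONE stream
    now_s     : current time (s)
    tolerance : fraction of a period to wait before declaring an omission
                (hypothetical value, chosen only for illustration)
    """
    if len(onsets_s) < 3:
        return None  # too little evidence to learn the stream's regularity
    # Learn the 'rule' of the stream: its typical inter-onset interval.
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    period = statistics.median(intervals)
    # Predict the next onset and flag a violation once it is clearly overdue.
    expected = onsets_s[-1] + period
    if now_s > expected + tolerance * period:
        return expected
    return None
```

For a 10 Hz stream with onsets at 0, 0.1, 0.2 and 0.3 s, the sketch stays silent at 0.35 s (an ordinary offset) and reports an omission expected at about 0.4 s once the next pip is overdue. An appearance detector would need no such learned expectation, which is the asymmetry described above.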
2 Stimuli
We use artificial ‘scenes’ (Fig. 21.1) populated by multiple streams of pure tones that are designed to model acoustic sources (scene size of 4, 8 and 14 sources). Each source is characterised by a different carrier frequency (drawn from a pool of fixed values spaced at 2*ERB) and is furthermore amplitude modulated (AM) by a square wave (the source can be seen as a stream of tone pips) at a distinct rate (between 3 and 35 Hz). The AM mimics temporal properties found across many natural sounds and ensures that, similarly to natural scenes, the stimuli are perceived as a composite ‘soundscape’ that is perceptually separable, so that each stream can be attended to individually (as verified in a control experiment).
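The construction described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not specify: the ERB-number formula of Glasberg and Moore (1990) is used for the 2*ERB spacing, the lowest carrier (100 Hz), pool size (16), sampling rate, 50 % duty cycle and integer AM rates are placeholders, and all function names are hypothetical.

```python
import math
import random

def erb_number(freq_hz):
    """ERB-number of a frequency (Glasberg & Moore 1990 formula; an assumption)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)

def erb_number_to_hz(erb_num):
    """Inverse of erb_number()."""
    return (10.0 ** (erb_num / 21.4) - 1.0) * 1000.0 / 4.37

def carrier_pool(f_low_hz=100.0, n_carriers=16, step_erb=2.0):
    """Fixed pool of carrier frequencies spaced step_erb apart on the ERB scale."""
    start = erb_number(f_low_hz)
    return [erb_number_to_hz(start + k * step_erb) for k in range(n_carriers)]

def tone_pip_stream(freq_hz, am_rate_hz, dur_s, fs, duty=0.5):
    """Pure tone gated by a square wave, i.e. a stream of tone pips."""
    samples = []
    for i in range(int(dur_s * fs)):
        t = i / fs
        gate = 1.0 if (t * am_rate_hz) % 1.0 < duty else 0.0
        samples.append(gate * math.sin(2.0 * math.pi * freq_hz * t))
    return samples

def make_scene(n_sources, rng, dur_s=1.0, fs=44100):
    """Draw distinct carriers and distinct AM rates, then mix the streams."""
    freqs = sorted(rng.sample(carrier_pool(), n_sources))
    rates = rng.sample(range(3, 36), n_sources)  # distinct rates in 3..35 Hz
    streams = [tone_pip_stream(f, r, dur_s, fs) for f, r in zip(freqs, rates)]
    mix = [sum(column) / n_sources for column in zip(*streams)]
    return freqs, rates, mix
```

Calling `make_scene(8, random.Random(1))` would return the carriers, the AM rates and one second of mixed signal for a scene of size 8. Sampling without replacement is what keeps both the carriers and the rates distinct across sources.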
We refer to scenes in which each source is active throughout the stimulus as ‘no change’ (NC) stimuli. Additionally, we created versions in which a single component is removed partway through the scene (‘change-disappear’, CD, stimuli) and versions in which the same single component is added to the scene (‘change-appear’, CA, stimuli). The timing of change varies randomly. For appearing components, the nominal time of change is set at the introduction of the first non-zero sound sample to the scene; for disappearing components, the time of change is the time at which the next tone burst is expected to appear (dashed line in Fig. 21.1). The choice of frequencies and AM rates is random for each scene, but to enable a controlled comparison between CA and CD, the stimuli are generated as NC/CD/CA triplets (as in Fig. 21.1). These are presented in random order during the experiment (blocked by change type, NC/CA and NC/CD).
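The triplet logic, and in particular the different definitions of the nominal change time for CA and CD, can be sketched at the level of scene descriptions rather than audio. The `Source` record, the function names and the assumption that pips start at multiples of the AM period are hypothetical choices made for this illustration.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Source:
    freq_hz: float      # carrier frequency
    am_rate_hz: float   # tone-pip rate
    start_s: float      # time of the first pip
    stop_s: float       # no pip begins at or after this time

def next_pip_onset(am_rate_hz, t_s):
    """Onset time of the first pip due at or after t_s (pips start at k/rate)."""
    period = 1.0 / am_rate_hz
    return math.ceil(t_s / period - 1e-9) * period

def make_triplet(sources, target, t_random_s):
    """Build matched NC, CD and CA scenes that differ only in one component."""
    tgt = sources[target]
    nc = list(sources)                       # every source active throughout
    # CD: nominal change time is when the next pip WOULD have started.
    t_cd = next_pip_onset(tgt.am_rate_hz, t_random_s)
    cd = list(sources)
    cd[target] = replace(tgt, stop_s=t_cd)   # pips from t_cd onwards are omitted
    # CA: nominal change time is the first non-zero sample of the new component.
    t_ca = t_random_s
    ca = list(sources)
    ca[target] = replace(tgt, start_s=t_ca)
    return nc, cd, ca, t_cd, t_ca
```

Because all three scenes are derived from one list of sources, any difference in detectability between CA and CD cannot be attributed to the particular choice of frequencies and rates.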
3 Experiment 1
In Experiment 1 (part of the data reported in Cervantes Constantino et al. 2012), we tested listeners’ ability to detect sudden scene changes, manifested as the appearance or disappearance of an element. We also assessed the extent to which performance is affected by interrupting the scenes (at the time of change) with a silent gap.
3.1 Materials and Methods
Ten subjects participated in the experiment (5 female; mean age = 23.8 years).
The stimulus set included two conditions: (a) ‘continuous’ stimuli constructed as described in Sect. 2, above, and (b) ‘silent-gap’ stimuli with a 200 ms silent gap inserted at the time of change. The gap duration was chosen to be as short as possible, so as to minimise reliance on memory capacities, but longer than the longest inter-pulse interval (corresponding to the slowest AM rate used), so that the gap was detectable for all scene components. The signals before and after the gap were ramped with a 10 ms cosine-squared ramp. For each gap condition, signals were generated as NC/CD/CA triplets such that NC signals also contained a gap at the same time as their matching CA and CD scenes. Stimulus presentation was blocked by change type and gap type (no gap/silence). The proportion of change events was 50 % in each block. Block order was randomised across listeners. Experimental sessions lasted about 2 h and consisted of a short practice session with feedback, followed by the main experiment with no feedback, divided into runs of about 10 min. Subjects were instructed to fixate on a cross presented on the computer screen and to press a keyboard button as soon as possible when they detected a change in the ongoing scene stimulus.
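The gap manipulation can be sketched as below. The sketch assumes that the gap lengthens the stimulus rather than replacing part of it, and the function names and sampling rate are hypothetical. As a rough check on the choice of 200 ms, if the square-wave AM had a 50 % duty cycle (an assumption, not stated above), the slowest rate of 3 Hz would give an inter-pulse silence of 1/(2 × 3) s ≈ 167 ms, which is shorter than the gap.

```python
import math

def cos2_ramp(n):
    """Rising cosine-squared ramp of n samples, from 0 up to 1."""
    if n < 2:
        return [1.0] * n
    return [math.sin(0.5 * math.pi * k / (n - 1)) ** 2 for k in range(n)]

def insert_gap(signal, fs, t_change_s, gap_s=0.2, ramp_s=0.01):
    """Split the signal at the change time and insert a ramped silent gap."""
    n_change = int(round(t_change_s * fs))
    n_gap = int(round(gap_s * fs))
    ramp = cos2_ramp(int(round(ramp_s * fs)))
    pre = list(signal[:n_change])
    post = list(signal[n_change:])
    # Fade the pre-gap segment out: its final sample is scaled to zero.
    for k in range(min(len(ramp), len(pre))):
        pre[len(pre) - 1 - k] *= ramp[k]
    # Fade the post-gap segment in: its first sample is scaled to zero.
    for k in range(min(len(ramp), len(post))):
        post[k] *= ramp[k]
    return pre + [0.0] * n_gap + post
```

Applying the same call, with the same change time, to all three members of a triplet reproduces the requirement that NC signals contain a gap at the same time as their matching CA and CD scenes.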
3.2 Results and Discussion
The results are presented in Fig. 21.2. Panel a shows data for continuous scenes – listeners are very sensitive to source appearance but have difficulty detecting disappearing sources (the performance advantage is also reflected in response times; not shown). This finding is likely related to previously reported ‘enhancement’ effects (e.g. Erviti et al. 2011, for review; see also Chap. 19) which refer to the finding that local power change, e.g. an increase in the power of one component within a pure-tone chord, results in perceptual pop-out of the associated component away from the rest of the mixture (see also Bregman et al. 1994). Investigations of enhancement phenomena have usually used much simpler stimuli (pure-tone chords) than those used here, and the present demonstration thus suggests that appearance pop-out is a wide-ranging phenomenon, unaffected by the multitudes of onsets and offsets in our scene components.
Fig. 21.2
Results of Experiment 1. (a) Continuous scenes. (b) Comparison between performance on continuous scenes and silent-gap interrupted scenes
The CA advantage likely receives contributions from at least two different low-level neural mechanisms: (a) Adaptation effects – changes to the sustained neuronal firing rate that follows the signal while it is present in the scene. Adaptation could thus contribute to onset detection by reducing responses to the ongoing (nonchanging) scene components. (b) Local transients generated at signal onset – auditory onset- and offset-tuned cell populations are characterised by markedly different properties: offset-tuned cells are fewer in number, and their responses tend to be of longer latency and smaller amplitude (Scholl et al. 2010; Phillips et al. 2002), thus leading to larger, and earlier, on- than off-transient responses. These differences are consistent with our finding that appearance events are overall more detectable.