Fig. 15.1
Schematic spectrograms of the stimuli used for the symmetric condition (top) and asymmetric condition (bottom). See text for details
A major motivation for this study was to provide experimental evidence relating to the possible existence of frequency-shift detectors (FSDs), which are neural systems sensitive to frequency changes in a specific direction. Evidence for the existence of FSDs was provided by Demany and Ramos (2005). They presented an inharmonic chord with components spaced by at least 0.5 octave followed by a single pure tone that was either identical to a partial in the chord or was halfway in frequency between two partials. These two types of sequence could not be reliably discriminated. However, if the single tone was slightly (e.g. one semitone) lower or higher in frequency than one of the partials in the chord, a pitch shift was perceived and subjects could distinguish the two types of sequence. Demany and Ramos explained this effect in terms of FSDs. They argued that, like the motion-detection system in vision, subjects are sensitive to the balance of responses between FSDs sensitive to upward and downward frequency shifts. They also argued that the sensitivity of the FSDs varied with the magnitude of the frequency shift. When the probe was midway in frequency between two partials in the chord or coincided with a partial in the chord, the up-FSDs and down-FSDs would have been equally activated, so the FSDs did not provide a useful discrimination cue. When the probe was slightly mistuned from one of the partials in the chord, the up-FSDs and down-FSDs would have been differentially activated, providing a discrimination cue. Subsequent work has suggested that the FSDs are optimally activated by a shift of about 7 % (Demany et al. 2009, 2010, 2011).
The possible influence of FSDs in the present task was assessed by comparing two conditions. The chord contained partials that were equally spaced on the ERB_N-number scale (Glasberg and Moore 1990), which has units called Cams (Moore 2012). This means that all partials were roughly equally well resolved in the auditory periphery (Moore et al. 2006). For medium frequencies, the “optimal” frequency shift for activating the FSDs corresponds to a shift on the ERB_N-number scale of about 0.5–0.6 Cams. In the “symmetric” condition (Fig. 15.1, top panel), the frequency of the mistuned probe was midway in Cams between two partials in the chord. This should have led to reasonably symmetric activation of the up-FSDs and down-FSDs, such that differential activation provided a minimal cue. In the “asymmetric” condition (Fig. 15.1, bottom panel), the mistuned probe was much closer in frequency to one partial in the chord than to the next closest partial. This should have led to differential activation of the up-FSDs and down-FSDs, providing a strong discrimination cue (in comparison to the interval in which the probe coincided in frequency with a partial in the chord). If the FSDs do not influence performance in this task, then performance should be better for the symmetric than for the asymmetric condition, since in the interval in which the probe is mistuned, the mistuning from the target is greater for the former than for the latter. In contrast, if the FSDs do play a role, performance might be better in the asymmetric than in the symmetric condition.
2 Method
2.1 Stimuli and Procedure
The partials in the chord were separated by 2.5 Cams, so the chord was inharmonic. The relationship between Cam value and frequency, f (Hz), was assumed to be as suggested by Glasberg and Moore (1990):
$$\mathrm{Cam} = 21.4\,\log_{10}(0.00437f + 1) \qquad (15.1)$$
Uniform spacing on the ERB_N-number scale was chosen because the salience of frequency changes is roughly constant across centre frequencies when the extent of the change is expressed in Cams (Hermes and van Gestel 1991). The spacing of 2.5 Cams was chosen since it led to a substantial difference between the symmetric and asymmetric conditions.
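As an illustration, the mapping between frequency and Cam value, and its inverse, can be sketched in Python. The constants 21.4 and 0.00437 are the commonly quoted rounded values of the Glasberg and Moore (1990) formula and are an assumption here, as are the function names. The loop estimates the size in Cams of a 7% frequency shift at several centre frequencies.

```python
import math

# Rounded constants of the Glasberg and Moore (1990) formula (assumed here).
CAM_SCALE = 21.4
CAM_SLOPE = 0.00437  # per Hz


def hz_to_cam(f_hz):
    """Convert a frequency in Hz to a value on the ERB_N-number scale (Cams)."""
    return CAM_SCALE * math.log10(CAM_SLOPE * f_hz + 1.0)


def cam_to_hz(cam):
    """Inverse of hz_to_cam: convert a Cam value back to a frequency in Hz."""
    return (10.0 ** (cam / CAM_SCALE) - 1.0) / CAM_SLOPE


# Size in Cams of a 7% upward frequency shift at a few centre frequencies.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    shift = hz_to_cam(1.07 * f) - hz_to_cam(f)
    print(f"{f:6.0f} Hz: 7% shift = {shift:.2f} Cams")
```

With these rounded constants, a 7% shift corresponds to roughly 0.5 Cams near 1 kHz and to somewhat larger values at higher frequencies.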
In each observation interval, a 200-ms sinusoidal probe was followed after a 300-ms silent interval by a 200-ms chord. All stimuli had 20-ms raised-cosine ramps and durations are specified between half-amplitude points. The intervals were separated by 500 ms of silence. In one randomly chosen interval, the probe coincided in frequency with a partial (the target) in the chord. In the other interval, the probe differed in frequency from the target. The task was to indicate the interval in which the probe coincided with the target. The target was selected randomly from one trial to the next. Only the inner partials (2–6) were used as targets. Feedback as to the correct answer was provided after each trial. The duration of the stimuli was chosen to avoid ceiling effects that might have occurred if a longer duration had been used (Moore and Ohgushi 1993; Moore et al. 2006).
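The temporal envelope described above can be sketched as follows. Because durations are specified between half-amplitude points, a 200-ms tone with 20-ms raised-cosine ramps has a total length of 220 ms. The sampling rate and the function names are assumptions made for illustration; the text does not state them.

```python
import math

SAMPLE_RATE = 44100   # Hz; assumed, not stated in the text
RAMP_MS = 20.0        # raised-cosine ramp duration
DURATION_MS = 200.0   # duration between half-amplitude points


def raised_cosine_envelope(duration_ms=DURATION_MS, ramp_ms=RAMP_MS, fs=SAMPLE_RATE):
    """Return an envelope (list of gains) with raised-cosine onset and offset ramps.

    The half-amplitude point lies at the middle of each ramp, so the total
    length is duration_ms + ramp_ms.
    """
    n_total = int(round((duration_ms + ramp_ms) * fs / 1000.0))
    n_ramp = int(round(ramp_ms * fs / 1000.0))
    env = [1.0] * n_total
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        env[i] = gain                 # onset ramp
        env[n_total - 1 - i] = gain   # offset ramp (mirror image)
    return env


def probe_tone(freq_hz, fs=SAMPLE_RATE):
    """Sinusoidal probe with the envelope applied (unit peak amplitude)."""
    env = raised_cosine_envelope(fs=fs)
    return [e * math.sin(2.0 * math.pi * freq_hz * n / fs) for n, e in enumerate(env)]


env = raised_cosine_envelope()
print(len(env) / SAMPLE_RATE * 1000.0)  # total length in ms
```

A chord could be built in the same way by summing equal-amplitude sinusoids before applying the envelope; the starting phases would be a further assumption.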
The frequencies of the partials in the chord corresponded to 9.5, 12, 14.5, 17, 19.5, 22, and 24.5 Cams (corresponding to 408.5, 606, 864, 1,202, 1,645, 2,224, and 2,983 Hz). For the symmetric condition, the probe was separated by 1.25 Cams from the target and hence fell midway between two partials, which was expected to lead to roughly equal activation of the up-FSDs and down-FSDs. For the asymmetric condition, the probe was separated by 0.625 Cams from the target and by 1.875 Cams from the next nearest partial. The shift of 0.625 Cams was expected to lead to near-optimal activation of the FSDs, while the shift of 1.875 Cams was expected to lead to reduced activation of the FSDs, giving differential activation of the up-FSDs and down-FSDs. This differential activation was predicted to lead to better performance for the asymmetric condition.
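The geometry of the two conditions can be checked with a short sketch. It lists the chord partials in Cams, places the probe 1.25 Cams (symmetric) or 0.625 Cams (asymmetric) from the target, and reports the distance from the probe to the two nearest partials. The Cam-to-Hz conversion uses rounded constants (21.4 and 0.00437), which is an assumption; with these values the computed frequencies differ from those listed above by less than 1%. All names are hypothetical.

```python
CAM_SCALE, CAM_SLOPE = 21.4, 0.00437  # rounded constants (assumed)

# Partials 1-7 of the chord, spaced by 2.5 Cams.
CHORD_CAMS = [9.5 + 2.5 * k for k in range(7)]

# Probe-to-target separation in Cams for each condition.
MISTUNING = {"symmetric": 1.25, "asymmetric": 0.625}


def cam_to_hz(cam):
    """Convert a Cam value to a frequency in Hz."""
    return (10.0 ** (cam / CAM_SCALE) - 1.0) / CAM_SLOPE


def probe_cam(partial_number, condition, direction):
    """Cam value of the mistuned probe.

    partial_number: 2-6 (inner partials); direction: +1 (up) or -1 (down).
    """
    return CHORD_CAMS[partial_number - 1] + direction * MISTUNING[condition]


def nearest_two_distances(cam):
    """Distances in Cams from a probe to its two nearest chord partials."""
    return sorted(abs(cam - c) for c in CHORD_CAMS)[:2]


print([round(cam_to_hz(c)) for c in CHORD_CAMS])
for condition in MISTUNING:
    p = probe_cam(4, condition, +1)
    print(condition, p, nearest_two_distances(p))
```

For partial 4 (17 Cams) mistuned upwards, the probe lies at 18.25 Cams in the symmetric condition (1.25 Cams from both neighbours) and at 17.625 Cams in the asymmetric condition (0.625 and 1.875 Cams from its neighbours).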
For each observation interval, the frequencies of all components (both probe and chord) were multiplied by a common factor randomly chosen from a uniform distribution between 0.9 and 1.1. This was done to prevent subjects from using the frequency of the probe as a cue. In a given block of trials, each of the inner partials in the chord was selected as the target ten times: five with the probe mistuned downwards and five with it mistuned upwards. Each block was repeated at least five times, so each partial in the chord was the target at least 50 times. Stimuli were generated digitally and presented via one earpiece of a Sennheiser HD580 headphone. The level of the probe and of each partial in the chord was 60 dB SPL.
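A minimal sketch of the block structure and frequency roving is given below. It builds one block of 50 trials (five inner partials, two mistuning directions, five repetitions each) and applies a common roving factor to all components of an interval. Shuffling the trial order within a block is an assumption, as are the function names; the text states only that the target was selected randomly from trial to trial.

```python
import random

INNER_PARTIALS = [2, 3, 4, 5, 6]
REPS_PER_DIRECTION = 5  # five downward and five upward mistunings per target


def make_block(rng):
    """Return a shuffled list of (target partial, mistuning direction) trials."""
    trials = [
        (partial, direction)
        for partial in INNER_PARTIALS
        for direction in (-1, +1)
        for _ in range(REPS_PER_DIRECTION)
    ]
    rng.shuffle(trials)  # random order within the block (assumed)
    return trials


def rove(freqs_hz, rng):
    """Scale all components of one interval by a common random factor."""
    factor = rng.uniform(0.9, 1.1)
    return [factor * f for f in freqs_hz]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
block = make_block(rng)
print(len(block))
print(rove([1202.0, 1645.0], rng))
```

Because the same factor multiplies the probe and every partial of the chord, frequency ratios within an interval are preserved while the absolute probe frequency varies between intervals.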
2.2 Subjects
Six subjects with normal hearing were tested. Their ages ranged from 21 to 67 years. Most subjects had previously taken part in a similar experiment using 1,000-ms stimuli rather than 200-ms stimuli. Hence, they had several hours of experience in a similar task.
3 Results
For each target and each subject, scores were averaged across the two cases in which the mistuned probe was lower and higher in frequency than the target. These average scores are plotted for each subject in Fig. 15.2. For statistical analysis, the scores were transformed to rationalized arcsine units (RAU; Studebaker 1985). The scores in RAU were averaged across subjects and transformed back to percent correct. The dashed lines show the means obtained in this way.
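The RAU transform of Studebaker (1985) can be sketched as follows for a score of X correct out of N trials. The function name and the example trial count of 50 are illustrative assumptions; the inverse transform back to percent correct is not shown.

```python
import math


def rau(correct, total):
    """Rationalized arcsine transform of a score (correct out of total trials)."""
    theta = (
        math.asin(math.sqrt(correct / (total + 1)))
        + math.asin(math.sqrt((correct + 1) / (total + 1)))
    )
    # Linear rescaling so that values are close to percent correct
    # over the middle of the range.
    return (146.0 / math.pi) * theta - 23.0


for correct in (25, 35, 45, 50):
    print(correct, round(rau(correct, 50), 1))
```

The transform stretches the ends of the scale, so scores near 0 or 100% map to values below 0 or above 100 RAU. This makes the variance more uniform across the range, which suits an analysis of variance.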
Fig. 15.2
The score for each partial for the symmetric (left) and asymmetric (right) conditions. Each symbol represents one subject, and the dashed lines show the means
A within-subjects analysis of variance was conducted, with factors condition (symmetric and asymmetric) and partial number for the target (2–6). There was no significant main effect of condition, but there was a significant effect of partial number: F(4, 20) = 2.99, p = 0.044. There was a highly significant interaction between condition and partial number: F(4, 20) = 7.06, p < 0.001. This reflects the fact that scores were higher for the asymmetric than for the symmetric condition for partials 2, 3, 4, and 5, but the reverse was true for partial 6. The mean scores in RAU are given in Table 15.1. Overall, the results are consistent with the prediction based on FSDs for partials 2–5, but not for partial 6.
Table 15.1
Mean score in RAU for each condition and each partial number