Fig. 3.1
Neuroanatomy of the auditory system in primates. (a) Ascending auditory pathway from the cochlea to the auditory cortices. Fibers in blue originate from neurons in the ventral cochlear nucleus, form the lemniscal pathway (LL), and eventually pass through the ventral division of the medial geniculate nucleus on their way to primary auditory cortex. Fibers in red originate from the dorsal cochlear nucleus and form the extralemniscal pathway. Low-frequency (L) and high-frequency (H) pathways are present throughout. (b) Cortical pathways for auditory processing in the macaque. Corticocortical projections of the central auditory system run along two segregated pathways: a ventral pathway (green) runs from the anterolateral belt (area AL) along the anterior superior temporal cortex to the ventrolateral prefrontal cortex, while a dorsal pathway (red) extends from the caudolateral belt (area CL) to superior temporal cortex and inferior parietal cortex and ends in dorsolateral prefrontal cortex. Discrete thalamic input to the two pathways is provided from different medial geniculate (MG) nuclei: The ventral part (MGv) projects only to the core fields A1 and R, whereas the dorsal part (MGd) projects to primary auditory cortex (A1) and the caudomedial field (CM) (Rauschecker et al. 1997). Likewise, feedforward projections from AL and CL are largely separated and target the rostral parabelt (RPB) and caudal parabelt (CPB) regions, respectively (Hackett et al. 1998). Additional pathways involve the middle lateral area (ML), posterior parietal cortex (PP), and RPB areas on the surface of the rostral superior temporal gyrus (Ts1/Ts2) (Pandya and Sanides 1973). Prefrontal cortex projections (PFC) are segregated in Brodmann areas 10 and 12 versus 8a and 46, respectively (Romanski et al. 1999). 
(a was modified and reprinted with permission from Henkel 2006; b was modified from Rauschecker and Romanski 2011; reproduced with permission from the original source, Rauschecker and Tian 2000)
In primates, conscious awareness of sound takes place within the various divisions of the auditory cortex (Fig. 3.1b). Within the auditory cortex, acoustic signals first travel to one or more of the primary cortical areas, which are most responsive to pure tones (Ghazanfar and Santos 2004). There are at least two widely agreed-upon primary cortical areas (A1 and R), though there may be as many as three or four (e.g., Kaas and Hackett 2000). Signals then travel to one or more of the surrounding seven (or so) auditory cortical belt areas and subsequently enter the prefrontal cortex of the frontal lobe, either directly from the belt or through functionally specific auditory parabelt areas in auditory and/or auditory-related fields in the superior temporal gyrus (Romanski et al. 1999; Kaas and Hackett 2000; Rauschecker and Tian 2000; Poremba et al. 2003; Hackett 2011; Rauschecker and Romanski 2011).
Like other major partitions of the primate auditory pathway, portions of the human and nonhuman primate auditory cortices work in a map-like fashion to represent frequency. For example, rhesus macaques and common marmosets have a tonotopic map in auditory area A1 (Aitkin et al. 1986; Micheyl et al. 2005). Individual fibers carry information from (and neurons are most responsive to) particular tones, with response strength decreasing sharply as frequencies depart from the preferred frequency. This organization is also present in most other mammals (e.g., cats: Imig and Adrian 1977).
3.2.2 Alternate Pathways for Spectral and Spatial Information
Neural processing of localization cues begins at the superior olivary nuclei of the medulla-pons junction and the inferior colliculus of the auditory midbrain. Later, at the cortical level in human and nonhuman primates, functional divergence of object-related (what) and spatial (where) information takes place after the primary auditory cortex in the superior temporal plane (Rauschecker and Tian 2000). More specifically, in humans, divergence takes place at the planum temporale, after which object-related spectral information is processed in the anterolateral planum temporale, planum polare, lateral Heschl’s gyrus, and the superior temporal gyrus anterior to Heschl’s gyrus (Warren and Griffiths 2003). Spatial information is processed in the posteromedial planum temporale and in the parietal and frontal lobes (Bushara et al. 1999). In macaques (Macaca sp.), divergence occurs in the belt areas (along the superior temporal gyrus): object-related spectral information proceeds from the anterior lateral belt through fields in the anteroventral superior temporal region into ventrolateral prefrontal cortex, whereas spatial information proceeds from the caudolateral belt and through fields in the posterodorsal superior temporal lobe and the posterior parietal cortex into dorsolateral prefrontal cortex (e.g., Romanski et al. 1999; Tian et al. 2001). The posterior part of the superior temporal gyrus (STG) in humans has classically been considered specialized for speech processing (“Wernicke’s area”), a view based mostly on clinical stroke studies. Given reports from human imaging that anterior regions of STG are at least as selective for the perception of words as posterior regions (DeWitt and Rauschecker 2012), a redefinition of posterior STG as an area specializing in sensorimotor integration and control seems appropriate (Rauschecker 2011). This would include a role in spatial processing as well as in speech production and perception.
An important aspect of the primate central auditory system is its redundancy. For example, in the macaque lateral belt, signals are largely segregated into spatial (caudolateral belt) and nonspatial (anterior lateral belt) information; however, the streams obviously interact (Kaas and Hackett 1999; Romanski et al. 1999). Some neurons in the primate caudolateral belt respond to both location and specific calls, and the middle lateral belt is approximately equally selective for both call type and sound source location (Tian et al. 2001). Furthermore, each side of the brain receives and processes impulses from both ears, although in primates (human and nonhuman) the left cerebral hemisphere may have greater selectivity for processing temporal information, and the right cerebral hemisphere may have greater selectivity for processing spectral information (Joly et al. 2012; Ortiz-Rios et al. 2015).
3.2.3 Encoding Signals
In humans, the cortical region around Heschl’s gyrus, which also contains primary auditory cortex, is responsible for pitch perception (Schneider et al. 2005). A cortical area analogous to this region has been described for nonhuman primates (Bendor and Wang 2005). In their study on common marmosets, Bendor and Wang demonstrate that an area (restricted to low frequencies) on the border between two of the primary cortical areas (A1 and R) and adjacent to the anterior auditory cortical belt (AL and ML) contains pitch-selective neurons (also see Tomlinson and Schwarz 1988). Each neuron or group of neurons responds best to a specific pitch, whether it is generated by an actual pure tone or by a “missing fundamental” frequency represented by its spectral envelope.
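The missing-fundamental effect described above can be illustrated numerically. The following sketch (Python, with purely illustrative values) builds a harmonic complex from harmonics 2–6 of a 210-Hz fundamental: the spectrum contains no energy at 210 Hz itself, yet the waveform still repeats with period 1/f0, which is the temporal cue a pitch-selective neuron tuned to f0 could exploit.

```python
import numpy as np

# Illustrative values only: a 210-Hz fundamental is implied by its
# harmonics even though no component is present at 210 Hz.
fs = 44100                       # sample rate (Hz)
f0 = 210.0                       # "missing" fundamental (Hz)
t = np.arange(int(0.5 * fs)) / fs

# Harmonic complex containing only harmonics 2-6 of f0.
complex_tone = sum(np.sin(2 * np.pi * h * f0 * t) for h in range(2, 7))

# The spectrum confirms there is essentially no energy at f0 ...
spectrum = np.abs(np.fft.rfft(complex_tone))
freqs = np.fft.rfftfreq(len(t), 1 / fs)
energy_at_f0 = spectrum[np.argmin(np.abs(freqs - f0))]
energy_at_2f0 = spectrum[np.argmin(np.abs(freqs - 2 * f0))]

# ... yet the waveform repeats with period 1/f0, so it correlates
# strongly with a copy of itself shifted by one f0 period.
period = int(fs / f0)            # 210 samples at these values
periodicity = float(np.dot(complex_tone[:-period], complex_tone[period:]))
```

A listener (or a pitch-selective neuron) presented with this stimulus reports a pitch at f0, matching the percept evoked by a 210-Hz pure tone.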
Temporal relationships of signals and signal elements are important for identifying target proximity and location and distinguishing between calls (e.g., Ghazanfar and Santos 2004). In many cases, temporal alteration may affect representation more than spectral manipulation (Nagarajan et al. 2002; Ghazanfar and Santos 2004). Some neurons in the auditory midbrain respond selectively to order and spacing combinations (Wollberg and Newman 1972). This is demonstrated by the differential processing of temporally expanded and compressed vocalizations by the common marmoset (Wang et al. 1995) (Sect. 3.2.5). Other neurons in the auditory midbrain respond selectively to duration of frequency modulation or rates of amplitude modulation (e.g., Casseday et al. 1994). In another example, researchers presented a series of alternating high- and low-frequency tones to awake long-tailed macaques (Macaca fascicularis) and found that increasing the frequency separation, presentation rate, and tone duration improved the spatial differentiation of tonal responses on A1’s tonotopic map (p. 1656 in Fishman et al. 2004).
Studies on auditory cortex in anesthetized primates (e.g., common marmosets: Wang et al. 1995; squirrel monkeys, Saimiri sciureus: Bieser 1998) have reported that neurons mainly detect signal changes (onsets or transients). By contrast, when recording from primary auditory cortical and lateral belt neurons in awake common marmosets, Wang et al. (2005) found that responses are not only phasic but also tonic, indicating that some neurons respond continuously to spectrally and temporally optimal parts of the signal. Thus, cortical responses may be phasic (onset or offset), persistent tonic, inhibitory, and/or excitatory depending on stimulus frequency, intensity, location, and duration, similar to simple and complex cells in visual cortex (Tian et al. 2013). Since responses in anesthetized animals to pure tones are generally only phasic, they may not represent the full range of cortical responses/firing patterns. Considering this, studies of awake rather than anesthetized animals (e.g., Recanzone et al. 2000; Malone et al. 2002) may be preferable, depending on research questions and methods.
3.2.4 Are Primate Brains Specialized for Processing Vocalizations?
The human brain has long been claimed to have specialized neural structures, such as Wernicke’s area, for processing speech and, perhaps, others for interpreting meaning and auditory imagery (Fisher and Marcus 2006), but the notion of areas specialized for speech perception is undergoing some revision. Although nonhuman primates show evidence of homologous neuroanatomical pathways and structures, a topic of debate is whether the nonhuman primate central auditory system contains regions that are (or, even as a whole, is) specialized for processing vocalizations. First, it is important to distinguish between auditory brain areas being sensitive versus selective. That an area is vocalization sensitive means that its neurons respond especially well to all vocalizations. That an area is vocalization selective means that single or groups of neurons within that area each respond to different vocalizations: some neurons may respond preferentially to contact calls, whereas others may respond to predator warning calls. Based on neurophysiological experiments, authors such as Rauschecker et al. (1995) and Tian et al. (2001) argue convincingly that certain regions of the primate lateral belt may be vocalization selective. However, in these experiments, responses to vocalizations were not consistently compared with responses to relevant nonvocal complex sounds in the same neurons. Thus, it is possible that neurons in the primate lateral belt are vocalization sensitive but not selective, and such selectivity is not generated until later in higher processing regions.
Many authors have reviewed vocal communication and parallels with human language in primates. In their study on speech segmentation in cotton-top tamarins (Saguinus oedipus), Hauser et al. (2001) demonstrate that nonhuman primates are able to recognize different sequences of syllables in a speech stream. Humans use this ability to calculate statistical probabilities of sequence occurrence (transitional probabilities) for the segmentation and identification of words in an unknown language (e.g., Chomsky 1975). Interestingly, many authors have pointed out that some facets of speech that are central to speech perception in humans, such as syllable onsets, formant frequencies, glottal-pulse periods, and the spectral profiles of consonants and vowels, are already encoded in peripheral hearing not only of primates but of mammals as a whole (e.g., Delgutte 1997; Lieberman 2006).
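The statistical-learning idea behind speech segmentation can be made concrete. The Python sketch below (a toy illustration with an invented two-“word” syllable stream, not a model of primate cortex) computes transitional probabilities between adjacent syllables and flags low-probability transitions as candidate word boundaries.

```python
from collections import Counter

def transitional_probabilities(stream):
    """P(B follows A) for each adjacent syllable pair (A, B)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical stream built from two invented "words", tu-pi and ro-go.
# Within-word transitions are consistent; transitions across word
# boundaries vary, so their probabilities dip.
stream = ["tu", "pi", "ro", "go", "tu", "pi", "tu", "pi", "ro", "go"]
tp = transitional_probabilities(stream)

# Candidate word boundaries fall where the transitional probability drops.
boundaries = [i + 1 for i, pair in enumerate(zip(stream, stream[1:]))
              if tp[pair] < 1.0]
```

In this short stream, within-word transitions such as tu→pi occur with probability 1.0, while cross-boundary transitions such as pi→ro occur less reliably; longer and more varied streams sharpen the contrast and recover the remaining boundaries.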
However, although the mammalian ear may be well-equipped to encode aspects of speech important to human perception, this does not mean that primates are specialized to process the meaning of these features. Many attempts have been made to understand the differences and similarities between human and nonhuman primates with regard to auditory-vocal processing. Because of the complex nature of identified (or as yet unidentified) relationships, Owren and Rendall (2001) rightly warn that, at present, comparisons between (and models of) human language and nonhuman primate vocalizations need to be approached cautiously (also see Ghazanfar and Santos 2004).
3.2.5 Potential Specializations for Processing Species-Specific Vocalizations
Although the communication systems of nonhuman primates match neither the combinatorial power nor the recursive structure of human speech and language, the primate auditory cortex displays similarities with that of humans, particularly in having a hierarchical structure with tonotopic mapping and specialized streams for processing specific types of information (Rauschecker and Scott 2009). The primate central auditory system shows evidence of specialization for processing location as well as complex bioacoustic communication signals such as conspecific vocalizations. In fact, acoustic sensitivity may decrease when frequencies are not heard in sequences corresponding to biologically meaningful stimuli such as species-specific calls.
The acoustically distinct vocalizations of primate species are well documented and can, in some cases, even be used to assess phylogenetic relationships (e.g., Zimmermann 1990). Behavioral studies in the wild provide substantial evidence that primates are able to recognize conspecifics, kin groups, and individuals based on variations in vocal acoustics (e.g., Chapman and Weary 1990). Neurobiological experiments measuring the responses of auditory cortical areas to natural vocalizations versus artificially manipulated or synthesized vocalizations provide a basis for understanding how at least some nonhuman primate species are able to distinguish conspecifics based on their calls. In both human and nonhuman primates, conspecific vocalizations are received in both the left and right cerebral hemispheres, but processing is focused in specific areas of the left hemisphere, where some single neurons or groups of neurons may respond particularly well to distinct vocalizations (Ghazanfar and Santos 2004; Poremba et al. 2004). Studies on rhesus macaques demonstrate that the lateral belt systematically represents tones and frequencies and is especially responsive to complex signals such as species-specific vocalizations (e.g., Rauschecker et al. 1995; Romanski et al. 1999). Studies on squirrel monkeys found that neurons in the auditory cortex responded to frequency modulations in both natural and synthesized vocalizations, but responses were greater for natural, strongly amplitude-modulated vocalizations, possibly owing to their syllable-like divisions (Bieser 1998; Ghazanfar and Santos 2004).
In a study on common marmosets, neurons in the primary auditory cortex responded preferentially to normal versus time-reversed, compressed, or expanded conspecific vocalizations. When the same marmoset vocalizations were presented to cats, the evoked responses were relatively small and roughly equal for normal and time-reversed examples (Wang et al. 1995; Wang and Kadia 2001). A behavioral experiment in which long calls were played back to cotton-top tamarins found that individuals were more likely to respond to whole conspecific calls than to parts of them (Ghazanfar et al. 2001; Snowdon, Chap. 6). Additional studies indicate that among some animals, neuronal responses to temporally correct combinations of tones are stronger than the summed responses to the individual signals presented separately (Viemeister and Wakefield 1991; Alder and Rose 1998). Other studies have shown that, whereas squirrel monkey and cotton-top tamarin auditory cortical areas respond more strongly to conspecific vocalizations than to those of other species, time reversing and pitch shifting did not significantly alter the results, indicating order/spectral insensitivity (e.g., Glass and Wollberg 1983). The preferential processing of and response to species-specific calls may be preprogrammed or dependent on experience and may be related to recognizing signals that are similar to those that are self-produced (Brainard and Doupe 2002). What constitutes a temporally “correct” combination likely varies by species (Alder and Rose 1998).
3.2.6 Interindividual Recognition
Unarguably, humans are able to distinguish between individual voices based on spectral and temporal cues. Two humans saying the same word or phrase (call) can be distinguished from one another. In contrast, tamarin and squirrel monkey studies suggest that the primate auditory system does not respond differently to variants (different examples from different individuals) of the same call (Ghazanfar and Hauser 1999). This suggests that primates may not be universally adept at recognizing individuals based on call structure (Ghazanfar and Santos 2004). However, Wang and colleagues (1995) report that in marmosets, auditory cortical representations of spectrotemporal variants of calls from different individuals were different but overlapping, suggesting some individual recognition might be possible.
Behavioral evidence also supports the idea that primates can recognize individuals from their calls. For example, vervet monkeys (Chlorocebus aethiops) can organize individuals hierarchically and into kin groups based on individual calls (Cheney and Seyfarth 1990), and Waser (1977) provides evidence from playback studies that monkeys can recognize individuals based on their vocalizations. The results of these studies are perhaps not surprising, considering that individual recognition based on call structure has long been reported in birds (e.g., Thorpe 1968). How the primate brain processes and stores these subtle differences, however, remains unknown.
3.3 Defining, Representing, and Measuring Overall Auditory Sensitivity in Primates
Comparative audiograms for primates have been gathered primarily via traditional behaviorally based testing and physiological techniques such as the auditory brainstem response (ABR) method (Sect. 3.3.3). Currently, data are available for only a small percentage of the hundreds of nonhuman primate species (Sect. 3.4). Moreover, much of the existing data may not be comparable, owing to inconsistencies in experimental design or data reporting, greater-than-average interindividual variation, unexpected results that do not fit preconceptions about variation in the order, and philosophical debates over potential incompatibilities between behaviorally and physiologically derived data (Coleman 2009; H. E. Heffner and R. S. Heffner 2014). This section introduces the ways in which auditory sensitivity is defined and represented, the conceptual issues surrounding methods of data collection, and the comparability of the resulting data.
3.3.1 Defining and Representing Auditory Sensitivity in Primates
The term auditory sensitivity is utilized throughout this chapter as the broadest definition of the function of the sense—it can be conceived of herein as a representation of all sounds that are collected via the ear, are received (produce a neural response) in the brain, and have the potential of being utilized by the individual. Although the terms auditory sensitivity (audition) and hearing are often used interchangeably, the term hearing carries additional complex meanings related to perception and psychoacoustics.
The auditory sensitivity of primates can be represented as the range of audible frequencies, measured in hertz (Hz), that are detectable at varying amplitudes, measured in decibels (dB re 20 μPa). Frequencies below 20 Hz are defined as infrasound because they are below the range of human hearing, and frequencies above 20 kHz are defined as ultrasound, or above the range of human hearing. Auditory sensitivity can be represented graphically as an audiogram—a curve showing the lowest audible level (threshold, in dB) at each tested frequency. In this chapter, variation in auditory sensitivity within and between species is considered through the most common audiometric parameters: frequency of best sensitivity, defined as the frequency that can be detected at the lowest level (in dB); and the low-frequency and high-frequency limits, defined as the lowest and highest frequencies, respectively, detectable at reasonable amplitudes (conventionally 60 dB). The audible range, defined as the number of octaves between the low- and high-frequency limits, is also a common audiometric parameter, but it is not considered here since it is highly reliant on both the low- and high-frequency limits, and the former is not available for most subjects. Studies have also sought to formulate additional audiometric parameters to facilitate interspecific comparisons, such as the absolute threshold level at particular frequencies, or measures of overall sensitivity across the audiogram, or sensitivity within low-, mid-, and high-frequency areas (e.g., Coleman and Colbert 2010; Ramsier et al. 2012a); these parameters are yet to be widely adopted and thus are not considered further in this chapter.
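These audiometric parameters are straightforward to compute from threshold data. The Python sketch below uses an invented audiogram (all threshold values are hypothetical) to extract the frequency of best sensitivity, the low- and high-frequency limits at the conventional 60-dB criterion (interpolated linearly on a log2 frequency axis), and, for completeness, the audible range in octaves.

```python
import math

# Hypothetical audiogram: frequency (Hz) -> threshold (dB re 20 uPa).
audiogram = {125: 65, 250: 38, 500: 25, 1000: 14, 2000: 8,
             4000: 5, 8000: 7, 16000: 20, 32000: 45, 64000: 72}

def best_frequency(ag):
    """Frequency of best sensitivity: detectable at the lowest level."""
    return min(ag, key=ag.get)

def frequency_limit(ag, level=60.0, flank="high"):
    """Frequency at which the threshold curve crosses `level` (the
    conventional 60-dB criterion), interpolating linearly on a log2
    frequency axis along the chosen flank of the audiogram."""
    pts = sorted(ag.items())
    for (f1, t1), (f2, t2) in zip(pts, pts[1:]):
        hit = t1 <= level <= t2 if flank == "high" else t1 >= level >= t2
        if hit and t1 != t2:
            frac = (level - t1) / (t2 - t1)
            return 2 ** (math.log2(f1) + frac * math.log2(f2 / f1))
    return None  # threshold curve never crosses `level` on that flank

best = best_frequency(audiogram)                 # 4000 Hz here
low = frequency_limit(audiogram, flank="low")    # ~142 Hz
high = frequency_limit(audiogram, flank="high")  # ~47 kHz
octaves = math.log2(high / low)                  # audible range in octaves
```

Returning `None` when a flank never reaches the criterion mirrors the situation noted above for real datasets, where the low-frequency limit is simply unavailable for many subjects.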
3.3.2 Determining Threshold
When constructing an audiogram, the precision of the threshold measurement is highly dependent on the frequency steps used and the accurate calibration of stimuli (Coleman 2009). A free-field speaker is generally considered the ideal transducer for delivering stimuli to primates. The use of headphones, from inserts to circumaural, is also relatively common when testing auditory sensitivity in humans and other animals, as headphones may help minimize interference from subject position, room noise, and electrical artifacts (Martin and Clark 2006). However, earphones that depress or bypass the pinnae may influence or negate the amplification effects of the pinnae (Sinyor and Laszlo 1973; Rosowski 1991). Thus, some researchers caution against the use of headphones, particularly insert varieties, citing pinna-amplification issues and the difficulty of delivering low-frequency signals through these devices (R. S. Heffner 2004; Coleman 2009). Tables 3.1 and 3.2 show data gathered free-field and with headphones for several species. There seems to be good agreement in the high-frequency limit but more variation in the frequency of best sensitivity, which may be more strongly subject to methodological variations. More data are needed to fully evaluate pinna effects and the influence of transducer type on auditory thresholds. Another potential issue is that pure tone stimuli may only broadly represent auditory sensitivity, given that in at least some primates, neural responses to conspecific vocalizations are enhanced compared to nonspecific noise (Sect. 3.2.5).
Table 3.1
Auditory sensitivity in primate semiorder Strepsirrhini
Species | Method, transducer^a | Best freq.^b (kHz) | High freq.^c (kHz) | Low freq.^d (Hz) | References |
---|---|---|---|---|---|
Infraorder Lorisiformes | |||||
Bushbaby (Galago senegalensis) | Beh, Spk | 8 | 65.0 | 70 | H. E. Heffner et al. (1969) |
Slow loris (Nycticebus coucang) | ABR, Spk | 16 | 42.6 | – | Ramsier et al. (2012a) |
Slow loris (Nycticebus coucang) | Beh, Spk | 16 | 43 | 83 | H. E. Heffner and Masterton (1970) |
Pygmy slow loris (Nycticebus pygmaeus) | ABR, Spk | 11.3 | 51.5 | – | Ramsier et al. (2012a) |
Potto (Perodicticus potto) | Beh, Spk | 16 | 42.0 | 135 | H. E. Heffner and Masterton (1970) |
Infraorder Lemuriformes | |||||
Aye-aye (Daubentonia madagascariensis) | ABR, Spk | 11.3 (4) | 65.6 | – | Ramsier et al. (2012a) |
Crowned lemur (Eulemur coronatus) | ABR, Spk | 8 | 59.6 | – | Ramsier et al. (2012a) |
Collared lemur (Eulemur fulvus collaris) | ABR, Spk | 8 | 57.4 | – | Ramsier et al. (2012a) |
Red-fronted lemur (Eulemur fulvus rufus) | ABR, Spk | 11.3 (5.7) | 63.7 | – | Ramsier et al. (2012a) |
Mongoose lemur (Eulemur mongoz) | ABR, Spk | 8 | 54.2 | – | Ramsier et al. (2012a) |
Red-bellied lemur (Eulemur rubriventer) | ABR, Spk | 5.7 | 45.1 | – | Ramsier et al. (2012a) |
Ring-tailed lemur (Lemur catta) | ABR, Spk | 11.3 (5.7) | 62.2 | – | Ramsier et al. (2012a) |
Ring-tailed lemur (Lemur catta) | Beh, Spk | 8 (2) | 58 | 57 | Gillette et al. (1973) |
Gray mouse lemur (Microcebus murinus) | ABR, Spk | 7.9 | 44.6 | – | Schopf et al. (2014) |
Fork-marked lemur (Phaner furcifer) | Beh, Spk | 16 | 60.0 | 150 | Niaussat and Molin (1978) |
Coquerel’s sifaka (Propithecus coquereli) | ABR, Spk | 11.3 | 49.7 | – | Ramsier et al. (2012a) |
Red-ruffed lemur (Varecia rubra) | ABR, Spk | 11.3 (5.7) | 59.0 | – | Ramsier et al. (2012a) |
Table 3.2
Auditory sensitivity in primate semiorder Haplorhini
Species | Method, transducer^a | Best freq.^b (kHz) | High freq.^c (kHz) | Low freq.^d (Hz) | References |
---|---|---|---|---|---|
Suborder Tarsiiformes, Infraorder Tarsiiformes | |||||
Philippine tarsier (Carlito syrichta) | ABR, Spk | 16 (1.4) | 76–91 | – | Ramsier et al. (2012a) |
Suborder Anthropoidea, Infraorder Platyrrhini | |||||
Owl monkey (Aotus trivirgatus) | Beh, Spk | 10 | 44.5 | – | Beecher (1974b) |
Common marmoset (Callithrix jacchus) | Beh, Spk | 7 (2) | 28 | – | Seiden (1957) |
Common marmoset (Callithrix jacchus) | Beh, Spk | 7 | 44.9 | – | Osmanski and Wang (2011) |
Squirrel monkey (Saimiri sp.) | Beh, Spk | 12 (2) | 42.5 | – | |
Squirrel monkey (Saimiri sp.) | Beh, Phn | 8 | 41 | 140 | |
Suborder Anthropoidea, Infraorder Catarrhini | |||||
Blue monkey (Cercopithecus mitis) | Beh, Spk | 4 (1, 2) | 50.3 | – | Brown and Waser (1984) |
De Brazza’s monkey (Cercopithecus neglectus) | Beh, Phn | 5.7 (1.4) | 43 | 61 | Owren et al. (1988) |
Vervet monkey (Chlorocebus aethiops) | Beh, Phn | 1.4 (5.7) | 45 | 69 | Owren et al. (1988) |
Grey-cheeked mangabey (Lophocebus albigena) | Beh, Spk | 0.8 (8.0) | – | – | Brown (1986) |
Long-tailed macaque (Macaca fascicularis) | Beh, Spk | 16 (1) | – | – | Fujita and Elliott (1965) |
Long-tailed macaque (Macaca fascicularis) | Beh, Phn | 1 (8) | 42 | – | Stebbins et al. (1966) |
Japanese macaque (Macaca fuscata) | Beh, Spk | 4 (1) | 36.5 | 28 | Jackson et al. (1999) |
Japanese macaque (Macaca fuscata) | Beh, Phn | 5.7 (1–1.4) | 41 | 82 | Owren et al. (1988) |
Rhesus macaque (Macaca mulatta) | Beh, Spk | 4 (16) | – | – | |
Rhesus macaque (Macaca mulatta) | Beh, Phn | 8 (1.4) | 41 | – | |
Rhesus macaque (Macaca mulatta) | ABR, Spk | 16 (4) | 38.1 | – | Lasky et al. (1999) |
Pig-tailed macaque (Macaca nemestrina) | Beh, Phn | 8 (1) | 35 | – | |
Chimpanzee (Pan troglodytes) | Beh, Phn | 8 (1) | 27 | – | |
Yellow baboon (Papio cynocephalus) | Beh, Spk | 8 (1) | 41.0 | – | Hienz et al. (1982) |
3.3.3 Testing Methods
After decades of refinement, well-designed behavioral testing regimens produce what are generally considered to be ideal estimates of auditory sensitivity, as the behavior of whole animals is measured (H. E. Heffner and R. S. Heffner 2014). Beginning with Elder’s (1934) audiogram for chimpanzees (Pan troglodytes), most existing data on primate audition have been gathered via behaviorally based methodologies, although very few have been collected in recent decades (Sect. 3.4) (Coleman 2009).