Chapter 8 showed how concepts from basic acoustics (Chapter 7) are used to construct a theory of vowel acoustics. The theory can be viewed as a combination of the following concepts: (a) input signals, (b) resonance and resonators, and (c) output signals. As stated in Chapter 8, the basic theory was presented for the case of vowels because the theory is most precise and accurate for this class of sounds. There is, however, an acoustic theory of speech production, not just vowel production. The purpose of this chapter is to establish the theoretical basis for the vocal tract acoustics of non-vowel sounds. Many of the concepts developed for the vowel theory are applicable in this chapter, but some new concepts, specific to the acoustics of consonants, are introduced. The following questions are addressed:
1. Why is the acoustic theory of speech production more accurate for vowels than for consonants?
2. What are the acoustics of coupled resonators, and how do they apply to consonant acoustics?
3. What is the theory of fricative acoustics?
4. What is the theory of stop acoustics?
5. What is the theory of affricate acoustics?
6. What kinds of acoustic distinctions are associated with the voicing contrast for obstruents (stops, fricatives, and affricates)?
Table 9–1 lists several reasons why the acoustics of sound classes such as obstruents, nasals, and at least one semivowel require some elaboration of the theory outlined for vowels. First, the theory of vowel acoustics is relatively simple because the resonators can be described as extending from the source to the lips within a single tube. The single tube, in this case, is the vocal tract, and the source is the vibrating vocal folds. There are sound classes, however, for which the single-tube model is not adequate. For example, production of the nasal sounds /m/, /n/, and /ŋ/ involves two major tubes (the pharyngeal-oral and the nasal) that communicate with each other.1 This arrangement, wherein a “shunt” or “sidebranch” resonator is attached to a main resonating tube, produces certain acoustic effects different from those associated with vowels.
Second, the important frequencies for vowels, which are below about 4000 Hz, have wavelengths considerably longer than the cross-sectional dimensions of the vocal tract. To make this statement more concrete, consider the cross-sectional areas of the vowel /i/ along the length of the vocal tract. In measurements reported by Fant (1960, p. 115), the cross-sectional areas of the vocal tract for /i/ ranged between 0.65 cm² (at the site of the front constriction) and 10.5 cm² (in the region of the relatively open pharynx). The F-pattern for /i/ reported by Fant (1960, p. 109) for a representative speaker is F1 = 240 Hz, F2 = 2250 Hz, and F3 = 3200 Hz. By applying the wavelength formula (λ = c/f) to these frequencies and assuming the speed of sound in air (c) to be 33,600 cm/s, λ1 (wavelength for F1) = 140 cm, λ2 = 14.9 cm, and λ3 = 10.5 cm. Because the cross-sectional areas given above are numerically far greater than the simple distances (i.e., radii) from which such areas are computed (a circular section of 10.5 cm² has a radius of only about 1.8 cm), the wavelengths computed for the first three formants are clearly greater than the cross-sectional dimensions of the vocal tract for the vowel /i/. The importance of this fact is that, in the case of vowels, sound waves travel through the vocal tract primarily as plane waves. In other words, when the wavelengths of frequencies are greater than the cross-sectional dimensions of a tube such as the vocal tract, the pressure waves propagate along the long axis of the tube (from one end to the other), but not in other dimensions (such as from the center to the sides of the tube). When pressure waves are propagated mostly as plane waves, the area function of the tube can be used with great accuracy to predict the resonant frequencies of the tube.
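The wavelength arithmetic in this paragraph can be checked with a few lines of Python. The sketch below uses the formant frequencies, areas, and speed of sound quoted in the text; the function names, and the treatment of each cross section as a circle in order to turn an area into a diameter, are assumptions made only for illustration.

```python
import math

C_CM_PER_S = 33_600.0  # speed of sound assumed in the text (cm/s)


def wavelength_cm(freq_hz: float, c: float = C_CM_PER_S) -> float:
    """Wavelength in cm from the formula lambda = c / f."""
    return c / freq_hz


def equivalent_diameter_cm(area_cm2: float) -> float:
    """Diameter of a circle with the given area (circular section is an assumption)."""
    return 2.0 * math.sqrt(area_cm2 / math.pi)


# Values quoted in the text from Fant (1960) for the vowel /i/.
formants_hz = {"F1": 240.0, "F2": 2250.0, "F3": 3200.0}
areas_cm2 = {"front constriction": 0.65, "open pharynx": 10.5}

wavelengths = {name: wavelength_cm(f) for name, f in formants_hz.items()}
diameters = {site: equivalent_diameter_cm(a) for site, a in areas_cm2.items()}

for name, lam in wavelengths.items():
    print(f"{name}: wavelength = {lam:.1f} cm")
for site, d in diameters.items():
    print(f"{site}: equivalent diameter = {d:.2f} cm")

# Plane-wave condition: every wavelength exceeds the widest cross-sectional dimension.
print("plane waves expected:", min(wavelengths.values()) > max(diameters.values()))
```

Under these assumptions, even the shortest of the three wavelengths is well over twice the widest equivalent diameter, which is the condition for plane-wave propagation described above.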
At frequencies above 4000 Hz, many wavelengths are shorter than the cross-sectional dimensions of the vocal tract, and pressure wave propagation in the vocal tract is more complex. Correspondingly, the mathematics underlying the theory for cases in which wavelengths are smaller than the cross-sectional dimensions of the vocal tract is also more complex, and more prone to error. Many consonants, especially obstruents (stops, fricatives, and affricates), have substantial amounts of energy above 4000 Hz, so the theory is not as accurate for this class of sounds as it is for vowels.
Finally, the theory of vowel acoustics includes a complex periodic source, described in Chapter 8. The spectrum of the voicing source is related in a straightforward way to the complex, periodic motions of the vocal folds. In obstruent consonants, however, many sources are aperiodic and depend on complex interactions between airflow and structures within the vocal tract. In addition, some sources for obstruents are located between resonant chambers of the vocal tract, rather than at one end of the vocal tract as in the case of vowels.
The first section of this chapter presents theoretical concepts required to understand coupled or “shunt” resonators. The next section discusses vocal tract aeromechanics in obstruent production, and their relationship to concepts from vowel acoustics. Subsequent sections cover points 2 and 3 in Table 9–1 and show how the acoustics of stops, fricatives, and affricates are logical consequences of aeromechanical events associated with the articulatory positions, configurations, and movements in obstruent production.
The English nasals /m/, /n/, and /ŋ/ are produced with oral airway closure and an open velopharyngeal port. Because the oral closure involves a complete obstruction to airflow through the vocal tract, /m/, /n/, and /ŋ/ are sometimes called “nasal stop consonants.” In fact, the place of complete closure for the three nasals is the same as the place of closure for the stops /b/, /d/, and /g/ (and cognates /p/, /t/, and /k/). Nasals are, therefore, like stop consonants produced with an open, rather than closed, velopharyngeal port. The interval during which the oral closure coincides with an open velopharyngeal port is referred to as the nasal murmur. This term is used to distinguish the acoustics of nasals produced with complete oral closure (the nasal murmur) from the acoustics of vowels produced with a somewhat open velopharyngeal port—that is, nasalized vowels. The theory of nasal murmurs is discussed first, followed by the more complex case of nasalization, the term used to describe the acoustic effect of coupled resonators with an open oral tract.
Figure 9–1 shows a schematic tube model of the vocal and nasal tracts during production of an /m/. The nasal tract part of this model is highly simplified and schematic for the purposes of this discussion; for beautiful computerized tomography (CT) and magnetic resonance images (MRI) of a human nasal tract, see Serrurier and Badin (2008), their Figure 6.
Figure 9–1. Tube model of the vocal tract with coupled resonators. The open velopharyngeal port couples the nasal and pharyngeal-oral cavities, and the sinus cavities are coupled to the nasal cavities. The front end of the tube, where the lips are located, is closed.
Several features of the model in Figure 9–1 are different from the vowel tube model discussed in Chapter 8. First, the lip end of the pharyngeal-oral tube is closed, consistent with labial closure for /m/. The rest of the pharyngeal-oral tube has been constricted roughly in the shape appropriate for the vowel /i/ (tight constriction in the front of the vocal tract, more open tube in the back). Second, the velopharyngeal port is open, also consistent with the production of /m/ or any other nasal sound. As shown in Figure 9–1, the open velopharyngeal port couples the pharyngeal-oral and nasal tracts. Tubes coupled in this way are referred to as shunt resonators, one of the resonators being a shunt, or diverging tube, relative to the other resonator. Third, additional shunt resonators are coupled to the nasal tract, shown as tubelettes communicating with the nasal cavities. These tubelettes represent the sinuses, which contribute in important ways to the acoustics of nasal sounds (Dang & Honda, 1996; Dang, Honda, & Suzuki, 1994). The source is located at the glottal end of the tube and has a harmonic spectrum produced by the vibrating vocal folds, just as in the production of vowels. All English nasals are produced as voiced sounds.
As in the case of vowels, the resonators shown in Figure 9–1 shape the spectrum of the source. When shaping the source spectrum for vowels, there are frequency regions at which sound transmission through the vocal tract is maximum. Those regions appear in both the theoretical (i.e., computed from the mathematical theory) and measured spectra as peaks, otherwise known as resonances or formants. The frequency regions between these spectral peaks, or spectral valleys, have substantially less energy than the resonances because the vocal tract shape does not emphasize energy in these regions. Although the reasoning in this last statement may sound circular, it serves to highlight the primary difference between theoretical and measured spectra of vowels on the one hand, and nasals on the other. In the case of the coupled (shunt) resonators shown in Figure 9–1, there are frequency regions where sound energy hits a kind of acoustic dead end, and is “trapped,” thus producing antiresonances. For example, the closed oral tube shown in Figure 9–1 shapes a frequency region of the source spectrum, but that energy is “trapped” in the closed resonator. Energy may also be trapped in the smaller sinus resonators shown in Figure 9–1, because the sinus cavities are closed resonators. These regions of antiresonance, where energy is trapped because two or more tubes are coupled together and one or more of the tubes has a dead end, can be calculated based on resonator type and size, just as in the vowel theory. An antiresonance affects a measured spectrum in several ways, most notably by eliminating or reducing energy in its vicinity.
Because the nose is an acoustic tube open to the atmosphere at the nares, nasal murmurs have resonances (formants) related to the shape and size of the nasal passages. Antiresonances originate in the sinus cavities and the closed oral cavity. The nasal cavities extend from just behind the nasal septum to the outlet of the nasal cavities at the nares (Pruthi, Espy-Wilson, & Story, 2007). Nasal sounds, therefore, have spectra consisting of a mix of resonances and antiresonances. The concept of an antiresonance is illustrated in Figure 9–2, which shows theoretical speech-sound spectra from Fant (1960) for a vowel (/ɑ/) and two nasals (/m/ and /n/). The theoretical spectrum for the vowel /ɑ/ was computed, as discussed in Chapter 8, from the estimated area function derived from a midsagittal x-ray tracing of a single speaker producing the sustained vowel. The theoretical spectra for the nasals were also estimated from area functions of the nasal cavities, but the process was somewhat more complex than the case for vowels. Sagittal x-rays could not produce a satisfactory image for the computation of nasal area functions, so Fant used a plastic model of the nasal cavities obtained from a cadaver, and adjusted this model to fit the dimensions of the single (live) speaker. The nasal area functions derived from this model were submitted to the mathematical theory that generated the nasal resonances and antiresonances resulting from the coupling of the nasal and pharyngeal-oral cavities. It should be noted that modern imaging techniques allow very accurate reconstructions of nasal and sinus cavity dimensions and, therefore, estimates of nasal cavity area functions (Dang & Honda, 1996; Dang et al., 1994; Serrurier & Badin, 2008).
The first three peaks in the theoretical vowel spectrum are shown clearly in Figure 9–2A. These peaks correspond to the first three formants of the vowel /ɑ/. Note how the peaks in this spectrum are sharply tuned, with relatively narrow bandwidths. Of special interest are the valleys of the computed resonance curve. The arrow on the vowel spectrum indicates a valley around 1800 Hz where the energy is nearly 40 dB less than the energy of the first peak. This is a substantial energy difference between the highest peak in the spectrum and the valley, but note that the valley develops between the second and third peaks in a relatively gradual way. Stated otherwise, the vowel spectrum does not show a sharply tuned, “upside-down” peak along its resonance curve. The valley in the vowel spectrum is partly the result of the decreasing energy in the source spectrum with increasing frequency (see Chapter 8), and partly the result of the close spacing of F1 and F2 for this vowel (see Fant, 1973, Chapter 1, for information on formant frequency spacing and formant intensity).
The concept of shunt resonators is well known to engineers interested in noise reduction in cars, motorcycles, and heating systems. A shunt resonator traps energy at certain frequencies and reduces the amount of energy radiating (coming out) from an acoustic system. This is why car mufflers are constructed with multiple side branches off the main pipe, and heating ducts often have small, dead-end chambers off their main path. Although nasals don’t “require” sound reduction, listeners seem to take advantage of it and use it as one cue that a nasal has been produced.
Figure 9–2. Computed spectra based on area functions, showing resonances for /ɑ/ (panel A) and resonances and antiresonances for two nasal murmur spectra /m/ (panel B) and /n/ (panel C). From Acoustic Theory of Speech Production (pp. 144, 153), by G. Fant, 1960, The Hague: Mouton. Copyright 1960 by Mouton de Gruyter. Modified and reproduced with permission.
In a beautifully done experiment, Dang and Honda (1996) used MRI to measure the structural characteristics of the sphenoid, maxillary, and frontal sinuses in three participants. These characteristics included the volume of each sinus, as well as the dimensions of the small anatomical “tube” connecting the sinus cavity to the main nasal pathways. For each sinus volume, Dang and Honda computed a value for compliance, and for each connecting tube a value for inertance. They entered these values into the formula for Helmholtz resonators (see Chapter 7) and obtained the theoretical location of antiresonance frequencies for each sinus (remember: the sinuses are closed cavities). Then they compared the calculated antiresonance frequencies with actual measurements of antiresonances during the participants’ production of nasals. The calculated and measured antiresonance frequencies matched within 10% of each other!
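The Helmholtz calculation described here can be sketched as follows. This is a minimal illustration, not Dang and Honda's procedure or data: the sinus volume and the length and area of the connecting tube are hypothetical round numbers, the air density is an assumed value, and no end correction is applied to the connecting tube.

```python
import math

C_CM_PER_S = 33_600.0     # speed of sound used elsewhere in the chapter (cm/s)
RHO_G_PER_CM3 = 0.00114   # assumed density of warm, moist air (g/cm^3)


def inertance(neck_length_cm: float, neck_area_cm2: float) -> float:
    """Acoustic mass of the air plug in the connecting tube: rho * L / A."""
    return RHO_G_PER_CM3 * neck_length_cm / neck_area_cm2


def compliance(volume_cm3: float) -> float:
    """Acoustic compliance of the air in the cavity: V / (rho * c^2)."""
    return volume_cm3 / (RHO_G_PER_CM3 * C_CM_PER_S ** 2)


def helmholtz_frequency_hz(volume_cm3: float, neck_length_cm: float,
                           neck_area_cm2: float) -> float:
    """Resonant frequency of a Helmholtz resonator: 1 / (2 * pi * sqrt(M * C))."""
    m = inertance(neck_length_cm, neck_area_cm2)
    k = compliance(volume_cm3)
    return 1.0 / (2.0 * math.pi * math.sqrt(m * k))


# Hypothetical sinus: 15 cm^3 cavity, connecting tube 0.5 cm long with 0.1 cm^2 area.
sinus_hz = helmholtz_frequency_hz(15.0, 0.5, 0.1)
# Same connecting tube, but twice the cavity volume.
larger_sinus_hz = helmholtz_frequency_hz(30.0, 0.5, 0.1)

print(f"15 cm^3 sinus: antiresonance near {sinus_hz:.0f} Hz")
print(f"30 cm^3 sinus: antiresonance near {larger_sinus_hz:.0f} Hz")
```

The density term cancels when inertance and compliance are multiplied, so the result depends only on the speed of sound and the geometry. Doubling the volume lowers the frequency by a factor of the square root of 2.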
The theoretical spectrum for /m/ shows peaks, much like the vowel spectrum, but it also shows a reverse, or upside-down, peak. This sharply tuned reverse peak, which is indicated in Figure 9–2B by an arrow, occurs around 800 Hz and is the antiresonance that results from trapped acoustic energy in the closed pharyngeal-oral cavity. The antiresonance peak shows relatively sharp tuning, which distinguishes it from the broader valleys seen in vowel spectra. Fant (1960) showed that the frequency of this antiresonance is related to the size of the pharyngeal-oral cavity in which the acoustic energy is trapped. In other words, the frequency location of these reverse (negative) peaks is based on the same resonator rules as the frequency location of the positive peaks referred to as formants. Larger cavities yield antiresonances with lower frequencies; smaller cavities yield antiresonances with higher frequencies.
This latter point is made clear by examination of the theoretical spectrum for /n/ shown in Figure 9–2C. The reverse peak in this spectrum—the antiresonance—is located just below 2000 Hz (see arrow), which is substantially higher than the 800 Hz antiresonance for /m/. Based on the relation of resonator size to frequency, the different frequency locations for the antiresonances of /m/ versus /n/ make sense. The cavity in which acoustic energy is trapped for /m/ is substantially larger than the cavity in which energy is trapped for /n/. The reasoning extends to the case of the velar nasal /ŋ/, for which the point of constriction may result in an extremely small cavity behind the constriction made by the tongue dorsum against the posterior end of the hard palate or anterior end of the soft palate. When the coupled cavity is extremely small, the antiresonance associated with the oral cavity is located at a relatively high frequency, probably well above 3000 Hz. The absence of a coupled cavity, and hence a cavity in which acoustic energy can be trapped, occurs if the point of constriction for the velar nasal is sufficiently posterior in the vocal tract so that the pharyngeal and nasal cavities appear as a single tube with no side branches. In this case, there may be no oral antiresonance, but a pattern of resonances determined by the configuration of the continuous pharyngeal-nasal tube.2 Midsagittal tracings of vocal tract configurations for /m/, /n/, and relatively anterior and posterior constrictions for /ŋ/, adapted from Fant (1960), are shown in Figure 9–3 to summarize these concepts of sidebranch resonance and coupled cavity size. Dashed outlines show the size of the closed pharyngeal-oral cavity in which energy is trapped. The absence of a cavity indicated by a dashed outline, in the right-hand vocal tract of Figure 9–3, is the case of an extremely posterior point of constriction for /ŋ/.
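The inverse relation between the size of the closed cavity and the frequency of its antiresonance can be illustrated with a deliberately simple idealization. The sketch below treats the closed oral cavity as a uniform side-branch tube closed at its far end, whose lowest antiresonance is c/(4L). This quarter-wavelength idealization is a simplification and not Fant's method (his spectra were computed from full area functions), and the three branch lengths are hypothetical values chosen only to show the trend.

```python
C_CM_PER_S = 33_600.0  # speed of sound used elsewhere in the chapter (cm/s)


def side_branch_antiresonance_hz(length_cm: float, c: float = C_CM_PER_S) -> float:
    """Lowest antiresonance of a uniform side branch closed at its far end: c / (4 * L)."""
    return c / (4.0 * length_cm)


# Hypothetical lengths of the closed oral side branch (cm), for illustration only.
branch_lengths_cm = {"/m/": 8.0, "/n/": 5.0, "anterior /ŋ/": 2.0}

antiresonances_hz = {
    nasal: side_branch_antiresonance_hz(length)
    for nasal, length in branch_lengths_cm.items()
}

for nasal, f in antiresonances_hz.items():
    print(f"{nasal}: side branch {branch_lengths_cm[nasal]:.0f} cm -> antiresonance near {f:.0f} Hz")
```

The idealized frequencies do not reproduce the values in Fant's computed spectra, but they preserve the ordering described in the text: the longest closed cavity (/m/) gives the lowest antiresonance, and the very short cavity of an anterior /ŋ/ gives one well above 3000 Hz.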
Although this discussion has focused on the antiresonance feature of the nasal murmur spectra shown in Figure 9–2, nasal murmurs also have prominent resonances. In Figures 9–2B and 9–2C, the first peak in the nasal murmur spectra for both /m/ and /n/ occurs roughly between 250 Hz and 300 Hz and has a relatively high amplitude. According to Fant (1960), the first resonance of the velar nasal /ŋ/ also occurs in this frequency region. This relatively high-amplitude, low-frequency (250–300 Hz) formant can be considered a constant feature of all nasals. The amplitude of the first nasal resonance is greater than the amplitudes of higher-frequency nasal resonances, but typically lower than the formant amplitudes of vowels preceding or following a nasal murmur, as in a vowel-consonant-vowel (VCV) sequence in which C is a nasal murmur (see Chapter 11). The constancy across nasals of the first nasal resonance, or F1 of the nasal murmur, suggests that it originates in cavities of the same size for all murmurs. The combined pharyngeal and nasal cavities are the constant resonators across nasal murmurs. In people with a structurally intact speech production apparatus, there are typically no time-varying constrictions of the nasal cavities; unlike the oral tract, the nasal tract does not change shape during speech production. Similarly, the shape of the pharyngeal section of the vocal tract is fairly constant across different places of articulation for nasal murmurs. The effect of this constant tube shape of the pharyngeal-nasal cavities is the production of a low-frequency formant that is an acoustic “signature” of a nasal murmur. In addition to this low F1, nasal murmurs have a series of higher formants occurring roughly at 1000 Hz (nasal F2), 2000 Hz (nasal F3), 3000 Hz (nasal F4), and so on. These upper formants of the nasal murmur, however, tend to be quite sensitive to surrounding phonetic context and are variable across speakers (Fujimura, 1962).
Figure 9–3. Midsagittal tracings of the vocal tract for /m/, /n/, and /ŋ/. The two right-hand vocal tracts show a relatively anterior and a relatively posterior constriction for /ŋ/, respectively. The size of the coupled oral cavity in the first three vocal tracts is enclosed by the dashed lines. The size of this cavity is largest for /m/, somewhat smaller for /n/, and very small for /ŋ/ when the constriction is anterior. There is no dashed line in the rightmost vocal tract because the extreme posterior point of constriction does not permit coupling between the oral and nasal cavities. In this case, the pharynx and nasal cavities act as a continuous, single-tube resonator. From Acoustic Theory of Speech Production (p. 140), by G. Fant, 1960, The Hague: Mouton. Copyright 1960 by Mouton de Gruyter. Modified and reproduced with permission.
The resonances generated by the nasal tract tend to have greater bandwidths than the resonances of a typical vowel spectrum. In Chapter 8, the factors of absorption, friction, radiation, and gravity were identified as sources of energy loss in the transmission of sound through the vocal tract and into the atmosphere. The wider bandwidths of nasal resonances are largely accounted for by high absorption factors in the nasal cavities. These high absorption factors are related to the extensive surface area within the nasal cavities, which contain many complicated folds and recesses. There is more tissue to absorb sound in the nasal cavities than there is in the vocal tract. Because of this, the damping of nasal resonances, as revealed by the bandwidths of nasal formants, tends to be greater than the damping of oral (vowel) resonances.
The combined effects of an antiresonance and the relatively greater damping of nasal resonances tend to make nasal murmurs weak in relative amplitude compared with vowels (Pruthi & Espy-Wilson, 2004). Antiresonances not only eliminate energy at the exact frequency location of their “reversed peaks,” but also reduce energy at surrounding frequencies. Increased damping results in less intense resonant peaks. Because the overall amplitude of a sound can be thought of as the sum of all energy along the resonance curve (i.e., the sum of all amplitudes at all frequencies in a spectrum), speech sounds with antiresonances and increased damping, such as nasals, naturally have less overall amplitude than sounds such as vowels that do not have antiresonances and have minimal damping.
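The link between damping (bandwidth) and the height of a resonant peak can be shown with a single second-order resonance. The sketch below is a generic textbook-style resonator, not a model of any particular nasal tract; the 500 Hz center frequency and the 50 Hz and 100 Hz bandwidths are hypothetical values chosen for illustration.

```python
import math


def resonance_gain(f_hz: float, center_hz: float, bandwidth_hz: float) -> float:
    """Gain of one resonance (conjugate pole pair), normalized to 1 at 0 Hz."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    pole = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * center_hz)
    numerator = abs(pole) ** 2
    denominator = abs((s - pole) * (s - pole.conjugate()))
    return numerator / denominator


def to_db(gain: float) -> float:
    """Convert a linear gain to decibels."""
    return 20.0 * math.log10(gain)


CENTER_HZ = 500.0  # hypothetical resonance frequency

# Gain at the center frequency for a lightly damped and a heavily damped resonance.
narrow_db = to_db(resonance_gain(CENTER_HZ, CENTER_HZ, 50.0))
wide_db = to_db(resonance_gain(CENTER_HZ, CENTER_HZ, 100.0))

print(f"bandwidth  50 Hz: level at center = {narrow_db:.1f} dB")
print(f"bandwidth 100 Hz: level at center = {wide_db:.1f} dB")
print(f"cost of doubling the bandwidth   = {narrow_db - wide_db:.1f} dB")
```

In this idealization the level at the center frequency is approximately 20·log10(F/B), so doubling the bandwidth costs about 6 dB. That is consistent with the statement that increased damping results in less intense resonant peaks.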
A nasal murmur is defined as the sound produced when the velopharyngeal port is open and there is a complete obstruction to the oral airstream. This description encompasses the sound class of English nasal consonants transcribed as /m/, /n/, and /ŋ/. In the production of nasal murmurs, two major tubes are coupled: the nasal tube, which is open to the atmosphere, and the oral tube, which is completely sealed from the atmosphere by articulatory closure. The sealed oral tube shapes the source spectrum according to its resonator size, but the energy shaped by that tube is trapped because its outlet to the atmosphere is closed. This results in an antiresonance, or a reverse peak in the spectrum. The sinus cavities coupled to the nasal tract also function as closed resonators and contribute antiresonances to the spectrum of a nasal murmur. Antiresonances affect a measured spectrum by reducing or eliminating energy at and around the frequency of the reversed peak. The energy in the spectrum of a nasal murmur is also reduced because of the relatively high damping of nasal formants that results from absorption of sound by the extensive tissue surface area in the nasal cavities. Although antiresonances are an important characteristic of nasal murmurs, there are also resonances of the nasal cavities, the most important of which is a low-frequency formant between 250 and 300 Hz. This formant frequency is relatively constant for the three nasals of English, because the pharyngeal and nasal cavities responsible for the formant do not change shape across different places of articulation. Higher formants for nasal murmurs can be measured as well, with F2 at approximately 1000 Hz, and upper nasal formants (F3, F4, etc.) at 1000 Hz intervals (F3 = 2000 Hz, F4 = 3000 Hz, and so forth). These formants reflect resonances of the nasal cavities.
The acoustic theory of nasalization has many similarities to the theory of nasal murmurs but is somewhat more complicated. As in nasal murmurs, the pharyngeal-oral and nasal airways are coupled, but in the case of nasalization both tracts are open to the atmosphere. The overall output of the vocal tract for nasalized vowels represents a mixture of the resonant characteristics of the nasal and pharyngeal-oral cavities, as well as the effects of their coupling.
When the nasal and pharyngeal-oral airways are coupled, with both open to the atmosphere, sound waves propagate through both airways and radiate from the mouth and nares. Each of these tracts has resonant characteristics dependent on the size of the cavities (and, hence, the inertance and compliance of the air in those cavities). The oral resonances are roughly, but not exactly (see below), the same as when the nasal cavities are not coupled to the pharyngeal-oral cavities. Thus, one major effect of coupling the nasal to the pharyngeal-oral cavity during vowel production is the addition of resonances from the nasal cavities to resonances of the vocal tract.
Another effect of coupling the oral and nasal cavities is the addition of antiresonances into the spectrum. In nasalized vowels, antiresonances are the result of energy trapped in the paranasal sinuses (Dang et al., 1994; Havel, Hoffman, Mürbe, & Sundberg, 2014; Stevens, Fant, & Hawkins, 1987). The main antiresonance in nasalized vowels occurs at a relatively low frequency, between 300 Hz and 1000 Hz. Interestingly, the primary resonance of the nasal cavities also occurs in this frequency region, as does the first formant (F1) of most non-nasal vowels. Nasalized vowels, therefore, contain a low-frequency spectrum (~300–1000 Hz) having a nasal resonance, an antiresonance, and a pharyngeal-oral resonance (the F1 of the oral vowel). For several writers (Hawkins & Stevens, 1985; Stevens et al., 1987), this resonance-antiresonance-resonance pattern in the region around F1 of the oral vowel is the defining acoustic feature of nasalization.
The low-frequency spectra of four nasalized and non-nasalized vowels are shown in Figure 9–4. Each panel shows the energy level, in relative amplitude (dB), across a frequency range of 0 Hz to 1300 Hz. The “zero” point on the dB scale is arbitrary. The blue curve in each panel shows the spectrum for the non-nasalized vowel and the red curve shows the spectrum for the nasalized version of the vowel. The articulatory difference between non-nasalized and nasalized vowels is the configuration of the velopharyngeal port. For non-nasalized vowels, the port is closed, whereas for nasalized vowels, the port is open.3 For the vowel /ɑ/ in Figure 9–4, the non-nasalized spectrum (blue curve) has peaks at roughly 680 Hz and 1100 Hz and no sharply tuned reversed peaks (antiresonances). The frequency location of the peaks is consistent with the typical F1 and F2 values observed in /ɑ/ spectra of adult males. The label F1o indicates the first oral resonance (i.e., the first formant) of the non-nasalized vowel. The spectrum for the nasalized /ɑ/ (that is, /ᾶ/: red curve) shows oral peaks in roughly the same location as the F1o and F2 peaks of the non-nasalized spectrum, but the nasalized peaks are at somewhat higher frequencies than the non-nasalized peaks; this is especially the case for the nasalized F1 compared with F1o. Note the label indicating the location of the first oral resonance for /ᾶ/.
Figure 9–4. Spectra for the vowels /ɑ/, /ε/, /u/, and /i/ for non-nasalized (blue curves) and nasalized (red curves) productions. Frequency is plotted between 0 and 1300 Hz on the x-axis and relative amplitude, in dB, is plotted on the ordinate. NR = nasal resonance. AR = antiresonance. F1o = F1 of non-nasalized vowel; the F1 of the nasalized vowel is labeled separately. For each vowel except /i/, there is a nasal resonance-antiresonance-F1 pattern in the nasalized spectra. In the case of /i/, the nasal resonance is canceled by the antiresonance because of the small coupling (small velopharyngeal port opening) between the oral and nasal cavities. From “Some acoustical and perceptual correlates of nasal vowels,” by K. Stevens, G. Fant, and S. Hawkins in In Honor of Ilse Lehiste (p. 246), edited by R. Channon and L. Shockey, 1987, Dordrecht, Netherlands: Foris. Copyright 1987 by Foris. Modified and reproduced with permission.
The /ᾶ/ spectrum is different from the /ɑ/ (non-nasalized) spectrum in several ways. First, the nasalized F1 has lower amplitude than F1o. There is also an amplitude difference for the two F2 peaks (around 1100 Hz), but not of the same magnitude. Second, there is an antiresonance, labeled AR on the graph, just above 500 Hz in the /ᾶ/ spectrum. This antiresonance is a result of the coupling of the sinus cavities to the pharyngeal-oral and nasal cavities. The antiresonance also accounts for the relatively low amplitude of the nasalized F1 compared with F1o. Recall that antiresonances reduce energy at and around their frequency locations. The location of the antiresonance, just above 500 Hz, is close enough to the F1 of /ᾶ/ to have a substantial effect on its amplitude. Third, there is a low-amplitude peak in the nasalized spectrum, around 400 Hz, that is not present in the non-nasalized spectrum. This is the primary resonance of the nasal tract, labeled NR. This nasal resonance mixes with the oral resonances and is seen clearly in the output spectrum of a nasalized vowel.
The resonance-antiresonance-resonance pattern in /ᾶ/, therefore, consists of the nasal tract resonance (NR) around 400 Hz, the antiresonance (AR) just above 500 Hz, and the F1 around 700 Hz. This general pattern is seen for other nasalized vowels, with variations that depend on vowel identity. Note for the vowels /ᾶ/, /ɛ̃/, and /ũ/ how the nasalized F1 is shifted up in frequency relative to F1o and has lower amplitude than F1o. In each of these cases, the antiresonance resulting from the coupling of sinus cavities to the pharyngeal-oral and nasal cavities reduces the amplitude of the first oral resonance. Also noteworthy is the consistent frequency of the nasal resonance (NR) for the vowels /ᾶ/, /ɛ̃/, and /ũ/. This consistency has the same explanation as the consistent nasal resonance in nasal murmurs. The area function of the pharyngo-nasal tract does not change substantially during speech production, and therefore neither do its resonances.
The non-nasalized and nasalized spectra shown in Figure 9–4 for /i/ seem to violate these acoustic principles of nasalization. No antiresonance (AR) is indicated, nor is there a nasal resonance (NR). The absence of the resonance-antiresonance-resonance pattern in the /ĩ/ spectrum can be attributed to the small amount of coupling between the pharyngeal-oral and nasal cavities for this vowel. It is well known that the size of the velopharyngeal port in nasalized vowels is greater for low and mid-vowels (such as /ɑ/ and /ε/) than it is for the high vowel /i/ (Bell-Berti, 1993). Stated in acoustic terms, the coupling between the pharyngeal-oral and nasal cavities is greater for low and mid-vowels than it is for high vowels such as /i/. When the coupling between the pharyngeal-oral and nasal cavities is small, the nasal resonance (NR) and the antiresonance (AR) have essentially the same frequency. A resonance and an antiresonance that are associated with the same cavity, and therefore have the same frequency but opposite signs, cancel each other, and their effects are not seen in the output spectrum. The absence of a nasal resonance and antiresonance in the /ĩ/ spectrum is a result of this cancellation. The lack of effect of an antiresonance on F1 for /ĩ/ is apparent in the nearly identical amplitudes of F1o and the nasalized F1.
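The cancellation described in this paragraph can be illustrated with a generic pole-zero sketch, in which a resonance is a conjugate pole pair and an antiresonance is a conjugate zero pair. This is an illustration of the principle only, not the model used by Stevens and colleagues; the 400 Hz and 520 Hz frequencies loosely follow the NR and AR locations described for /ᾶ/, and the 80 Hz bandwidth is a hypothetical value.

```python
import math


def pair_term(f_hz: float, center_hz: float, bandwidth_hz: float) -> float:
    """|(s - r)(s - r*)| / |r|^2 for a conjugate root pair r, evaluated at frequency f."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    root = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * center_hz)
    return abs((s - root) * (s - root.conjugate())) / abs(root) ** 2


def gain_db(f_hz: float, resonance_hz: float, antiresonance_hz: float,
            bandwidth_hz: float = 80.0) -> float:
    """Level (dB) of one resonance (poles) combined with one antiresonance (zeros)."""
    zeros = pair_term(f_hz, antiresonance_hz, bandwidth_hz)
    poles = pair_term(f_hz, resonance_hz, bandwidth_hz)
    return 20.0 * math.log10(zeros / poles)


probe_hz = list(range(100, 1301, 100))

# Large coupling: resonance and antiresonance at different frequencies.
separated = [gain_db(f, 400.0, 520.0) for f in probe_hz]
# Small coupling: resonance and antiresonance at the same frequency.
coincident = [gain_db(f, 400.0, 400.0) for f in probe_hz]

print("separated, level at 400 Hz (dB):", round(gain_db(400.0, 400.0, 520.0), 1))
print("separated, level at 520 Hz (dB):", round(gain_db(520.0, 400.0, 520.0), 1))
print("coincident, largest deviation from flat (dB):", max(abs(g) for g in coincident))
```

When the two frequencies differ, the spectrum shows a peak near the resonance and a dip near the antiresonance. When they coincide, the numerator and denominator are identical and the pair contributes nothing to the output, which is the cancellation invoked for /ĩ/.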
Yes, the locations and shapes of the sinus cavities are fixed inside your head and yes, they are relatively far from the action where articulatory changes modify the area function of the vocal tract. The sinuses are even relatively far from the ever-changing area of the velopharyngeal port. It might be assumed, then, that for a given person the (anti)resonant frequencies of these sinus resonators contribute in a fixed way to the spectrum of a nasalized vowel: their acoustic consequences would seem to be above the fray of the acoustic consequences of moving the tongue, jaw, velum, and so forth. Not so, according to Pruthi, Espy-Wilson, and Story (2007), who combined MRI measurements of the vocal and nasal tracts with electrical modeling analysis to show that for nasalized vowels, the precise frequencies of antiresonances originating in the sinus cavities change with changes in the vocal tract area function, as well as with the degree of openness at the velopharyngeal port! Interestingly, sinus (anti)resonant frequencies are not affected by different places of articulation for nasal murmurs—they are constant. But the hypernasality of dissatisfaction is basically a vowel thing, so the next time you are asked to stop complaining, nod knowingly and state that whining is acoustically complex, as are you.
More to Vowels Than Meets the Ear
How are vowels distinguished in a language? Well, according to tongue height, tongue advancement, and lip configuration, right? Actually, a more precise answer is yes but not absolutely. In their 1996 book, The Sounds of the World’s Languages, the great phoneticians Peter Ladefoged (1925–2006) and Ian Maddieson described and discussed “minor features of vowel quality,” meaning vowel differences involving contrasts of (for example) voice quality and nasalization. Among the several “minor” contrasts in vowel quality discussed by Ladefoged and Maddieson, the opposition between an oral vowel and its nasalized counterpart (/i/ vs. /ĩ/, for example, as in French or Portuguese) is said to be the most common among languages of the world.
The specific acoustic characteristics of nasalized vowels depend very much on precisely which vowel is nasalized. In a study employing a computer-implemented, electrical model of the vocal and nasal tracts plus the sinuses, “equivalent” coupling of the oral and nasal tracts for high and low vowels (that is, the same area of opening of the velopharyngeal port for high and low vowels) produced very different changes in the low-frequency spectrum of the respective vowels (Rong & Kuehn, 2010). This indicates an acoustic interaction between the size of the velopharyngeal port and vocal tract configuration for a vowel. An appropriate response to the question of how an open velopharyngeal port changes the formant and antiresonance characteristics of the nasalized vowel spectrum requires a question in return: Which vowel are you asking about? (A similar result was reported by Pruthi, Espy-Wilson, & Story, 2007.)
Nasalization is the term used to describe the production of vowels with an open velopharyngeal port. The acoustic theory of nasalization differs from the theory of nasal murmurs because in the former the oral airway is open, whereas in the latter the oral airway is closed. The primary acoustic effects of nasalization are seen in the frequency range between 0 and 1000 Hz and include: (a) the introduction of an “extra” resonance from the nasal tract, usually in the 300 to 500 Hz region; (b) an antiresonance located at a slightly higher frequency than the nasal tract resonance, probably due to trapping of energy in the paranasal sinus cavities, which act as sidebranch resonators to the main nasal cavities; and (c) a first oral resonance of the nasalized vowel, which may be slightly higher in frequency and of lesser amplitude than the non-nasalized F1 (F1o). The reduction of amplitude is due to the nearby antiresonance. Because of the reduction in amplitude, nasalized vowels typically have less overall amplitude than corresponding non-nasalized vowels.
Why is nasalization important (see Stevens et al., 1987)? First, when English vowels are articulated either before or after nasals, some portion of the vowel is produced with an open velopharyngeal port. Even though the vowels of English are described as non-nasal and, therefore, are specified as having a closed velopharyngeal port, the open velopharyngeal port required for the nasal consonant will “spread” an acoustic effect to adjacent vowels. This spreading of articulatory (and, therefore, acoustic) characteristics from one segment to another is called coarticulation, as discussed in Chapter 5. The open velopharyngeal port for the nasal murmur cannot be closed instantaneously for the articulation of a following vowel. Thus, a vowel following a nasal consonant is nasalized for a brief time, during which the output of the vocal tract reflects the effects of combined oral and nasal acoustics. Similarly, a vowel preceding a nasal consonant is nasalized for a certain interval when the velopharyngeal port is opened prior to the oral articulation of the nasal murmur. This opening of the velopharyngeal port during the vowel is often thought to reflect anticipation of the articulatory requirements of the nasal murmur. Even though there is no contrast in English between nasalized and non-nasalized vowels, these coarticulatory effects may serve as important cues to the phonetic perception of an upcoming nasal consonant.
The second reason for understanding nasalization is suggested by the closing statement of the preceding paragraph. Whereas English does not have a phonemic opposition for nasalized and non-nasalized vowels, such contrasts are phonemic in languages such as French and Hindi (a language spoken in northern India). A comprehensive theory of speech acoustics should be able to explain the acoustic basis of sound systems for many (if not all) languages of the world (see Sidetrack, “More to Vowels Than Meets the Ear”).
A third reason for considering an acoustic theory of nasalization is the more practical one of children and adults with structural or neurological disorders that prevent the decoupling of the pharyngeal-oral and nasal cavities in speech production. Craniofacial anomalies, seen in many different syndromes, often involve structural deficits of the velopharyngeal port area which make velopharyngeal closure inadequate or impossible. Many neurological diseases cause dysarthria, which affects the ability of muscles of the speech apparatus to function properly. A typical sign in many cases of dysarthria is chronic or intermittent hypernasality, sometimes similar to that seen in craniofacial anomalies. Regardless of the cause of inadequate or absent velopharyngeal closure, the effect is the same: the chronic or intermittent nasalization of vowels (as well as additional effects on consonant production and acoustics). Speech-language pathologists should know the theory of nasalization as part of their basic scientific knowledge and as a foundation for diagnostic, prognostic, and management plans and statements. A speech-language pathologist who understands the acoustics of nasalization is able to provide a coherent account to other health care professionals, such as physicians, as to why the speech of a child with a repaired cleft palate but lingering velopharyngeal inadequacy sounds “muffled” and soft. As a further example of how speech acoustics theory can inform clinical practice, recent theoretical work of Rong and Kuehn (2010) has shown how the acoustic results of nasalizing a vowel may be compensated by adjustments of the oral cavity. Rong and Kuehn (2010) used modeling techniques (very similar to the techniques used by Stevens and House, described in Chapter 8, except with the electrical circuits implemented by computer software) to show that nasal formants and antiresonances may be minimized or even eliminated by proper adjustments of the oral articulators. Rong and Kuehn’s work suggests the possibility of speech-language pathologists teaching a child to modify oral postures to offset chronic or intermittent nasalization effects.
Figure 9–5A shows a tracing from a midsagittal x-ray of a speaker producing an /l/. Note the contact of the tongue apex at the alveolar ridge (arrow). Behind this contact, MRI data have shown the front part of the tongue to be somewhat grooved and the tongue dorsum raised very close to the soft palate. This tongue configuration has two parallel air passageways, one on either side of the midline of the vocal tract (Narayanan, Alwan, & Haker, 1997). This type of /l/ production is referred to as a lateral manner of articulation, to denote the articulatory configuration just described and hence the propagation of sound waves through the lateral passageways.4 The cavity immediately behind the apical closure, however, traps sound energy at frequencies determined by the size of the closed resonator and therefore introduces an antiresonance into the /l/ spectrum. The cavity behind the apical closure can be considered a shunt resonator, in much the same way as described above for nasals.
The antiresonance in lateralized /l/ spectra derives from the cavity extending from the apical closure to the uvula, as indicated in the x-ray tracing of Figure 9–5A by the horizontal line ending in short vertical bars. Figure 9–5B shows the computed (theoretical) and measured spectra for an articulatory configuration like that shown in Figure 9–5A. If the spectra shown by the solid (theoretical) and dashed (measured) lines are compared, both show roughly the same frequency location of the antiresonance (AR), or reverse peak. This antiresonance occurs between 1800 and 2000 Hz and produces a substantial “dip” in the spectrum between the second and third formants. The antiresonance probably has the greatest effect on the amplitude of F3, which is quite low in both the theoretical and measured /l/ spectra in Figure 9–5B.
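As a rough check on the 1800 to 2000 Hz antiresonance, the trapping cavity can be approximated as a uniform tube closed at one end, whose lowest resonance follows the quarter-wavelength rule f = c / (4L). This is a first-order simplification assumed here for illustration; it is not the calculation used to produce the theoretical spectrum in Figure 9–5B. The speed of sound (35,000 cm/s) and the function name `quarter_wave_length_cm` are likewise assumptions.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed approximate speed of sound in warm, moist air

def quarter_wave_length_cm(antiresonance_hz, c=SPEED_OF_SOUND_CM_S):
    """Length (cm) of a tube closed at one end whose lowest resonance is antiresonance_hz.

    Rearranges the quarter-wavelength rule f = c / (4 * L) to give L = c / (4 * f).
    """
    return c / (4.0 * antiresonance_hz)

# Antiresonance range reported for the lateral /l/ in the text.
for f_hz in (1800.0, 2000.0):
    length_cm = quarter_wave_length_cm(f_hz)
    print(f"AR at {f_hz:.0f} Hz -> side cavity of about {length_cm:.2f} cm")
```

Under these assumptions, an antiresonance in this frequency range corresponds to a side cavity between roughly 4 and 5 cm long. A longer cavity would place the antiresonance at a lower frequency, and a shorter one at a higher frequency.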
Shunt resonators also occur in obstruent production. For all sounds considered so far (vowels, nasals, laterals), the theory involves a voicing source located at the back (glottis) end of the vocal tract tube. The production of obstruents, however, involves a source of sound located between two resonating cavities. For example, in the production of /ʃ/ there is a noise (aperiodic) source generated near the supraglottal constriction. An MRI of a speaker producing an /ʃ/ is shown in Figure 9–6. The lips are to the right of the image and the air-filled cavities are shown as illuminated passageways. The /ʃ/ constriction is indicated by an arrow, and in and near this constriction a source of frication energy is generated (see below for more details on frication sources). Just as the vibrating vocal folds produce a spectrum shaped by the vocal tract resonators, the /ʃ/ noise source has its spectrum shaped by the vocal tract. Even though the /ʃ/ noise source sits roughly between two resonators—the cavity in front of the constriction (Figure 9–6, “front cavity”) and the cavity behind the constriction (Figure 9–6, “back cavity”)—both cavities contribute to the vocal tract output because sound waves propagate away from the source in both directions (forward and backward). Because the back cavity is effectively closed, it traps energy at frequencies determined by its size and generates an antiresonance. The back cavity acts as a coupled or shunt resonator in the production of /ʃ/, and its antiresonance has an influence on the shape of the output spectrum. These kinds of coupled or shunt resonators are seen in the production of fricatives, stops, and affricates, as discussed more fully in the next section.
Figure 9–5. A. Midsagittal x-ray tracing of a lateral /l/ production. The contact of the tongue apex to the alveolar ridge is indicated by an arrow. The cavity where energy is trapped, thus introducing an antiresonance into the /l/ spectrum, is indicated by the horizontal line ending in short vertical bars. B. Theoretical (solid line) and measured (dotted line) spectra for lateralized /l/. Note the antiresonance (AR) in the region 1800 to 2000 Hz in both the theoretical and measured spectra. F1, F2, and F3 are indicated on the theoretical spectrum. From Acoustic Theory of Speech Production (pp. 163, 165), by G. Fant, The Hague: Mouton. Copyright 1960 by Mouton de Gruyter. Modified and reproduced with permission.