Sound production of all types requires three basic components: a power source, a vibratory source, and a resonator (Fig. 72.4). In human voice production, the exhaled air from the lungs serves as the power source that drives vocal fold vibration, the vibratory source. The true vocal folds produce a faint primary sound source, which is then modulated by the third component, the resonating chamber, or supraglottic vocal tract. The supraglottic vocal tract is formed by the supraglottic larynx, the pharynx, and the oral cavity. If no resonating chamber were present, the primary sound source would produce a buzzing-type sound similar to a faint duck call.
Human communication can be divided into language, speech, and voice. Specifically, language production requires cognitive skills regulated by the cerebral cortex. Patients integrate higher cortical functions to express themselves through words and actions familiar to the region in which they were educated. Speech production refers to the articulation of words to produce language. It involves the supraglottic vocal tract for resonation and articulation of sounds produced at the level of the glottis. It is under the control of the cerebral cortex but is regulated by the coordinating centers in the basal ganglia and brain stem (3). Voice is produced through vocal fold vibration and specifically refers to the sound that emanates from the vocal folds as they are held in approximation and passively vibrated by the air that flows between them. Sounds produced by vocal fold vibration are termed voiced sounds. Commonly these are the vowels and voiced consonants that require vocal fold adduction. Adduction and tension are regulated by cerebral cortical activity and are coordinated through the basal ganglia as well.
Harmonic Sound Source
Patients can present with problems of language, speech, voice, or a combination of these. In the professional voice user, we are most often concerned with problems of voice production. Vocal fold vibratory activity is critical for voice production, as it provides the primary sound source, which is modulated by actions of the vocal tract (4). Vocal fold vibrations produce a complex tone. That is, the vocal folds vibrate at a set of frequencies that have a whole-number mathematical relation between them. The primary frequency of vibration is termed the fundamental frequency (F0), and is closely related to the perceived pitch of the voice. Each whole-number multiple of the F0 is called an overtone. The terminology is such that the F0 is labeled the first harmonic (H1), and each multiple of that frequency is the second, third, fourth, fifth harmonic (H2, etc.), and so on to infinity (Table 72.1).
TABLE 72.1 EXAMPLE OF FUNDAMENTAL FREQUENCY, OVERTONES, AND HARMONICS
n = 1    First Harmonic (H1) or F0
n = 2    Second Harmonic (H2)
n = 3    Third Harmonic (H3)
n = 4    Fourth Harmonic (H4)
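The whole-number relation between the F0 and its harmonics can be illustrated with a short calculation. The following is a minimal sketch only: the function name harmonic_series and the example F0 of 110 Hz are assumptions chosen for illustration and do not come from the table.

```python
def harmonic_series(f0_hz, count):
    """Return (n, frequency_hz) pairs for harmonics H1..H<count> of f0_hz."""
    return [(n, n * f0_hz) for n in range(1, count + 1)]


# Assumed value, chosen only for illustration (not taken from the text).
EXAMPLE_F0_HZ = 110.0

for n, freq_hz in harmonic_series(EXAMPLE_F0_HZ, 4):
    label = "H1 (F0)" if n == 1 else f"H{n}"
    print(f"n = {n}: {label} = {freq_hz:.0f} Hz")
```

Each harmonic is simply n times the F0, so raising or lowering the F0 shifts every harmonic proportionally.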
The harmonic spectrum is presented to the vocal tract. Because of the length, shape, and distal opening of the vocal tract, certain of these harmonics are amplified or resonated, and others are dampened or attenuated (4). This pattern, known as the spectral envelope, creates the sounds that we hear in speech production. The amplified or resonated harmonic regions of the spectral envelope are called the formant regions. The first two formant regions are responsible for vowel differentiation, whereas the third through the fifth regions are responsible for the quality of the sounds (timbre). The different sounds of speech are produced by alterations in the length, shape, and mouth opening of the vocal tract. This is under voluntary control. The vocal professional learns to alter the shape of the vocal tract to produce the target sound quality.
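The way a spectral envelope amplifies some harmonics and attenuates others can be sketched numerically. The example below is a toy model under stated assumptions, not a physiologic simulation: the bell-shaped gain curve, the 1/n fall-off of the source harmonics, the 100-Hz F0, and the two formant centers (500 and 1,500 Hz) are all hypothetical values chosen so that the arithmetic is easy to follow.

```python
import math


def envelope_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Toy vocal-tract gain: one bell-shaped peak per assumed formant center."""
    return sum(math.exp(-((freq_hz - fc) / bandwidth_hz) ** 2) for fc in formants_hz)


def filtered_harmonics(f0_hz, formants_hz, max_hz=4000.0):
    """Return (n, frequency_hz, amplitude) for each harmonic up to max_hz."""
    rows = []
    n = 1
    while n * f0_hz <= max_hz:
        freq_hz = n * f0_hz
        source_amp = 1.0 / n  # assumed fall-off of the source spectrum
        rows.append((n, freq_hz, source_amp * envelope_gain(freq_hz, formants_hz)))
        n += 1
    return rows


def local_peaks(rows):
    """Harmonic numbers whose amplitude exceeds that of both neighbors."""
    return [
        rows[i][0]
        for i in range(1, len(rows) - 1)
        if rows[i][2] > rows[i - 1][2] and rows[i][2] > rows[i + 1][2]
    ]


# Hypothetical F0 and formant centers, for illustration only.
rows = filtered_harmonics(100.0, [500.0, 1500.0])
print("Harmonics at envelope peaks:", local_peaks(rows))
```

In this sketch the harmonics that coincide with the assumed formant centers stand out from their neighbors, which is the pattern the formant regions describe.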
Figure 72.5 Spectrogram.
The formant regions can be evaluated through spectral analysis of the vocal signal. The spectrogram is a visual representation of the audible sounds. It is a plot of frequency and intensity changes of sound over time (5). To produce a spectrogram, the sound emanating from the vocal tract is broken into various frequency regions as it is passed through a variable bandpass filter. The filter identifies the audible frequency regions between 0 and 8,000 Hz. The harmonic frequencies within this region are represented on the y-axis and time is represented on the x-axis. Intensity of the various harmonics is represented by darker frequency bands on the graph. The darker bands represent the amplified harmonics and are called the formant regions.
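The time, frequency, and intensity layout described above can be sketched in a few lines. Note that this example substitutes a short-time discrete Fourier transform for the variable bandpass filter described in the text, because it is simpler to write out. The sample rate of 16,000 Hz is an assumption chosen so that the analysis spans 0 to 8,000 Hz, and the test signal (a 500-Hz tone with a weaker 1,500-Hz tone) is synthetic, not recorded voice.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=256, hop=128):
    """Return (times_s, freqs_hz, magnitudes[frame][bin]) from a naive windowed DFT."""
    # Hann window reduces leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    n_bins = frame_len // 2 + 1
    freqs_hz = [k * sample_rate / frame_len for k in range(n_bins)]
    times_s, magnitudes = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i in range(frame_len))
            row.append(abs(acc))  # larger magnitude corresponds to a darker band
        magnitudes.append(row)
        times_s.append((start + frame_len / 2) / sample_rate)
    return times_s, freqs_hz, magnitudes


SAMPLE_RATE = 16000  # assumed; gives a 0 to 8,000 Hz analysis range
signal = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
          + 0.3 * math.sin(2 * math.pi * 1500 * i / SAMPLE_RATE)
          for i in range(800)]

times, freqs, mags = spectrogram(signal, SAMPLE_RATE)
for t, row in zip(times, mags):
    strongest = max(range(len(row)), key=row.__getitem__)
    print(f"t = {t * 1000:.0f} ms: strongest band near {freqs[strongest]:.0f} Hz")
```

Each row of mags corresponds to one column of the spectrogram (one moment in time), and each entry in the row to one frequency band on the y-axis.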
Vowels have characteristic first- and second-formant frequencies. These are represented as F1 and F2. The vowel is determined by the absolute F1 and F2 as well as the relative distance between them (Fig. 72.5). Harmonics are selectively amplified by changes in the length, shape, and distal opening characteristics of the vocal tract. The higher formants (F3 to F5) are responsible for the timbre of the sounds and differences in the sound characteristics between speakers. Classically trained singers learn to cluster the third, fourth, and fifth formant regions to amplify the harmonic sound frequencies between 2,800 and 3,500 Hz (6). Amplified sound in this frequency range is preferentially detected by the human ear over other sounds. This band is known as the singer’s formant and is produced by learned behavior: through movements of the tongue, pharynx, and lips, the classically trained singer shapes the vocal tract to amplify selectively the harmonics in the desired regions (7).
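Which harmonics are available for amplification in the 2,800 to 3,500 Hz region depends on the F0 being sung. The following minimal sketch lists the harmonic numbers that fall inside that band; the band limits come from the text, whereas the function name harmonics_in_band and the example F0 values are assumptions for illustration.

```python
import math


def harmonics_in_band(f0_hz, low_hz=2800.0, high_hz=3500.0):
    """Return the harmonic numbers n for which low_hz <= n * f0_hz <= high_hz."""
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return list(range(first, last + 1))


# Example F0 values are assumed, not taken from the text.
for f0_hz in (220.0, 440.0, 880.0):
    print(f"F0 = {f0_hz:.0f} Hz: harmonics in band = {harmonics_in_band(f0_hz)}")
```

Because harmonics are spaced one F0 apart, fewer of them fall inside a fixed band as the F0 rises.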
Although the quality of the spectral envelope is largely influenced by the shape, length, and distal opening of the vocal tract, it also is affected by the richness and quality of the harmonic spectrum presented to it as the sound source. Therefore, the vocal professional must be able to regulate the harmonic source spectrum from the larynx, as well as the shape of the supraglottal vocal tract. Through manipulations of these two organ systems, all human speech sounds for professional and nonprofessional voice are produced (8).