CHAPTER 61 The Professional Voice
Anatomic Considerations
Laryngeal function depends on extrinsic and intrinsic laryngeal musculature. The extrinsic laryngeal muscles alter the position of the larynx, which in turn can affect the length of the vocal tract resonator. Classically trained singers use the extrinsic musculature to stabilize the larynx within the neck when singing.1 The intrinsic laryngeal muscles allow delicate control of adduction, abduction, and tension of the vocal folds.
Within the larynx, the human vocal folds are unique structures with no correlate in any other animal species. Hirano,2,3 who contributed greatly to the understanding of the laminar structure of the human vocal fold, described the cover-body theory of vocal fold vibration. The vocal fold is covered by a layer of stratified squamous epithelium. The subepithelial tissue, the lamina propria, is divided into superficial, intermediate, and deep layers. The superficial layer, often called Reinke’s space, contains fibroblasts, which produce proteins and glycoproteins to form an extracellular matrix of loose connective tissue. The intermediate layer is composed chiefly of elastin fibers, and the deep layer is composed primarily of collagen fibers. Collagen fibers from the deep layer blend into the underlying thyroarytenoid muscle, which forms the main bulk of the vocal fold (Figs. 61-1 and 61-2).
Blood vessels enter the vocal fold anteriorly and posteriorly. Vessels run parallel to the longitudinal axis of the fold. This arrangement allows the cover to vibrate over the body without placing excessive stretch or shearing forces on the vessels. Electron microscopy has shown that several arteriovenous shunts are present in the vocal fold microcirculation. These shunts may allow autoregulation of blood flow to this area.4
Gray and colleagues5 began to identify the contents of the basement membrane zone and the lamina propria. The basement membrane zone is a complex area anchoring the epithelium to the superficial layer of the lamina propria (Fig. 61-3). It is the site of tremendous shearing forces during vocal fold vibration. Excessive shear forces can lead to disruption of the basement membrane zone and the development of infiltrates in this area.6 This process is important in the formation of vocal fold lesions. In the superficial layer of the lamina propria, collagen type III and VII fibers intertwine. This arrangement anchors the basement membrane zone to the superficial layer of the lamina propria yet allows passive stretch during vibration (Fig. 61-4).5,7–9
Immunohistochemical analysis has also been used to study the basement membrane zone and extracellular matrix of the lamina propria. In diseased states, which correlate clinically with vocal fold nodules, the basement membrane zone is widened significantly. In lesions that are clinically labeled polyps, collagen type IV within the basement membrane zone appears less pronounced than in the healthy state. Perhaps this relative weakness predisposes patients to polyp formation under phonotraumatic stress.10,11
Voice Production
Vocalization begins with the air or power supply. The lungs supply the essential energy for sound production by presenting the larynx (oscillator) with a stream of air. The diaphragm, the intercostal, back, and abdominal musculature, and the elastic recoil of the chest wall work in concert during inspiration and expiration to control the release of air.12,13 Classically trained singers use the abdominal and thoracic musculature to regulate exhalation; they tend to use a greater percentage of total lung capacity than non–classically trained singers to produce sound in a more efficient manner.14,15 This enhanced efficiency of air propulsion to the larynx is a key difference between trained and untrained voice users.
The intensity of the sound source is related directly to subglottal pressure: as subglottal pressure increases, sound intensity also increases. Humans can alter subglottal pressure, and therefore sound intensity, by two methods. The first and probably more efficient method is to modify the force of the air expelled from the trachea. This is accomplished by activating the abdominal and thoracic musculature to increase the amount of air inspired and then controlling the rate of air egress, partially through the elastic recoil properties of the thoracic cavity and partially through voluntary muscular activity. The varied regional schools of classical singing all emphasize different areas of muscular control to accomplish this.16 However, the effect is the same in that the percentage of air used during singing is greater.17 The second method used to control subglottal pressure is to modify the force of vocal fold adduction. This method is somewhat less efficient. Increasing the force of laryngeal closure through activity of the thyroarytenoid, lateral cricoarytenoid, and interarytenoid muscles creates greater resistance at the glottal opening. This in turn raises subglottal pressure, which increases sound intensity. However, frequency of vocal fold vibration is directly related to tension within the vibratory system. Therefore, if sound intensity is controlled by the addition of tension in the vibrating system, the frequency of vibration can be inadvertently affected.
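The link between tension and frequency can be illustrated with a rough sketch. It assumes the textbook ideal-string relation, in which frequency rises with the square root of tension. This is a deliberate simplification, not a model of the layered vocal fold, and the function name and numeric values are hypothetical, chosen only for illustration.

```python
import math


def string_frequency_hz(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency of an ideal stretched string.

    Assumption: the vocal fold is treated as an ideal string, which ignores
    its layered cover-body structure. Illustration only.
    """
    if length_m <= 0 or tension_n < 0 or mass_per_length_kg_m <= 0:
        raise ValueError("length and mass must be positive, tension non-negative")
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)


# Hypothetical values, not measured data.
base = string_frequency_hz(0.016, 0.05, 0.0005)
tenser = string_frequency_hz(0.016, 0.20, 0.0005)  # four times the tension

# In this idealized model, quadrupling tension doubles the frequency.
print(round(tenser / base, 3))
```

Under these assumptions, any tension added to raise loudness also shifts pitch unless it is compensated elsewhere in the system.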
The harmonic frequencies that are amplified are referred to as formant regions. They shape the output from the sound source into sounds appreciated as vocal communication. Through spectral analysis of the voiced signal, we can measure four or five formant regions significant in vocal sound production. The first two of these regions are primarily responsible for vowel determination, whereas the third, fourth, and fifth formant regions color the sound or provide timbre. Vocal professionals, particularly classically trained singers, are able to alter the characteristics of the vocal tract to modulate or shift these formant regions. When the third through fifth formant regions are brought closer together by the voluntary changes in characteristics of the vocal tract, they amplify one another and a ring, termed the singer’s formant, is produced. This formant region, in the range of 2300 to 3200 cycles/second, is detected by the human auditory system preferentially over other frequencies, allowing the singer to be heard and understood above the sound of an orchestra or other instruments.18–20 Appropriate use of these principles may give a professional voice user greater vocal efficiency, that is, greater radiated output with less physical effort. A trained vocal professional provides an aesthetically pleasing sound quality for the listener by modulating the formant regions of the sound produced in the following ways: (1) altering the length of the vocal tract through actions of the abdominal, thoracic, and cervical musculature, (2) altering the shape of the vocal tract through the action of the pharynx, tongue, jaw, and lips, and (3) altering the size of the distal opening primarily through the actions of the jaw and lips. The purpose of all vocal training, either commercial or classical, is to teach the performer to control these vocal subsystems to produce the desired, and hopefully aesthetically pleasing, sound.
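As a small illustration of the singer’s formant range quoted above, the following sketch lists which harmonics of a sung fundamental fall between 2300 and 3200 cycles/second. The function name is hypothetical, and the sketch says nothing about how strongly those harmonics are amplified; it only shows which ones are available to be reinforced.

```python
SINGERS_FORMANT_HZ = (2300.0, 3200.0)  # range quoted in the text


def harmonics_in_band(f0_hz, band=SINGERS_FORMANT_HZ):
    """Return the harmonic numbers n for which n * f0 lies inside the band."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    low, high = band
    result = []
    n = 1
    while n * f0_hz <= high:
        if n * f0_hz >= low:
            result.append(n)
        n += 1
    return result


# A lower fundamental places more harmonics inside the band than a higher one.
print(harmonics_in_band(220.0))  # [11, 12, 13, 14]
print(harmonics_in_band(440.0))  # [6, 7]
```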
Laryngeal Stroboscopy
Although first reported by Oertel21,22 in 1878, stroboscopic examination of the larynx has only recently become popular in the United States. Stroboscopy is necessary to evaluate the vibratory patterns of the vocal folds that occur too rapidly to be visualized by the unaided human eye.23–25 According to Talbot’s law, the retina is able to resolve only five images per second. Therefore, images presented to the retina for less than 0.2 seconds (5 images/sec) persist and are fused together by the visual cortex to produce apparent motion. Because the vocal folds vibrate at rates of 75 to 1000 cycles/second, even the slowest vibratory patterns cannot be visualized without assistance. During stroboscopy the larynx is visualized with a xenon light source. Characteristics of xenon light allow rapid on-and-off bursts. In this manner, the larynx is illuminated for only very brief periods. These brief images, sampled from various points across many vibratory cycles, are then fused together to provide apparent slow motion of the laryngeal vibratory tissue. In modern stroboscopic equipment, the rate of laryngeal vibration is sensed by a microphone and used to control the rate of xenon light firing. When the rate of visual sampling of the laryngeal image is out of phase with the rate of vibration, the laryngeal tissue appears to move. When the sampling rate is in phase with the vibratory rate, the laryngeal tissue appears to stand still.
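The sampling principle can be sketched numerically. The example below assumes a perfectly periodic vibration and an idealized, instantaneous flash; the function names and the 200 and 198 cycles/second values are hypothetical. It computes the point in the vibratory cycle that each flash illuminates.

```python
def flash_phases(vibration_hz, flash_hz, n_flashes):
    """Phase of the vibratory cycle (0 to 1) illuminated by each flash."""
    if vibration_hz <= 0 or flash_hz <= 0:
        raise ValueError("rates must be positive")
    return [(k * vibration_hz / flash_hz) % 1.0 for k in range(n_flashes)]


def apparent_rate_hz(vibration_hz, flash_hz):
    """Apparent slow-motion rate, valid when the flash rate is close to
    the vibration rate (an assumption of this sketch)."""
    return abs(vibration_hz - flash_hz)


# Flash locked to the vibration: every flash lands on the same phase,
# so the folds appear to stand still.
locked = flash_phases(200.0, 200.0, 5)

# Flash slightly slower than the vibration: each flash lands a little later
# in the cycle, so the fused images show apparent slow motion.
offset = flash_phases(200.0, 198.0, 5)

print(locked)
print([round(p, 4) for p in offset])
print(apparent_rate_hz(200.0, 198.0))
```

With these values the apparent motion completes two cycles per second, slow enough to fall under the five-images-per-second limit described above.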
Stroboscopy permits observation of the vibratory action of the vocal folds, which is not possible with still-light examination (Fig. 61-5). As previously described, this vibratory action is responsible for sound production. Therefore, by using stroboscopy, the examiner can observe how small lesions alter the normal laryngeal vibratory pattern and glottal closure. The significance of a given lesion can then be determined.
Figure 61-5. Vocal fold vibration. The frontal section (left column) and the view from above (right column).
(From Hirano M, Bless DM. Videostroboscopic Examination of the Larynx. San Diego: Singular Publishing Group; 1993.)
Interpretation of laryngeal stroboscopy requires knowledge of the stroboscopic appearance of the healthy larynx phonating at various frequencies and intensities. A regular format for evaluation also enables a more objective interpretation of this subjective test. Standardized checklists for laryngeal stroboscopy interpretation are available.2,25–27 Evaluation criteria include symmetry, amplitude, periodicity, mucosal wave propagation, and glottal closure (Table 61-1). These vibratory characteristics are evaluated at a comfortable loudness level and modal speech frequency. In professional voice patients, it is beneficial to perform laryngeal stroboscopy during high and low pitch and loud and soft phonation. This approach provides additional data about the vibratory characteristics. If a professional voice patient is having difficulty at a particular point in the vocal range, stroboscopy and laryngoscopy should be performed while the patient phonates within the troubled range. With this approach, the clinician may observe subtle vibratory changes that may be the source of the patient’s vocal difficulties.
| Criteria | Result |
|---|---|
| Symmetry | Normal |
| | Side to side |
| | Teeter-totter |
| | Vertical O not symmetric |
| Amplitude | Right equals left |
| | Right is greater than left |
| | Left is greater than right |
| | Both decreased |
| Periodicity | Yes, consistent |
| | Yes, inconsistent |
| | No, inconsistent |
| | No, consistent |
| Mucosal wave | Right normal |
| | Right great |
| | Right abnormal pattern |
| | Right decreased |
| | Right adynamic (where) |
| | Left normal |
| | Left great |
| | Left abnormal pattern |
| | Left decreased |
| | Left adynamic (where) |
| Closure | Complete, long |
| | Complete, short |
| | Small posterior chink |
| | Large posterior chink |
| | Slit |
| | Elliptic |
| | 50% Elliptic |
| | Hourglass |
| | Asymmetric hourglass |
| | Other |
| Recording quality (1 = poor to 4 = great) | Focus ____ Size ____ Brightness ____ |
| | Color ____ Notable feature ____ |
| Videotape number | ____ |
| Verbal diagnosis | ____ |
Periodicity, or the regularity of successive glottal cycles, is ascertained by synchronizing the stroboscopic flash with the frequency of vocal fold vibration. The vocal folds are visualized at approximately the same point in each cycle. This maneuver “freezes” the image or makes the vocal folds appear to be standing still. Any perceived motion of the folds indicates aperiodicity. Any alteration in the balance of the vocal folds and the lungs can result in aperiodic vibrations. During a single phonation, vibratory cycles can range from periodic to aperiodic. Therefore, it may be helpful to determine whether the vibratory pattern is completely periodic, mostly periodic, mostly aperiodic, or completely aperiodic.28,29
In short, with tensing of the vocal fold for elevation of pitch, the vocal fold cover thins in three dimensions and the time difference between closings of the lower lip of the vocal fold and upper lip (the vertical phase difference) is reduced. This action, which can be witnessed under stroboscopic light examination, is a critical feature in professional voice patients. Often a small lesion or stiffness along the medial surface of the vocal fold will become noticeable only as the vocal fold is stiffened by elevation of pitch. The action of elevating pitch limits the vibratory motions of the vocal fold to the superficial region of the cover. This is one of the first areas injured by prolonged or excessive phonation.11 It is visualized as a reduction in the distinctness of the upper and lower mass formation from one vocal fold to the other.
The horizontal phase of vocal fold vibration has been described as a “ripple of light across the superior surface” of the vocal fold.30 It is a reflection of light either from the upper lip of the vocal fold as it travels from medial to lateral or from motion of the mucosa created by a shock wave as the two upper lips meet during closure. This wave is similar to the wave moving across the surface of a pond after disturbance of the water by a pebble. Lesions that stiffen the mucosa and reduce its pliability lead to loss of this light reflex. This is an important characteristic when visualized under stroboscopic light examination, particularly when the vocal folds are compared with each other at various pitches of phonation. Lesions that fill the superficial layer of the lamina propria and abut or infiltrate the vocal ligament tend to restrict or eliminate both components of the mucosal wave. In contrast, small to moderate-sized lesions limited to the superficial portion of the superficial layer of the lamina propria usually allow propagation of the wave, although it may be decreased and asymmetric.31,32 Finally, large and exophytic lesions may disrupt the mucosal vibratory characteristic even if they do not infiltrate deeply into the lamina propria, by altering the glottal shape and impairing glottal closure.
Closure of the membranous glottis is vital to laryngeal efficiency. Men usually have complete glottal closure, whereas up to 70% of women normally show a small posterior glottal chink.33 This glottal chink, however, is considered normal only when it extends from the vocal process of the arytenoid posteriorly. This region from the vocal process to the posterior commissure, referred to as the cartilaginous glottis, is not typically important in phonation unless the closure deficiency is large enough to alter closure of the membranous portion of the rima glottidis. Berry and colleagues34 determined that the most efficient glottal output occurred when the vocal folds were approximately 1 mm apart at the region of the vocal process. Glottic closure patterns can be described as complete (long or short), small or large posterior chink, slit, elliptic, and hourglass or asymmetric hourglass. Closure can be altered by a mass lesion, scarring, muscular tension, and neurologic abnormalities, which become clinically significant when they involve deficiencies of closure at the membranous vocal fold level.
Voice Analysis
Spectrometry
Spectrometry provides a visual display of vocal harmonics and noise. In spectral analysis of sound, time is conventionally plotted on the horizontal axis against frequency on the vertical axis, with intensity shown by the darkness of the trace. This display shows the impact of resonance (formant structure) and articulation on the laryngeal buzz. Spectral analysis can evaluate and compare resonance changes and may be useful in documenting vocal alterations after surgical procedures on the pharynx. Some laryngologists have found it to be valuable in singers and other professional voice patients.35,36
Electroglottography
Electroglottography measures the efficiency of glottal closure by graphically recording the contact time of the vocal folds. It shows the opening and closing rates of the vocal folds, which are not well visualized by stroboscopy. Electroglottography is performed by passing a low-voltage, high-frequency current between two electrodes placed on either side of the patient’s neck. It measures the electrical impedance, which varies with opening and closing of the glottis. Some clinicians consider this measure objective and reproducible. Electroglottography may provide clinically useful information when combined with laryngeal stroboscopy or other measurements of laryngeal function.37–39
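As a hedged illustration of how contact time might be summarized from such a recording, the sketch below computes the fraction of one cycle during which a trace sits at or above a chosen threshold. Everything here is an assumption made for illustration: the synthetic sine wave stands in for a real signal, higher values are taken to mean greater vocal fold contact (recording conventions vary), and the threshold and function name are hypothetical.

```python
import math


def contact_fraction(samples, threshold):
    """Fraction of samples at or above the threshold.

    Assumption: samples at or above the threshold represent vocal fold
    contact. Real recordings need a convention for signal polarity.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    in_contact = sum(1 for s in samples if s >= threshold)
    return in_contact / len(samples)


# Synthetic trace: one cycle of a sine wave, not patient data.
n = 1000
cycle = [math.sin(2 * math.pi * i / n) for i in range(n)]

half_threshold = contact_fraction(cycle, 0.0)   # roughly half the cycle
high_threshold = contact_fraction(cycle, 0.5)   # a shorter contact estimate

print(round(half_threshold, 2), round(high_threshold, 2))
```

The two calls show that the contact estimate depends heavily on the threshold chosen, which is one reason such a measure is best read alongside stroboscopy rather than in isolation.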