The Professional Voice

CHAPTER 61 The Professional Voice




Key Points
















Care of patients who use their voices professionally requires knowledge and skills not easily mastered within the field of otolaryngology alone. It is part of the discipline of performing arts medicine. The laryngologist enlists the expertise of speech-language pathologists and vocal pedagogues to retrain and rehabilitate the professional voice patient. A team approach is mandatory and has been strengthened over the past decade by the establishment of several multidisciplinary voice centers.


Professional voice patients are a diverse group. Limiting the definition to singers and actors is too narrow. All people who depend on speaking or singing skills for employment (e.g., salesmen, receptionists, telephone operators, lawyers, clergy, teachers, politicians, public speakers, and most physicians) should be considered professional voice users, because all of them place diverse yet significant demands on their voices.


Singers and actors (performing vocal professionals) put the greatest demand on the vocal apparatus. The extraordinary amount of practice and performance stress they are under exceeds that of any other type of vocal professional. They are often highly trained and push their voices to their physical limits. No other patients are as sensitive to subtle changes in their vocal abilities. Singers and actors with voice disorders often challenge the most experienced laryngologist. The knowledge and expertise gained from managing such patients can and should be generalized to care for other professional and nonprofessional voice users with voice disorders. It is no longer appropriate to treat voice problems in nonprofessional voice users differently from those in professional voice users.



Anatomic Considerations


Voice and voice use patterns can be affected by emotional status and general health. Therefore, in the evaluation of the patient with a voice disorder, the entire body and psyche should be considered. The body itself is the vocal instrument, and the larynx is its most sensitive part. Altered function in nearly any area of the vocal professional’s body can result in vocal changes. The larynx, therefore, should not be evaluated as an isolated entity.


Sound generation of any type requires a power source, a vibrator, and a resonator. The lungs are the power supply, the larynx is the vibratory source, and the supraglottal vocal tract (supraglottic larynx, pharynx, oral cavity, and, potentially, the nasal cavity) is the resonator, which shapes the sound into words and song. The sound of the voice is affected by changes in any of these three systems, which should be regarded as a unit during evaluation of the professional voice patient.


Laryngeal function depends on extrinsic and intrinsic laryngeal musculature. The extrinsic laryngeal muscles alter the position of the larynx, which in turn can affect the length of the vocal tract resonator. Classically trained singers use the extrinsic musculature to stabilize the larynx within the neck when singing.1 The intrinsic laryngeal muscles allow delicate control of adduction, abduction, and tension of the vocal folds.


Within the larynx, the human vocal folds are unique structures with no correlates in any other animal species. Hirano,2,3 who contributed greatly to the understanding of the laminar structure of the human vocal fold, described the cover-body theory of vocal fold vibration. The vocal fold is covered by a layer of stratified squamous epithelium. The subepithelial tissue, the lamina propria, is divided into superficial, intermediate, and deep layers. The superficial layer, often called Reinke’s space, is composed of fibroblasts, which produce proteins and glycoproteins to form an extracellular matrix of loose connective tissue. The intermediate layer is composed chiefly of elastin fibers, and the deep layer is composed primarily of collagen fibers. Collagen fibers from the deep layer blend into the underlying thyroarytenoid muscle, which forms the main bulk of the vocal fold (Figs. 61-1 and 61-2).




According to the cover-body theory of vocal fold vibration, the cover is composed of the overlying epithelium combined with the superficial layer of the lamina propria. The intermediate and deep layers of the lamina propria, known as the vocal ligament, form a transition zone, and the body is composed primarily of the thyroarytenoid muscle. The contrasting masses and physical properties of the vocal fold cover and the body cause them to move at different rates as air passes between the vocal folds. This movement, or vibration, creates sound at the level of the vocal folds by disturbing the local pressure equilibrium within the area of the glottis. The sound, a buzzing-like tone, is modulated and radiated by the supraglottal vocal tract into audible speech or song.


Blood vessels enter the vocal fold anteriorly and posteriorly. Vessels run parallel to the longitudinal axis of the fold. This arrangement allows the cover to vibrate over the body without placing excessive stretch or shearing forces on the vessels. Electron microscopy has shown that several arteriovenous shunts are present in the vocal fold microcirculation. These shunts may allow autoregulation of blood flow to this area.4


Gray and colleagues5 began to identify the contents of the basement membrane zone and the lamina propria. The basement membrane zone is a complex area anchoring the epithelium to the superficial layer of the lamina propria (Fig. 61-3). It is the site of tremendous shearing forces in the human vocal fold that occur during vocal fold vibration. Excessive shear forces can lead to disruption of the basement membrane zone and the development of infiltrates in this area.6 This process is important in the formation of vocal fold lesions. In the superficial layer of the lamina propria, collagen type III and VII fibers intertwine. This arrangement fixates the basement membrane zone to the superficial layer of the lamina propria yet allows passive stretch during vibration (Fig. 61-4).5,7–9




Immunohistochemical analysis has also been used to study the basement membrane zone and extracellular matrix of the lamina propria. In diseased states, which correlate clinically with vocal fold nodules, the basement membrane zone is widened significantly. In lesions that are clinically labeled polyps, collagen type IV within the basement membrane zone appears less pronounced than in the healthy state. Perhaps this relative weakness predisposes patients to polyp formation under phonotraumatic stress.10,11



Voice Production


Vocalization begins with the air or power supply. The lungs supply the essential energy for sound production by presenting the larynx (oscillator) with a stream of air. The diaphragm, the intercostal, back, and abdominal musculature, and the elastic recoil of the chest wall work in concert during inspiration and expiration to control the release of air.12,13 Classically trained singers use the abdominal and thoracic musculature to regulate exhalation; they tend to use a greater percentage of total lung capacity than non–classically trained singers to produce sound in a more efficient manner.14,15 This enhanced efficiency of air propulsion to the larynx is a key difference between trained and untrained voice users.


As the diaphragm relaxes and the chest wall recoils to a resting state, air is pushed through the nearly closed vocal folds. Because the air passage at the glottal level is smaller than the air passage of the trachea and subglottis, pressure in the region of the glottis drops as the velocity of the air column increases. The relative vacuum created by this drop in pressure draws the pliable rima glottal tissues of the membranous vocal fold region together; this phenomenon is known as the Bernoulli effect. After closure at the membranous vocal fold at the glottal level, the air column from the lungs and trachea continues to flow into the subglottal region. The rising subglottal air pressure forces the vocal folds back open. The vocal folds, or rima glottal tissues, open from inferiorly to superiorly (inferior to superior lip), forming an alternating convergent and divergent glottal configuration. The aerodynamic forces of the air column and the inherent myoelastic properties of the vocal folds, particularly in the region of the vocal fold cover, are responsible for the repeated opening and closing of the rima glottal tissues that pulses the air column as it flows out of the glottis. These disruptions in the steady state of the tracheal air pressure by glottal activity result in sound production. The sound produced by the vibratory source has a buzzlike quality. In professional voice production, glottal sound production can be further complicated by voluntary muscular activity that can influence the intensity and frequency characteristics of the glottal sound before its presentation to the supraglottal vocal tract.
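The pressure drop described above can be illustrated numerically. The following is a minimal sketch, not a physiologic model: it assumes steady, incompressible, frictionless airflow, an air density of about 1.2 kg/m^3, and purely illustrative values for airflow and for the tracheal and glottal cross-sectional areas, none of which come from this chapter. The function name glottal_pressure_drop is hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2  # assumed density of air; not a value from the chapter


def glottal_pressure_drop(flow_m3_s, tracheal_area_m2, glottal_area_m2,
                          rho=AIR_DENSITY_KG_M3):
    """Pressure drop (Pa) from trachea to glottis in an idealized flow.

    Continuity:  v = Q / A, so the same flow moves faster through a
                 smaller opening.
    Bernoulli:   p_trachea - p_glottis = 0.5 * rho * (v_glottis**2 - v_trachea**2)
    """
    if tracheal_area_m2 <= 0 or glottal_area_m2 <= 0:
        raise ValueError("cross-sectional areas must be positive")
    v_trachea = flow_m3_s / tracheal_area_m2
    v_glottis = flow_m3_s / glottal_area_m2
    return 0.5 * rho * (v_glottis ** 2 - v_trachea ** 2)


# Illustrative (assumed) values: 0.2 L/s airflow, 2.5 cm^2 trachea,
# and a nearly closed glottis of 0.05 cm^2.
drop_pa = glottal_pressure_drop(2e-4, 2.5e-4, 5e-6)
print(f"pressure drop at the glottis: {drop_pa:.0f} Pa")
```

In this idealized model, halving the glottal area roughly quadruples the pressure drop, which matches the qualitative statement that a narrower glottis speeds the air column and lowers the local pressure.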


The intensity of the sound source is related directly to subglottic pressure—that is, as subglottal pressure increases, sound intensity also increases. Humans can alter subglottal pressure, and therefore sound intensity, by two methods. The first and probably more efficient method is to modify the force of the expelled air from the trachea. This is accomplished through activation of the abdominal and thoracic musculature to increase the amount of air inspired and then, partially through elastic recoil properties of the thoracic cavity and partially through voluntary muscular activity, controlling the rate of air egress. The varied regional schools of classical singing all emphasize different areas of muscular control to accomplish this phenomenon.16 However, the effect is the same in that the percentage of air used during singing is greater.17 The second method used to control subglottal pressure is to modify the force of vocal fold adduction. This method is somewhat less efficient. Increasing the force of laryngeal closure through activity of the thyroarytenoid, lateral cricoarytenoid, and interarytenoid muscles achieves greater resistance to the glottal opening. This in turn raises subglottal pressure, which increases sound intensity. However, frequency of vocal fold vibration is directly related to tension within the vibratory system. Therefore, if sound intensity is controlled by the addition of tension in the vibrating system, the frequency of vibration can be inadvertently affected.


Well-trained vocal professionals can independently modulate the frequency characteristic of the source signal from vocal fold vibration through voluntary behaviors. They do so through adjustments in cricothyroid, thyroarytenoid, lateral cricoarytenoid, and interarytenoid muscle activity. The cricothyroid muscle, when activated, elongates the vocal fold, thus tensing the cover and elevating the frequency of vibration. Fine control of the amount of tension is accomplished by balancing these cricothyroid contraction forces against thyroarytenoid, lateral cricoarytenoid, and interarytenoid muscle forces to keep the vocal folds in an appropriate position for phonation. Unopposed cricothyroid muscle contraction leads to increases in the glottal width, which negatively affect the vibratory cycle. In addition, fine control of this mechanism allows the blending of the registers of the singing voice for a smoother transition between what singers term the “chest” and “head” voice regions. Inappropriate or unbalanced changes lead to what is perceived as voice breaks. Although these breaks may be unappealing in a classically trained singer, they can be used for stylistic effects in commercial singing voice production. The yodel is probably the most commonly appreciated stylistic technique using the break in registers to produce a desired sound.


The sound source signal produced by vocal fold oscillation has a fundamental vibratory rate termed the fundamental frequency. Owing to the characteristics of the larynx as a natural vibrator, the sound that is produced has harmonic qualities—that is, as the vocal fold tissue vibrates to disturb the local air pressure, the pressure waves created are refracted. Refracted pressure waves that are out of phase with the fundamental frequency cancel each other out. Waves that are in phase with the fundamental frequency, on the other hand, are also radiated. These in-phase waves may be faster or slower by a whole-number multiple of the fundamental frequency. They create the harmonic or subharmonic frequencies produced by the laryngeal sound source. Again, each harmonic is a whole-number multiple of the fundamental frequency. This harmonic sound source is presented to the supraglottal vocal tract. In turn, the supraglottal vocal tract, on the basis of its physical characteristics (length, shape, and size of the opening at the distal end), amplifies or attenuates particular regions in the source harmonic spectrum.
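The whole-number relationship between the fundamental frequency and its harmonics can be made concrete with a short calculation. This is only an illustrative sketch; the function name harmonic_series and the 220 Hz example fundamental are assumptions chosen for the example, not values from the text.

```python
def harmonic_series(f0_hz, count):
    """Return the first `count` harmonics of a fundamental frequency.

    Each harmonic is a whole-number multiple of f0, and the first
    harmonic is the fundamental itself.
    """
    if f0_hz <= 0 or count < 1:
        raise ValueError("f0_hz must be positive and count at least 1")
    return [n * f0_hz for n in range(1, count + 1)]


# Assumed example: a source vibrating at 220 cycles/second.
print(harmonic_series(220.0, 5))  # [220.0, 440.0, 660.0, 880.0, 1100.0]
```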


The harmonic frequencies that are amplified are referred to as formant regions. They shape the output from the sound source into sounds appreciated as vocal communication. Through spectral analysis of the voiced signal, we can measure four or five formant regions significant in vocal sound production. The first two of these regions are primarily responsible for vowel determination, whereas the third, fourth, and fifth formant regions color the sound or provide timbre. Vocal professionals, particularly classically trained singers, are able to alter the characteristics of the vocal tract to modulate or shift these formant regions. When the third through fifth formant regions are brought closer together by the voluntary changes in characteristics of the vocal tract, they amplify one another and a ring, termed the singer’s formant, is produced. This formant region, in the range of 2300 to 3200 cycles/second, is detected by the human auditory system preferentially over other frequencies, allowing the singer to be heard and understood above the sound of an orchestra or other instruments.18–20 Appropriate use of these principles may give a professional voice user greater vocal efficiency, that is, greater radiated output with less physical effort. A trained vocal professional provides an aesthetically pleasing sound quality for the listener by modulating the formant regions of the sound produced in the following ways: (1) altering the length of the vocal tract through actions of the abdominal, thoracic, and cervical musculature, (2) altering the shape of the vocal tract through the action of the pharynx, tongue, jaw, and lips, and (3) altering the size of the distal opening primarily through the actions of the jaw and lips. The purpose of all vocal training, either commercial or classical, is to teach the performer to control these vocal subsystems to produce the desired, and hopefully aesthetically pleasing, sound.
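To see which part of the source spectrum the singer’s formant can reinforce, one can list the harmonics of a sung pitch that fall within the 2300 to 3200 cycles/second region quoted above. The sketch below only checks which whole-number multiples land inside that band; it does not model the vocal tract filter. The function name harmonics_in_band and the example pitches are assumptions for illustration.

```python
import math

SINGERS_FORMANT_HZ = (2300.0, 3200.0)  # region quoted in the text


def harmonics_in_band(f0_hz, band=SINGERS_FORMANT_HZ):
    """Return (harmonic number, frequency) pairs that fall inside `band`."""
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    low, high = band
    first = max(1, math.ceil(low / f0_hz))   # lowest harmonic at or above `low`
    last = math.floor(high / f0_hz)          # highest harmonic at or below `high`
    return [(n, n * f0_hz) for n in range(first, last + 1)]


# Assumed example pitches: 220 and 440 cycles/second.
for f0 in (220.0, 440.0):
    print(f0, harmonics_in_band(f0))
```

At higher fundamental frequencies the harmonics are spaced farther apart, so fewer of them fall inside the band.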


From this simplified discussion of voice science and the source-filter theory of voice production, the reader should understand that voice and speech production for complex human communication involves the interplay of several subsystems of the human body. These subsystems are:









Changes in any one of these subsystems can affect the vocal output. Coordination of these subsystems for complex vocalization is therefore not an innate activity. As with any complex activity or sporting skill, such as a golf or tennis swing, people have a natural ability that is affected by their muscular set pattern, which in turn is determined in part by their genetics and epigenetics.



Laryngeal Stroboscopy


Although first reported by Oertel21,22 in 1878, stroboscopic examination of the larynx has only recently become popular in the United States. Stroboscopy is necessary to evaluate the vibratory patterns of the vocal folds that occur too rapidly to be visualized by the unaided human eye.23–25 According to Talbot’s law, the retina is able to resolve only five images per second. Therefore, images presented to the retina for less than 0.2 seconds (5 images/sec) persist and are fused together by the visual cortex to produce apparent motion. Because the vocal folds vibrate at rates of 75 to 1000 cycles/second, even the slowest vibratory patterns cannot be visualized without assistance. During stroboscopy the larynx is visualized with a xenon light source. Characteristics of xenon light allow rapid on-and-off bursts. In this manner, the larynx is visualized for only brief periods, each a small fraction of a second. These brief images, sampled from various points across many vibratory cycles, are then fused together to provide apparent slow motion of the laryngeal vibratory tissue. In modern stroboscopic equipment, the rate of laryngeal vibration is sensed by a microphone and used to control the rate of xenon light firing. When the rate of visual sampling of the laryngeal image is out of phase with the rate of vibration, the laryngeal tissue appears to move. When the sampling rate is in phase with the vibratory rate, the laryngeal tissue appears to stand still.
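The sampling principle behind stroboscopy can be sketched with simple arithmetic. The example below is a simplified illustration, not a description of any particular stroboscope. It assumes a perfectly periodic vibration, and the function names, the 200 cycles/second vibratory rate, and the 1.5 cycles/second offset are hypothetical. It shows how many vibratory cycles fit inside the 0.2-second window described above, and which point of the cycle each flash catches when the flash rate matches the vibratory rate or is slightly offset from it.

```python
def cycles_within_persistence(f_vib_hz, persistence_s=0.2):
    """Vibratory cycles completed within the 0.2 s window quoted in the text."""
    return f_vib_hz * persistence_s


def flash_phases(f_vib_hz, f_flash_hz, n_flashes):
    """Fraction of the vibratory cycle (0 to 1) illuminated by each flash.

    Flash k fires at time k / f_flash_hz; the vocal folds have then
    completed f_vib_hz * k / f_flash_hz cycles, and only the fractional
    part determines what the examiner sees.
    """
    return [(f_vib_hz * k / f_flash_hz) % 1.0 for k in range(n_flashes)]


def apparent_frequency_hz(f_vib_hz, f_flash_hz):
    """Apparent slow-motion rate when the flash rate is close to the vibratory rate."""
    return abs(f_vib_hz - f_flash_hz)


# Even the slowest vibration completes many cycles in one persistence window.
print(cycles_within_persistence(75.0))

# Flash rate equal to the vibratory rate: every flash catches the same phase,
# so the tissue appears to stand still.
print(flash_phases(200.0, 200.0, 5))

# Flash rate slightly lower: each flash catches a slightly later phase,
# so the fused images show apparent slow motion.
print(flash_phases(200.0, 198.5, 5))
print(apparent_frequency_hz(200.0, 198.5))
```

With matched rates every phase value is identical, which corresponds to the "frozen" image. With a small offset the phase creeps forward from flash to flash, which the eye fuses into a slow-motion cycle.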


Stroboscopy permits observation of the vibratory action of the vocal folds, which is not possible with still-light examination (Fig. 61-5). As previously described, this vibratory action is responsible for sound production. Therefore, by using stroboscopy, the examiner can observe how small lesions alter the normal laryngeal vibratory pattern and glottal closure. The significance of a given lesion can then be determined.



In addition to providing information about vibratory status, examinations captured in video format can be reviewed for comparison with previous examinations and for consultation. This information improves accuracy in the diagnosis of vocal problems. Ideally, a baseline laryngeal stroboscopic examination should be performed in each professional voice patient while the health and voice are good. The findings can be compared with the vocal fold appearance during dysphonic states, and conclusions regarding the effects of vibration patterns on the cause of dysphonia can be made.


Recorded laryngeal stroboscopic examinations can be used to follow changes in the glottal vibratory pattern over days, weeks, and years. This process, known as interval examination, helps determine the effects of behavioral, medical, and surgical interventions on the larynx. Changes in laryngeal stroboscopy findings can be shown and documented on videotape, computerized formats, and still prints.


Interpretation of laryngeal stroboscopy requires knowledge of the stroboscopic appearance of the healthy larynx phonating at various frequencies and intensities. A regular format for evaluation also enables a more objective interpretation of this subjective test. Standardized checklists for laryngeal stroboscopy interpretation are available.2,25–27 Evaluation criteria include symmetry, amplitude, periodicity, mucosal wave propagation, and glottal closure (Table 61-1). These vibratory characteristics are evaluated at a comfortable loudness level and modal speech frequency. In professional voice patients, it is beneficial to perform laryngeal stroboscopy during high and low pitch and loud and soft phonation. This approach provides additional data about the vibratory characteristics. If a professional voice patient is having difficulty at a particular point in the vocal range, stroboscopy and laryngoscopy should be performed while the patient phonates within the troubled range. With this approach, the clinician may observe subtle vibratory changes that may be the source of the patient’s vocal difficulties.


Table 61-1 Interpretation of Laryngovideostroboscopy

Criteria        Result
Symmetry        Normal
                Side to side
                Teeter-totter
                Vertical O not symmetric
Amplitude       Right equals left
                Right is greater than left
                Left is greater than right
                Both decreased
Periodicity     Yes, consistent
                Yes, inconsistent
                No, inconsistent
                No, consistent
Mucosal wave    Right normal
                Right great
                Right abnormal pattern
                Right decreased
                Right adynamic (where)
                Left normal
                Left great
                Left abnormal pattern
                Left decreased
                Left adynamic (where)
Closure         Complete, long
                Complete, short
                Small posterior chink
                Large posterior chink
                Slit
                Elliptic
                50% Elliptic
                Hourglass
                Asymmetric hourglass
                Other

RECORDING QUALITY (1 = Poor, to 4 = Great)
Focus ____________ Size ______________ Brightness _____________
Color ______________ Notable feature _____________
Videotape number: ______________
Verbal diagnosis: _____________________________________________

Symmetry refers to the paired appearance of the vocal folds, which are mirror images of each other during glottal vibration. Any difference in the mechanical properties of the vocal folds—mass, tension, pliability of the superficial layer of the lamina propria or mucosa, elasticity, position, or inflammation—can alter symmetry. Asymmetry of vibration can result in dysphonia.


Amplitude of vibration refers to the lateral excursion of the midmembranous portion of the vocal fold during vibration. This movement is normally one third to one half of the width of the visible fold. As with symmetry, lesions that affect the mass, tension, or pliability of the vocal fold alter amplitude. Vocal pitch and vocal intensity or loudness also alter vibratory amplitude. Vocal folds vibrating at high pitch are stiffer and thinner, so their vibratory amplitude is reduced both in total and in relative relation to the visible size of the vocal fold. On the other hand, when volume is increased, particularly by increases in the force of expelled air, the vibratory amplitude increases as well. This phenomenon occurs at all pitches of phonation.


Periodicity, or the regularity of successive glottal cycles, is ascertained by synchronizing the stroboscopic flash with the frequency of vocal fold vibration. The vocal folds are visualized at approximately the same point in each cycle. This maneuver “freezes” the image or makes the vocal folds appear to be standing still. Any perceived motion of the folds indicates aperiodicity. Any alteration in the balance of the vocal folds and the lungs can result in aperiodic vibrations. During a single phonation, vibratory cycles can range from periodic to aperiodic. Therefore, it may be helpful to determine whether the vibratory pattern is completely periodic, mostly periodic, mostly aperiodic, or completely aperiodic.28,29


Mucosal wave propagation has both vertical and horizontal components. The vertical component, known as the vertical phase, is visualized during stroboscopic examination along the medial surface of the vocal fold. During vocal fold vibration, owing to characteristics of the mucosa and the vocalis muscle, two distinct ridges appear to form. These ridges are referred to as the upper and lower masses or lips of the vocal fold. The position of the upper lip is determined by the reflection of the vocal fold mucosa as it turns from horizontal to vertical over the superior surface of the vocal fold. The position of the upper lip is relatively fixed by the physical characteristics of the vocal fold. The lower lip is determined by the change in the physical properties of the mucosa covering the vocal folds from those of the mucosa covering the respiratory tract. Mucosa is technically defined as epithelium and submucosal tissue. Respiratory mucosa consists of a columnar epithelial layer supported by a relatively thin submucosal layer with occasional mucus-producing cells and minor salivary glands. Vocal fold mucosa, however, consists of a stratified nonkeratinized epithelium supported by a relatively thickened and specialized submucosa (see the section on vocal fold anatomy and physiology). The transition between the two types of epithelia is known as the inferior arcuate line or the conus elasticus. As the air pushes through the glottis, the specialized submucosa of the vocal fold separates from the underlying structure. This is the point of mucosal upheaval. As the vocal fold mucosa is tensed through the action of the cricothyroid muscle, the submucosal tissue is thinned three-dimensionally in the same manner that a rubber band thins as it is stretched. This thinning causes the lower lip to move in a cephalad direction relative to the upper lip. 
In this manner, the tension in the vocal fold is increased and the mass of vocal fold tissue available for vibration is reduced to allow pitch elevation. Because the lower lip moves closer to the upper lip, the vertical phase and the time difference between closure of the lower lip and upper lip regions, known as the vertical phase difference, are also reduced.


In short, with tensing of the vocal fold for elevation of pitch, the vocal fold cover thins in three dimensions and the time difference between closings of the lower lip of the vocal fold and upper lip (the vertical phase difference) is reduced. This action, which can be witnessed under stroboscopic light examination, is a critical feature in professional voice patients. Often a small lesion or stiffness along the medial surface of the vocal fold will become noticeable only as the vocal fold is stiffened by elevation of pitch. The action of elevating pitch limits the vibratory motions of the vocal fold to the superficial region of the cover. This is one of the first areas injured by prolonged or excessive phonation.11 It is visualized as a reduction in the distinctness of the upper and lower mass formation from one vocal fold to the other.


The horizontal phase of vocal fold vibration has been described as a “ripple of light across the superior surface” of the vocal fold.30 It is a reflection of light either from the upper lip of the vocal fold as it travels from medial to lateral or from motion of the mucosa created by a shock wave as the two upper lips meet during closure. This wave is similar to the wave moving across the surface of a pond after disturbance of the water by a pebble. Lesions that stiffen the mucosa and reduce its pliability lead to loss of this light reflex. This is an important characteristic when visualized under stroboscopic light examination, particularly when the vocal folds are compared with each other at various pitches of phonation. Lesions that fill the superficial layer of the lamina propria and abut or infiltrate the vocal ligament tend to restrict or eliminate both components of the mucosal wave. In contrast, small to moderate-sized lesions limited to the superficial portion of the superficial layer of the lamina propria usually allow propagation of the wave, although it may be decreased and asymmetric.31,32 Finally, large and exophytic lesions may disrupt the mucosal vibratory characteristic even if they do not infiltrate deeply into the lamina propria, by altering the glottal shape and impairing glottal closure.


Closure of the membranous glottis is vital to laryngeal efficiency. Men usually have complete glottal closure, whereas up to 70% of women normally show a small posterior glottal chink.33 This glottal chink, however, is considered normal only when it extends from the vocal process of the arytenoid posteriorly. This region from the vocal process to the posterior commissure, referred to as the cartilaginous glottis, is not typically important in phonation unless the closure deficiency is large enough to create alterations in closure of the membranous portions of the rima glottic tissue. Berry and colleagues34 determined that the most efficient glottal output occurred when the vocal folds were approximately 1 mm apart at the region of the vocal process. Glottic closure patterns can be described as complete, long or short, small or large posterior chink, slit, elliptic, and hourglass or asymmetric hourglass. Closure can be altered by a mass lesion, scarring, muscular tension, and neurologic abnormalities, which become clinically significant when they involve deficiencies of closure at the membranous vocal fold level.


High-speed digital video is currently a research instrument but holds great promise for the evaluation of the vibrational function of the vocal folds, especially in patients with severe aperiodic dysphonia, in whom stroboscopy is of limited value.



Voice Analysis


Several methods can be used to quantify voice or measure vocal vibration. No single test is considered the gold standard for documenting vocal fold function. All tests have significant limitations. In addition, intrapatient and interpatient variability exists. Therefore, in professional voice patients, perceptual analysis by a trained observer and patient satisfaction with vocal outcome are often the most useful indicators of a successful intervention. Most laryngologists consider objective and semiobjective voice analysis to be important, particularly regarding preoperative and postoperative voice documentation. Little agreement exists as to the optimal tests and their performance, relative importance, or interpretation.





Jun 5, 2016 | Posted in OTOLARYNGOLOGY
