Clinical Voice Assessment: The Role & Value of the Phonatory Function Studies: Introduction
The purpose of a clinical voice evaluation is to provide the referring laryngologist with patient-specific, clinically relevant pathophysiologic information about the actual voice production process used by the dysphonic patient, the nature of the dysphonic sound generated by the patient, and the physiologic conditions responsible for the sound production. The generated report must be clear and explanatory enough to aid the referring laryngologist with differential diagnosis and treatment planning. Moreover, the generated information must be capable of predicting treatment outcomes and powerful enough to warn the treating physician of any possible complications to the voice that may result from the proposed or planned treatment, whether medical, surgical, therapeutic, or a combination. Clinical voice evaluation is not a quick procedure. It may take up to 1 hour to conduct phonatory function studies (PhFS) on an uncomplicated patient, whereas it may take substantially longer to evaluate a patient who is a professional voice user.
The clinical exam comprises a battery of PhFS composed of at least two primary parts: (1) an acoustic portion that examines the nature of the generated sound (CPT 92520 and 92506), and (2) a visual portion that examines, via a stroboscopic transoral or transnasal approach, the glottis and surrounding area, including the subglottis. Visualization of the subglottis is of paramount clinical value when examining papilloma, trauma, and/or subglottic stenosis patients. The exam must result in a clinically relevant description of the parameters that specify and regulate the vibratory patterns of the vocal cords and/or the other vocal tract elements that are causative of dysphonia. This portion of the exam is coded as CPT 31579. (Note: When examining an alaryngeal patient, or when utilizing other procedures or tests, additional CPT codes apply.)
PhFS are considered a standard of modern voice care because they provide information beyond subjective clinical impressions; they also provide objective descriptions of normal and pathologic phonatory processes. Their uses include (1) mapping acoustic voice characteristics, (2) correlating voice with physiologic findings, (3) providing guidelines for the development of efficacious treatment plans, (4) predicting the progress and outcomes of treatment plans, (5) providing preoperative–postoperative lesion mappings, and (6) providing documentation for medicolegal purposes. PhFS are reproducible and allow individual results to be compared with a database specific to the patient’s age and gender. The information these studies provide also allows for a frank discussion with the patient and education of the patient, including discussion of the risks and alternatives associated with various treatments.
The acoustic portion (92520 with the various modifiers used) records and analyzes the voice of the patient. This portion is of paramount value, specifically when a surgical intervention is planned and when the patient uses voice as a tool of labor. Not having a voice recording of a patient as part of the record is simply inexcusable and must be treated as a serious error on the part of the practicing laryngologist. Having a voice recording is a must even if litigation is not pending. Do not ignore this part of the exam. Acoustic recordings, and if possible video recordings, should encompass content (vocal-text) relevant to the work needs and work conditions of the patient.
The physiologic portion (31579) visualizes the mechanics of phonation via stroboscopic exam (phonoscopy) and also maps the location, the extent, and the effects of phonatory lesions (when present), and their contribution to dysphonia. Keep in mind that a mismatch may be present between the acoustic and the visual data (that is, a large lesion but a relatively good voice, or a small lesion or no lesion at all and a very poor voice), that not all glottic lesions require an immediate surgical procedure, and that the absence of an organic finding does not by itself warrant a diagnosis of functional dysphonia or, even worse, a finding of malingering. In today’s clinical practice, it is therefore necessary to have at your disposal a comprehensive documentation of the phonatory mechanism. Documentation that shows objectively the location of the lesion or the mechanism of dysphonia is a necessity when a postoperative dispute occurs. When operating on a patient, one must have preoperative stroboscopic mapping and voice recordings. Once visualization is conducted, the relevant videographs should be taken to the operating room (OR) and placed in the OR records. It behooves the surgeon to compare preoperative visual documentation with direct MDL observation in the OR, for the purpose of validating preoperative findings against operative findings.
In addition to these two primary components, special tests may also be a part of the PhFS battery. These include delayed auditory feedback, voice load tests, nerve blocks, manual compression tests, and EMG.
In addition to the goals discussed, the information derived through PhFS is crucial in providing pre- and postsurgical documentation, in mapping acoustic and visual lesion(s), and in matching the presence or absence of lesions to the voice quality produced. PhFS are also crucial in documenting follow-up, when considering treatment revision, and in patient education; moreover, they are a must in medicolegal proceedings.
Voice Production
Voice is an acoustic product resulting from the semicyclical vibrations of the two vocal cords (ie, vocal folds) that are located in the larynx, commonly referred to as the voice box. Therefore, abnormal voice is a consequence of the underlying phonatory pathophysiology, reflecting the physical conditions of the vocal cords and the rest of the vocal tract, comprising the subglottic and supraglottic structures.
The vibration of the vocal cords is age and gender dependent and is controlled by myoelastic properties and aerodynamic forces; the vibration is generated as the air expelled under pressure from the lungs passes between the vocal cords and sets the cords into an oscillatory motion.
The myoelastic properties consist of the paired intrinsic laryngeal muscles (ILM), which are responsible for the size, shape, length, mass, stiffness, and tension characteristics of the vocal cords. The ILM include the thyroarytenoid muscles, the pairs of lateral cricoarytenoid muscles, the posterior cricoarytenoid muscles, and the interarytenoid muscle, which consists of both transverse and oblique portions. The ILM are innervated by the recurrent laryngeal nerves (RLNs) and all muscles, with the exception of the posterior cricoarytenoid muscles (the only vocal cord abductor), are responsible for vocal cord adduction and vocal cord approximation needed for vocalization to take place. The bilateral cricothyroid musculature is responsible for the thyroid cartilage downward tilt that elongates the vocal cords. These muscles are principally responsible for pitch elevation. The nonmuscular myoelastic properties include membranes (mucosa), ligaments, glandular elements, a blood supply, and nerves, all of which are located within the articulating cartilaginous housing that comprises the thyroid, the cricoid, and the two arytenoid cartilages.
Normal voice is actually generated by the vibratory wave-generating oscillations of the membranous portion of the vocal cords (the mucosa), which slides/glides in an undulating (phase locked) manner over the underlying muscle. When the mucosa, the submucosal space, the muscles, the vascular elements, the cartilages, or the compression of the glottis are affected, including the subglottic and supraglottic structures, pathologic voice quality results, and voice may not be a product only of the true vocal cords, but may be produced in alternative ways, including, for example, false vocal fold(s) vibration or supraglottic vibration against, for example, the epiglottis. Therefore, PhFS must be capable of revealing altered phonation and of describing the glottic and the nonglottic mechanism that either generates, confuses, or coproduces the sound of the patient. Why? Because “fixing” the alternative phonatory generator may actually cause further loss of the voice. This is especially crucial when trying to rehabilitate voice in the blunt or penetrating laryngeal trauma patient. This description is also of paramount importance in differential diagnosis of dysphonia in patients in whom no visible vocal cord pathology is noted, but in whom a dysphonic output is present.
The entire voice box rests on the trachea and is suspended above from the hyoid bone, which communicates with the base of the tongue. When this connection is affected by as little as minor lingual tension or inappropriate vertical larynx positioning, the result may include altered voice production.
In addition to the intrinsic articulation accomplished at the cricoarytenoid and cricothyroid (ie, synovial type) joints, the entire larynx is subject to vertical motions produced by the action of the paired extrinsic laryngeal musculature. These vertical laryngeal motions are crucial in phonation (singing), swallowing, respiration, and yawning, and in speech articulation. When this vertical movement is affected, voice production may be severely compromised even if the glottis looks “normal” on a routine ear, nose, and throat (ENT) exam.
Both voluntary and involuntary phonation occur after the efferent signals generated in the motor cortex proceed via the brainstem nuclei and the left and right branches of the vagus nerve (CN X) to reach the two vocal cords. Signals terminate in the motor end plates of the ILM via the left and right RLNs, resulting in vocal cord contractions. The entire efferent process can be accomplished within 90 ms, and it requires coordination of all vocal tract, respiratory, and laryngeal musculature via the central nervous system motor neurons. The coordination of these movements is achieved by a complex neural network with access to phonatory motor neuron pools that receive proprioceptive input from the various receptors associated with these three systems; voluntary vocalization and involuntary vocalization are controlled by different brain regions.
The RLN is a mixed nerve containing an average of 1200 myelinated axons and thousands of unmyelinated axons, including some specialized endoneural organs.
The left RLN is longer than the right nerve, but because of the differential axonal composition of both nerves, the efferent impulses manage to arrive at the two vocal cords almost simultaneously, causing the vocal cord vibration to be semiperiodic. This type of vibration makes the sound of the voice “human.”
The vagus nerve also branches into the left and right superior laryngeal nerves (SLNs), which mediate the afferent signals from the larynx via their internal branches. The external branches of the SLNs are the motor branches innervating the paired cricothyroid muscles, which function as the primary pitch elevators. This specific vagus nerve branching explains why combined recurrent and SLN injuries (eg, paralysis) are rare. The action of the cricothyroid musculature is also responsible for the motion of the vocal cords seen in paralysis of the vocal cords due to RLN involvement. When some motion of the vocal cord is observed on the paralyzed side, it must be interpreted with caution: not as a sign of recovery, but rather as motion secondary to the ipsilateral SLN-mediated impulses. When the SLN is out in addition to the RLN, the posterior glottis will not approximate, a wider posterior gap will be present, and the arytenoids will not touch on phonation. Observing and documenting these conditions during clinical PhFS is of paramount importance for treatment planning.
Because of the contra- and ipsilateral innervation of the corticobulbar tract, a unilateral corticobulbar tract lesion will not cause unilateral vocal cord paralysis.
With regard to phonation, the vocal cords are subdivided into muscular components (the so-called “body”) and nonmuscular components (the so-called “cover”). The body of the vocal cords is formed by the two thyroarytenoid muscles, which contain fast (adductive) and slow (eg, phonatory) fibers that determine the length, contour, and glottic closure shape of the vocal cords and that regulate the tension of the cover that slides over the body of the vocal cords to create the mucosal vibratory wave. The mucosal vibratory wave cannot be observed with simple visualization; it can be observed under stroboscopic illumination or superfast filming, where it is seen to undulate, proceeding from the inferior surface (ie, lower lip) to the superior surface (ie, upper lip) of the vocal cords (Figure 29–1).
Figure 29–1.
(A) The vocal cords at rest, forming a V-shaped space (the glottis), divided into the vibratory (membranous) and nonvibratory (cartilaginous) portions. (B) The vocal cords during phonatory approximation. The vocal cords are divided into anterior, mid, and posterior thirds. With regard to phonation, the vocal cords are divided into the upper vibratory lips (dotted line) and the lower vibratory lips (dashed lines).
The area between the upper and lower lips adjusts as pitch and loudness change; therefore, when a phonatory lesion is located within this space, its location and size determine the area of pitch and loudness dysfunction. Typically, more severe symptoms are caused by small but anteriorly located lesions than by larger lesions located toward the upper lip or on the superior phonatory surfaces. Typically, an anterior commissure lesion located ± 3 mm above the lower lip profoundly affects the voice, whereas even a large inferiorly located web (<3 mm below the lower lip) does not affect the voice. This is crucial to both treatment and diagnosis. To secure this observation, PhFS are needed.
The cover is subdivided into the outer and the inner layers and the lamina propria; the latter consists of three layers: superficial (the Reinke space), intermediate, and deep. The vocal ligament is the free edge of the conus elasticus, belonging to the deep and intermediate layers of the lamina propria. Obliteration of the Reinke space retards or prevents the mucosal vibratory wave, resulting in dysphonia of varying severity. However, if one vocal cord is stiff but straight (nonvibratory) and the other vibrates and approximates well against the nonvibrating vocal cord, the voice may be remarkably good despite the insufficiency of one cord. Therefore, it is important at times not to “repair” the stiff vocal cord but to leave it alone or even make it stiffer to improve the overall voice quality. Most benign phonatory mucosal lesions are typically found within the superficial layer. If the lesion is located on the superior surface of the vocal cord away from the vibratory edge, the voice may not be affected at all, even if the lesion is large. These findings are crucial in determining the extent of surgical interventions. A common sense real estate rule of “location, location, location” should prevail. In other words, it is often the location and not the size of the lesion that determines its value to the voice quality.
From the clinical point of view, vocal cords are also subdivided into the vibratory (membranous) and nonvibratory (cartilaginous) portions. At rest, they outline a V-shaped space called the glottis (see Figure 29–1). The front of this V forms the anterior glottic commissure, and the back of the V forms the posterior glottic commissure. The posterior end of each vocal cord (the thyroarytenoid muscle) inserts into the vocal process of each of the arytenoid cartilages. The maximum width of the posterior commissure occurs during inspiration or cough and measures approximately 9–12 mm, or three times the most posterior width of the muscular portion of the vocal cord at rest.
After puberty, the length of the vibratory portions of the vocal cords at rest is approximately 13 mm for women and 16 mm for men. When the vocal cords approximate for phonation, the entire glottis is closed in a male, whereas a small posterior chink is often present in a female, giving the female voice quality a slightly softer and airy tone. The specific shapes of glottic phonatory closure allow variations in normal voice qualities.
Furthermore, the vocal cords are clinically subdivided into anterior, middle, and posterior thirds, with nodular lesions usually located at the junction of the anterior and middle thirds and opposite each other if bilateral. An asymmetric location of mucosal lesions is found in mixed-type organic dysphonias.
The two thyroarytenoid muscles, together with the other ILM and the extrinsic laryngeal muscles, control the relative elasticity and stiffness of the vocal cords. They also determine the shape of the mucosal vibratory wave, which in turn determines the pitch, loudness, and tone of the voice. The amplitude of the mucosal vibratory wave is wider at the lower pitches, whereas reduced mucosal vibratory wave amplitude predominates at high pitches or at any pitch level when the cover is stiff.
The duration and shape of the mucosal vibratory wave cycle form specific opening and closing phases that determine specific vibratory modes or vocal qualities (eg, fry, normal, overpressured, breathy, or falsetto). The duration of one cycle is called the fundamental period, the reciprocal of the fundamental frequency (F0); in perceptual terms it is referred to as a pitch period.
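The reciprocal relationship between the duration of one vibratory cycle and the number of cycles per second can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the sample periods are round numbers chosen for arithmetic convenience, not patient data or normative values.

```python
def frequency_hz_from_period_ms(period_ms: float) -> float:
    """Convert the duration of one vibratory cycle (ms) to cycles per second (Hz)."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def period_ms_from_frequency_hz(frequency_hz: float) -> float:
    """Convert a vibration frequency (Hz) to the duration of one cycle (ms)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_hz


# Illustrative round-number periods (assumed values, not measurements).
for period_ms in (10.0, 5.0, 2.5):
    print(f"{period_ms} ms per cycle -> {frequency_hz_from_period_ms(period_ms)} Hz")
```

A shorter cycle therefore corresponds to a higher vibration rate, which is heard as a higher pitch.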
The aerodynamic properties of phonation include the subglottic air pressure (Ps), the airflow (AF), the supraglottic pressure, the intraoral pressure (Pio), and the glottal resistance. Together they drive the vibratory cycle: Ps separates the approximated vocal cords, and the Bernoulli effect helps draw them back together during phonation.
To generate sound, Ps must reach at least 5 cm H2O, but Ps can exceed 50 cm H2O in loud or overly pressured (ie, pathologic) phonation. Typically, a normal conversational voice is produced between 6 and 10 cm H2O Ps at approximately 65–70 dB, whereas a loud voice can reach 85–95 dB.
The mean airflow in normal phonation ranges from 89 to 141 mL/s and increases as the fundamental frequency and the loudness are elevated. The glottal resistance cannot be measured directly, but is estimated to vary from 20 to 150 dyne·s/cm⁵ depending on the pitch and the sound intensity.
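Because glottal resistance is estimated rather than measured, one common way to approximate it is to divide the subglottic pressure by the mean airflow. The sketch below assumes that simple ratio; the function name is hypothetical, and the conversion of 1 cm H2O to 980.665 dyne/cm² is supplied here rather than taken from the text. Pressure in dyne/cm² divided by flow in cm³/s yields dyne·s/cm⁵.

```python
# Assumption: 1 cm H2O corresponds to 980.665 dyne/cm^2 (conventional conversion).
DYNE_PER_CM2_PER_CM_H2O = 980.665


def glottal_resistance(ps_cm_h2o: float, airflow_ml_per_s: float) -> float:
    """Estimate glottal resistance (dyne*s/cm^5) as subglottic pressure / mean airflow.

    Assumes the simple pressure-to-flow ratio; 1 mL is treated as 1 cm^3.
    """
    if airflow_ml_per_s <= 0:
        raise ValueError("airflow must be positive")
    pressure_dyne_per_cm2 = ps_cm_h2o * DYNE_PER_CM2_PER_CM_H2O
    return pressure_dyne_per_cm2 / airflow_ml_per_s


# Illustrative combinations drawn from the conversational ranges quoted in the text.
for ps, flow in ((6.0, 141.0), (8.0, 100.0), (10.0, 89.0)):
    print(f"Ps={ps} cm H2O, AF={flow} mL/s -> {glottal_resistance(ps, flow):.1f}")
```

Under these assumptions, conversational pressures of 6 to 10 cm H2O combined with airflows of 89 to 141 mL/s give estimates of roughly 42 to 110, inside the 20 to 150 range cited above.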
When the voice (F0) resonates within the entire vocal tract (ie, the larynx, trachea, pharynx, and oral and nasal cavities) and when the vocal tract articulates, speech, singing, or other forms of communication are formed. Because of the specific vocal tract configuration, specific frequency regions are amplified; these regions are referred to as formants (F1–F5), and their combination determines the characteristic of each vowel. Opera singers form unique vocal tract shapes to allow noninjurious and efficient singing, and they show a unique clustering of powerful spectral peaks (the so-called singer’s formant) at about 3 kHz. This clustering results in an acoustic boost that helps a singer to compete with the sound of an orchestra. The production of the singer’s formant is possible when the entire larynx is lowered in the neck, but not when the larynx goes up as pitch elevates. Other acoustic features are emphasized in different singing styles. Because inappropriate larynx tracking can be potentially injurious to the voice, an examination of the vertical larynx position (VLP) is advised when evaluating the vocal problems of individuals who use their voices professionally. Ornamentation in voice can result from specific vocal tract configurations and specific time-locked acoustic events, with a rate approximating 5–6 Hz for vibrato or vocal tremor. It is interesting to note that tremor-like vocal oscillations having a similar rate may be present in deception.
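The 5–6 Hz modulation rate mentioned above can be estimated from a pitch contour by counting how often the contour rises through its own mean. The sketch below is an illustration under stated assumptions, not a clinical algorithm: the contour is synthetic (a 220 Hz tone modulated at 5.5 Hz, sampled 100 times per second), and all function names and parameter values are hypothetical.

```python
import math


def synth_f0_contour(mean_hz, extent_hz, rate_hz, duration_s, fs_hz, phase=0.3):
    """Build a synthetic pitch contour with sinusoidal modulation (assumed test signal)."""
    n = int(round(duration_s * fs_hz))
    return [
        mean_hz + extent_hz * math.sin(2 * math.pi * rate_hz * i / fs_hz + phase)
        for i in range(n)
    ]


def modulation_rate_hz(f0_contour, fs_hz):
    """Estimate the modulation rate from upward crossings of the contour's mean."""
    mean = sum(f0_contour) / len(f0_contour)
    dev = [v - mean for v in f0_contour]
    # Times (s) at which the contour passes from below the mean to at or above it.
    crossings = [i / fs_hz for i in range(1, len(dev)) if dev[i - 1] < 0 <= dev[i]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward crossings to estimate a rate")
    # Full cycles between the first and last crossing, divided by the elapsed time.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


contour = synth_f0_contour(mean_hz=220.0, extent_hz=5.0, rate_hz=5.5,
                           duration_s=2.0, fs_hz=100.0)
print(f"estimated modulation rate: {modulation_rate_hz(contour, 100.0):.2f} Hz")
```

Mean-crossing counting is used here only because it is simple; its accuracy is limited by the sampling interval, and a recorded contour would first need pitch tracking and smoothing, which this sketch does not attempt.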