Flexible Laryngoscopy in Speech-Language Pathology Evaluation


Rebecca J. Leonard

The term phonoscopic examination is used to describe the flexible endoscopic examination of dysphonic patients performed by the speech-language pathologist (or voice clinician). This term emphasizes the focus of the exam on understanding the relationship between laryngeal behaviors, including postures and gestures, and the voice. The term also clearly differentiates the assessment performed by the voice clinician, who is interested particularly in “vocal” pathology, from the diagnostic exam performed by an otolaryngologist, who is concerned about laryngeal or other pathology. The exam pairs laryngeal imaging with the voice evaluation traditionally performed by speech pathologists to both hear and see how the voice is produced.

The phonoscopic exam involves sampling laryngeal behavior and voice across a wide variety of tasks, including:

  1. respiratory, vegetative, and phonatory tasks;
  2. the patient’s available fundamental frequency range;
  3. a range of intensities, from soft to loud, at different frequencies;
  4. different phonatory modes (eg, whisper, falsetto);
  5. different phonetic contexts (eg, single sounds, connected speech);
  6. voicing for variable duration (as with sustained sound or repetition of voiced syllables).

If evidence of hyperfunctional and/or inappropriate behaviors is observed, efforts to modify them may be attempted. If the impression is that voice is produced more easily, or with improved quality, in certain contexts, these will be explored further. The process of identification and exploration is often referred to as “treatment probing.”

The exam is performed with the patient seated in the usual position for a flexible endoscopic procedure, with back straight and torso angled forward. Depending on the protocols of the particular setting, a small amount of topical anesthesia may be applied to one or both nasal passages. In our own setting, typically only one nostril is anesthetized. If there is a question about mobility of one or the other vocal fold, viewing from both nostrils can be included. The exam may require 5 minutes or so and is usually well tolerated by the patient. The utility of the exam is dependent on several factors, including the patient’s ability to cooperate, the experience of the clinician, and the quality of the imaging equipment being used.

A rigid endoscope with stroboscopy will permit laryngeal imaging during many of the tasks outlined here; for example, sustained sounds produced at different frequency and intensity levels and voice produced across phonatory modes. The inclusion of stroboscopy, with rigid or flexible endoscopy, also permits the assessment of vibratory and mucosal displacement details, including asymmetries in phase or amplitude that may imply subtle underlying pathology. The particular advantage of flexible endoscopy is the potential for evaluation of the larynx and voice during connected speech (or singing).

Value


Voice clinicians have long relied on the sound of voice in making judgments about a patient’s dysphonia. Other perceptions, for example, the apparent effort associated with voice production and the relative difficulty of listeners in hearing and understanding what’s been spoken, are also critical to this appraisal. More recently, a variety of acoustic and aerodynamic assessment tools have permitted greater insights into the nature of dysphonia in individual patients.1 The increasing availability of normative data associated with these instruments and the emphasis on creating standardized approaches to their use have added further to our diagnostic and treatment repertoire. In particular, these measures help to distinguish clearly the disordered voice from normal voicing and allow objective comparisons of voice across time or treatments.

Over the past few years, many voice clinicians have been able to incorporate endoscopic imaging in the management of dysphonic patients. In our opinion, the use of imaging, and flexible endoscopy in particular, represents an enormous step forward in speech pathology practice. Quite simply, the advantage of not only listening to a voice but also observing simultaneously the structures that produce it is huge. Patients are typically referred to the voice clinician by an otolaryngologist who has performed a comprehensive examination of the head and neck, including the larynx. The physician’s description of laryngeal pathology or voicing difficulty may be quite general, for example, “vocal nodules,” “paralysis of left true vocal fold,” or “functional dysphonia.” This information, though important, says nothing about the patient’s use of the larynx to produce voice. Yet, it is an understanding of how the patient uses his or her voice that, in our own experience, is required for the speech pathologist to develop the most efficient and effective treatment plan possible.

Vocal Pathology

Vocal pathology refers to a patient’s inappropriate use of laryngeal structures to produce voice. In some cases, these behaviors may have produced laryngeal pathology. In other instances, they may be the result of laryngeal pathology or represent a response to some other stimulus. The provoking stimulus in these “adaptive dysphonias” (eg, edema, paresis, a lesion) may be clearly present on examination of the larynx. In other cases, the original pathology has resolved and laryngeal structures appear normal, but voice production continues to be abnormal. The onset of vocal pathology may also be associated with stress or psychopathology. Again, in response to some set of provoking stimuli, voice is produced in an atypical manner. Regardless of the origin of vocal pathology, however, the goal of the voice clinician is to redirect a patient’s maladaptive behavior(s) to more appropriate behaviors; that is, ones more consistent with normal voice. If normal voice production is not a reasonable expectation, then voice produced in the most optimal manner possible (ie, best quality, least effort) is the goal. In both cases, the process is likely to require (1) relaxation of hyperfunctional postures; (2) elimination of inappropriate behaviors; and (3) modification of both phonatory and vocal fold variables during voice production.

Our contention is that these goals can be much more directly, efficiently, and effectively completed if the clinician knows as much about the adaptive behavior as possible. In short, though the target of normal, or optimal, voice may be the same, the approach to achieving it will vary. Interestingly, whether a voice is “disordered” or not, at least from a listener’s perspective, is largely determined by its sound; that is, how pleasant or unpleasant, strong or weak, audible or inaudible, it may be. But for the voice clinician, relying solely on the sound of voice to infer how it is being produced and developing a treatment plan consistent with this inference can be misleading, for several reasons.

First, in our experience, there is not a simple one-to-one correlation between laryngeal behavior and the voice produced by this behavior. For example, voice generally characterized as “whisper” may be produced with the true vocal folds abducted, with both the false and true vocal folds adducted and constricted, or with the true vocal folds adducted but not vibrating, and, likely, in several other ways (Fig. 8.1). Relatively large changes in laryngeal behavior may sometimes produce only small or subtle changes in voice, whereas in other instances, small changes in behaviors or phonatory parameters (eg, frequency and airflow) produce large effects in voice. If the clinician is not both listening and observing, these relationships are not clear.

Another important benefit of the phonoscopic exam is realized when dysphonia is intermittent or variable. Inappropriate or hyperfunctional behaviors are sometimes “context-specific”; for example, occurring at the beginning or end of a breath group, when pulmonary support is compromised, or when fundamental frequency achieves a particular level. These kinds of behaviors are particularly common when patients are attempting to compensate for a laryngeal deficit, such as bowing, or glottic incompetence produced by the presence of a lesion. In other situations, for example, where scarring is present, voice quality may be better in selected contexts (ie, particular combination of frequency, intensity, and airflow) due to improved symmetry of vibration. Identification of such contexts can be quite useful to the voice clinician, providing insights into both laryngeal and vocal pathology. With imaging, specific behaviors can be associated with these contexts.

In some patients, there may also be a lag between the onset of a maladaptive behavior and its eventual manifestation in voice. Normal laryngeal behavior and voice may be apparent early in the production of a sustained sound and then be replaced with an increasing degree of false fold or supraglottic constriction; for example, when pulmonary support is compromised. The effect of the progressive hyperconstriction may not be apparent in voice initially but may become more perceptible, as increased effort or altered quality, as pulmonary support is further compromised. If this effect can be further characterized, for example, by how many syllables can be produced before it is observed, or by where in a breath group it first appears, the information may be usefully incorporated into treatment planning. Examples of such context-specific behaviors are presented below (Figs. 8.2, 8.3, and 8.4).


Fig. 8.1 Three different postures of the larynx, all of which produced voice characterized as “whisper,” are illustrated. (A) The true folds are abducted. (B) The false folds are hyperadducted, obscuring the underlying true folds. (C) The true vocal folds are adducted but not vibrating.


Fig. 8.2 (A) A patient complaining of vocal fatigue with singing reveals relatively better laryngeal behavior (and voice) at the onset of phonation. (B) Supraglottic constriction becomes apparent visually when sound is sustained beyond 7 seconds.


Fig. 8.3 In contrast with the illustration in Fig. 8.2, an individual with normal laryngeal and phonatory function is shown sustaining sound for 2 seconds (A) and 24 seconds later (B). No evidence of excessive effort with increased duration is observed.

Laryngeal Pathology

As stated earlier, the intent of the phonoscopic exam is different from that of the diagnostic exam performed by an otolaryngologist. However, observation of the larynx over a wide range of voicing and other tasks may aid the diagnosis of tissue pathology, as well as vocal pathology. This is because laryngeal pathology, like vocal pathology, may be observed only intermittently or may be visualized more easily during certain laryngeal behaviors. A patient with bowing of the vocal folds may appear to have relatively normal vocal fold closure at higher fundamental frequencies but demonstrate significant glottic incompetence at lower frequencies. Because some larynges are difficult to observe at lower frequencies, due to a lowering of the larynx and shortening of the vocal folds, bowing of this type may be missed. Anterior commissure pathology may not be visualized unless the true vocal folds are widely abducted, and pathology that may be displaced below the vocal folds on voicing (eg, polyps, granulomatous lesions) may be more easily observed on voice produced on inspiration (Figs. 8.5 and 8.6). Infrequently, we have identified lesions that were not apparent until the patient was engaged in rigorous voice use for some period of time. This pathology (eg, ventricular cyst) appears to become pneumatized with such exercise and affects voice only in these circumstances. Careful questioning of the patient may prompt the clinician to simulate the situation implicated.


Fig. 8.4 (A) A patient with glottic insufficiency producing voice at soft intensity. (B) With even a small increase in loudness, evidence of false fold constriction and medialization becomes apparent. The inability of the true vocal folds to increase resistance with increased subglottal pressure (necessary for increasing vocal loudness) is likely related to the maladaptive behavior observed. The evidence of false fold behavior is readily observed visually but may not be apparent in voice until some time after its onset.


Fig. 8.5 (A) The true folds remain only partially abducted, obscuring the lesion. (B) The lesion is revealed with greater abduction of the true folds.

In other instances, the clinician may find that laryngeal pathology is masked by maladaptive laryngeal behaviors. This can be particularly true when voice production involves use of the false vocal folds or constriction of other supraglottic structures. Relaxation of these postures through treatment probing permits a more thorough evaluation of underlying structures and may lead to the identification of laryngeal pathology not previously recognized (Fig. 8.7). The possibility that voicing behaviors can mask laryngeal pathology also underscores the need for observing the larynx in nonphonatory tasks or in circumstances that permit thorough examination of laryngeal structures.

If laryngeal pathology related to voice use is present, it is important for the voice clinician to understand vocal and laryngeal practices likely to have produced the pathology, as well as practices that may have developed in response to the pathology. Secondary practices, in our experience, can often be dealt with relatively quickly to facilitate resolution of pathology. Behaviors likely to have produced the pathology initially, however, may be more long-standing and, consequently, more resistant to treatment. Careful interviewing of the patient will often be helpful in differentiating these issues.


Fig. 8.6 (A) The vocal folds appear within normal limits. (B) Voice is produced on inspiration, and pathology on the inferior surfaces of the true vocal folds is apparent.


Fig. 8.7 Hyperconstriction of the false folds during voicing (A) obscures pathology apparent during respiration (B).


For patients who are voice therapy candidates, the use of endoscopy provides an excellent opportunity to engage in a process we refer to as “mapping vocal space.” This process requires, first, identification of the patient’s current vocal space; that is, the combination of phonatory and laryngeal behaviors where voice is being produced. Laryngeal and other behaviors that need to be changed in order for a patient to produce more appropriate voice are targeted for modification. Strategies for achieving these modifications, that is, for moving the patient into a more optimal vocal space, are then attempted. The goal will differ depending on the patient’s current vocal capabilities. If laryngeal pathology is present, the goal is the elimination of inappropriate adaptive behaviors or the alteration of vocal fold contact patterns in a way that facilitates resolution of pathology (Figs. 8.8 and 8.9). If maladaptive behaviors persist in the absence of observed laryngeal pathology, the goal is to restore optimal use of structures for voicing. Details of each patient’s situation notwithstanding, the clinician’s task is to elicit voice produced in the most appropriate manner possible (ie, best voice, no hyperfunction, no exacerbation of existing pathology, if present).

In our practice, the initial therapy objective is to identify voice produced as appropriately as possible on a single sound (“m” or “ee”). In some cases, a more normal target may be achieved, but at a frequency or intensity, for example, that is not ideal for a given patient. Continued modification of both laryngeal and phonatory variables may be required to produce sound that achieves both laryngeal and vocal appropriateness. Once (or if) this voice is produced, the clinician directs the patient in repeating it over several trials, again, on only one sound. Occasional rest breaks followed by another series of repetitions may be helpful in stabilizing the voice in this highly structured context. The clinician may counsel the patient to focus on a variety of available feedback sources during these attempts (eg, how it sounds, how it feels). When the patient is able to produce the voice readily and consistently, the vocal “space” in which the sound can be produced is expanded, for example, at different frequencies or intensities. For all phonatory parameters, including frequency, intensity, and airflow, which are perhaps the most frequently manipulated, there are normal ranges, low to high, soft to loud, less or more. The goal of mapping is to achieve normal or improved voice over a range of phonatory behaviors.


Fig. 8.8 Patient status after excision of resistant posterior granuloma reveals posterior contact pattern between the true folds. This pattern was typically observed on voicing tasks sampled.


Fig. 8.9 Patient in Fig. 8.8 after modification of contact pattern between folds. Note the small gap between vocal processes. The intent of modification was to reduce compression forces at the site of residual pathology.

Once sound can be produced in this manner, the clinician expands the contexts in which the improved sound is produced; that is, on syllables, short utterances, and, ultimately, connected speech. The steps described here represent, in our opinion, a logical and often successful approach to voice therapy. With some patients, the progression from a target sound to its consistent use in connected speech may be quite brief, of the order of minutes. For other patients, progress may be significantly slower. When achieved, the clinician’s next goal is consistent use of the improved voice in situations outside the clinic.

Laryngeal Imaging as a Feedback Tool

In our practice, treatment probing is most often performed without providing feedback from laryngeal imaging to the patient. Once our goals and the strategies for achieving them are defined, and when the target voice has been achieved in some contexts, feedback from imaging can prove extremely useful. For this purpose, the patient is seated as comfortably as possible in front of a monitor that permits easy visualization of the larynx. Ideally, the clinician can also view the monitor and be close enough to it to point out various features of the larynx. At this stage of treatment, it becomes important for the patient to understand mechanisms of normal voice production, as well as what has been inappropriate about his or her own use of the larynx to produce voice. By combining visual information with available feedback (ie, auditory and other sensory feedback), the clinician hopes to help the patient develop an ability to identify and self-correct any deviations from the target voice. If the patient can leave the treatment room with these skills, it bodes well for the success of therapy.

One example of the particular utility of imaging in treatment involves its use in patients with vocal process granuloma.2 Patients appropriate for this approach have usually failed multiple medical and surgical treatments for vocal process granuloma. The relationship between patients’ granuloma and laryngopharyngeal reflux should be documented on esophagram or by pH probe. The therapy program is usually offered to patients who demonstrate contact of the vocal processes at the site of pathology on voicing. Though such a pattern is thought to be within the range of normal variability, in the case of a vocal process granuloma it may be interfering with the desired resolution of vocal process pathology (Fig. 8.8).

The goal of the treatment is to produce voice without contact between the cartilaginous portion of the vocal folds, thus minimizing adductor forces at the site of pathology (Fig. 8.9). Treatment probing is first attempted to determine if such a pattern can be identified; that is, if voice can be produced with the patient maintaining a small gap between the vocal processes. The goal is first explained to the patient, and imaging equipment is positioned so that both patient and clinician can visualize contact patterns on a monitor. If achieved on a single sound, the pattern is then expanded across other phonatory contexts, as previously described. We found in a retrospective study of 10 patients who underwent therapy that 8 were able to achieve the treatment objective, and all 8 experienced resolution of pathology or a marked reduction in its extent. Six patients who did not undergo treatment, and the two who were unable to achieve the treatment objective, demonstrated minimal or no improvement, or worsening of their pathology, over the same period of time. This approach to treatment, while certainly not successful with all patients, has become a standard part of our protocol for patients with granuloma resistant to other treatments. A patient’s candidacy for the treatment can typically be determined within the context of a single phonoscopic session, adding to its reasonable inclusion in the voice clinician’s treatment repertoire.
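The outcome counts from this retrospective study can be laid out as a simple cross-tabulation. The Python sketch below is illustrative only: the group labels and the dictionary layout are assumptions introduced here, and the code merely restates the numbers quoted above without applying any statistical test.

```python
# Numbers quoted in the text; group labels are hypothetical shorthand.
treated_total = 10
achieved_objective = 8                              # voiced with a small gap between vocal processes
not_achieved = treated_total - achieved_objective   # the two who could not achieve the objective
untreated = 6                                       # patients who did not undergo treatment

outcomes = {
    # group: (patients in group, patients with resolution or marked reduction)
    "achieved treatment objective": (achieved_objective, 8),
    "untreated or objective not achieved": (untreated + not_achieved, 0),
}

for group, (n, improved) in outcomes.items():
    print(f"{group}: {improved} of {n} with resolution or marked reduction")
```

Grouping the two patients who could not achieve the objective with the six untreated patients follows the way the text reports their outcomes together.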

Flexible Endoscopy versus Phonoscopy

We previously reported results of a retrospective review undertaken in an attempt to assess objectively the value of the phonoscopic evaluation just discussed.3 In particular, we were interested in a comparison of findings on a phonoscopic evaluation with results of flexible endoscopic exams performed on the same patients by referring physicians. Patients included in the review had undergone a laryngeal exam by the referring otolaryngologist prior to evaluation in our voice clinic. All patients underwent our typical voice clinic protocol, including case history/medical records review; a traditional voice evaluation from which maximum performance, habitual performance, perceptual and acoustic measures are obtained; a laryngeal function study that provides airflow data and estimates of glottal resistance and subglottal pressure; a pulmonary function screen; and phonoscopic evaluation with flexible and rigid endoscopy. For purposes of the study, no patient whose findings were dependent on stroboscopic exam (eg, asymmetry in vibration, mucosal stiffening, lesion differentiation) was included for review. In addition, no patients with laryngeal or vocal tremor were included.

In 26 of 100 cases, our observations were consistent with those of the referring otolaryngologist. In 32 cases, we agreed with the referral diagnosis but found additional factor(s) related to pathology. In 12 of these 32 cases, we agreed that the patient demonstrated a particular hyperfunctional behavior, but we believed it was a consequence of underlying pathology (eg, bowing, paralysis, lesion). The diagnosis on referral in these cases was either of the hyperfunctional behavior exhibited (eg, plicae ventricularis) or of a “functional” dysphonia, without evidence of any underlying laryngeal pathology. In the remaining cases, we agreed with the referral diagnosis but found additional pathology. Frequently, this was a smaller lesion, for example, a contrecoup lesion on the opposite vocal fold. However, in a few cases pathology distant from that noted on referral was identified. In 14 instances of pathology identified, our impression was that it may have been context-specific; that is, more easily identified on some voicing tasks than on others.

In 42 cases, our findings differed from those of the referring physician. The most frequent example in this category was of a lesion identified on phonoscopy that had not been noted in the referral. In contrast with lesions that were believed to be masked by hyperfunctional behaviors or postures, these lesions were not obscured from visualization by the behavior of overlying structures. In several instances, pathology noted on the phonoscopic exam was apparent primarily on particular vocal tasks; that is, context-specific. Interestingly, in seven of these cases, patients were diagnosed with “plicae ventricularis dysphonia” when, in fact, other hyperfunctional postures were responsible for the dysphonia. In one patient, for example, aberrant voice was produced by an arytenoid contacting the epiglottis. In another, the false folds were perhaps slightly medialized on voicing, but the major source of dysphonia was true vocal folds that were abducted and apparently quite tensed. Though these distinctions may seem subtle, and perhaps not that pertinent to the otolaryngologist, they are important to a voice clinician charged with modifying the aberrant behavior.
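The breakdown reported above can be tallied in a few lines. The following Python sketch is only a bookkeeping aid and not part of the original review: the category labels and variable names are illustrative assumptions, and the counts are those quoted in the text (26, 32, and 42 of 100 cases, with 12 of the 32 partial agreements involving a hyperfunctional behavior attributed to underlying pathology).

```python
# Counts quoted in the text; labels are illustrative shorthand, not the authors' terms.
review_counts = {
    "consistent with referral": 26,
    "agreed, additional findings": 32,
    "differed from referral": 42,
}

total_cases = sum(review_counts.values())


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {label: percent(n, total_cases) for label, n in review_counts.items()}

# Within the 32 "agreed, additional findings" cases, 12 involved a hyperfunctional
# behavior judged to be a consequence of underlying pathology.
secondary_hyperfunction = 12
remaining_additional = review_counts["agreed, additional findings"] - secondary_hyperfunction

for label, share in shares.items():
    print(f"{label}: {review_counts[label]} cases ({share}%)")
print(f"total cases: {total_cases}")
print(f"additional-findings cases outside the hyperfunction subgroup: {remaining_additional}")
```

Because the review covered exactly 100 cases, each count is also its percentage; the helper is kept so the same tally could be reused with a different denominator.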

Indications for Phonoscopic Examination

In our clinic, the phonoscopic examination described here is of particular value in the following situations:

  1. Patient is a candidate for voice therapy. The phonoscopic exam will aid the voice clinician in understanding the relationship between laryngeal behavior and voice production. If inappropriate or hyperfunctional behaviors are a primary factor in the patient’s dysphonia, identification and elaboration of these may be a key to the efficiency and effectiveness of voice therapy. If laryngeal pathology is present, the ability to probe laryngeal behaviors likely to produce optimal voice and minimize exacerbation of pathology is equally helpful. Visual feedback provided by endoscopy can also be a valuable tool in educating patients regarding how the larynx works, for voice or other purposes, and in directing their efforts to modify behavior consistent with treatment objectives.
  2. Patient’s dysphonia is unexplained after non-phonoscopic laryngeal exam. As described earlier, subtle laryngeal pathology is sometimes visualized more readily in particular contexts or circumstances not typically included on standard laryngoscopic exams. If there continues to be a question about the cause of a patient’s dysphonia after this type of exam, the more task-oriented phonoscopic exam may prove quite useful.

If the referring otolaryngologist is aware that a speech-language pathologist who is working with a patient will be unable to perform an imaging study, the following information should be provided:

  1. Details of benign pathology (eg, site, compressibility, evidence of acuteness vs chronicity).
  2. Details of vocal “space” (ie, combination of frequency, intensity, airflow) where voice and laryngeal behavior appear improved.
  3. Details of behaviors that appear to be maladaptive (eg, use of false folds during voicing, arytenoid-epiglottis approximation).

With respect to the latter, it is important to know that “maladaptive” behaviors may not necessarily be hyperfunctional. The illustration presented earlier of vocal folds that were adducted but not vibrating is one example of a behavior that, though inappropriate for normal voicing, did not appear to be associated with excessive effort.

Limitations of Flexible Endoscopic Examination for Speech-Language Pathology

Traditional nasopharyngoscopes are composed of three components. These include a bundle of fiberoptic rods (typically around 3 to 3.5 mm in diameter) that transmit light to illuminate structures of interest, an external light source, and a video camera with lens attached to the viewing portion of the scope. The camera is attached to a monitor and, if desired, a recorder (video or digital) for storage of images. Recording the image through the fiberoptic bundle produces some pixelation of the image. In addition, the central portion of the image visualized appears larger than the image at the periphery. Newer versions of these scopes may accommodate an internal, battery-operated light source, making the entire system lighter and more portable.

Because of the scope’s insertion through the left or right nostril, it may not be possible to center the scope tip exactly above the vocal folds on imaging. The image viewed, depending on its distance from the scope tip, may appear disproportionately large or small as a consequence or may appear to move more or less briskly/extensively. Casper et al. provided an example of two lines drawn in parallel that appeared to intersect at one end when viewed through the flexible nasopharyngoscope.4 These limitations can typically be accounted for when they are understood and underscore the importance of experience with both normal and disordered anatomy and physiology in accurately interpreting the endoscopic exam.

Newer technology has produced flexible scopes that have a camera chip in the tip of the scope, minimizing several of the technical problems associated with older instruments. These videoscopes or “chip-in-the-tip” scopes produce high-resolution digital images of excellent quality. The diameter of these scopes may be somewhat larger than that of traditional scopes, though smaller versions are increasingly available. The powerful light sources and digital signal processing hardware associated with the videoscopes add weight to these systems, but the image resolution is excellent.

In our experience, stroboscopic imaging with a flexible endoscope, even a digital videoscope, does not typically provide the same quality of information regarding mucosal and vibratory displacements as stroboscopy with a rigid scope (in patients who can be visualized with rigid endoscopy). If a patient can be visualized well with rigid endoscopy, the large size and clear image possible are extremely useful to the voice clinician as well as the otolaryngologist.

Flexible Endoscopy and the Speech-Language Pathologist: Training, Safety, Ethics

Flexible nasal endoscopy is often considered a “minimally invasive” procedure. Primary risks include a reaction to a topical anesthetic, a nosebleed or other nasal injury, and a vasovagal response. In some states, laws pertaining to the scope of practice for speech-language pathologists have specifically allowed for the procedure if certain training requirements and setting restrictions are met. In California, for example, a licensed speech-language pathologist must first be successfully mentored in the use of the instrument by an otolaryngologist certified by the American Board of Otolaryngology. Settings in which flexible endoscopy can be performed are also restricted, with a primary requirement being the availability of documented emergency medical backup procedures, including a physician or other appropriate medical professionals. There is additional language specifying that the pathologist’s use of an endoscope is not for the purpose of diagnosing pathology but, rather, for the elaboration and treatment of communication and swallowing impairment. It is incumbent on the voice clinician to know, and to be in compliance with, the pertinent state laws regarding the use of endoscopy by speech-language pathologists. These may differ from state to state and, unlike practice guidelines recommended by professional associations, are legally binding.


1. Leonard R, Dworkin J, Meleca R, Colton R, Leeper A, Till J. Assessment of the disordered voice: a roundtable discussion. J Med Speech-Lang Pathol 2002;10:111–131

2. Leonard R, Kendall K. Effects of voice therapy on vocal process granuloma: a phonoscopic approach. Am J Otolaryngol 2005;26:101–107

3. Leonard R, Kendall K. Phonoscopy–a valuable tool for otolaryngologists and speech-language pathologists in the management of dysphonic patients. Laryngoscope 2001;111:1760–1766

4. Casper JK, Brewer DW, Colton RH. Variations in normal human laryngeal anatomy and physiology as viewed fiberscopically. J Voice 1987;1:180–185

Aug 15, 2016 | Posted in OTOLARYNGOLOGY