29
Clinical Applications for High-Speed Laryngeal Imaging
The introduction of high-speed imaging of the larynx into clinical practice has expanded our ability to image vocal fold vibration to include situations that cannot be successfully evaluated using videostroboscopy. High-speed laryngeal imaging uses a high-speed camera to capture real-time images at a minimum rate of 2000 frames per second. This rate of image capture is fast enough to obtain multiple images from a single cycle of vibration (usually 175 to 250 cycles per second), and when played back, the images can be viewed as an actual slow-motion movie of vocal fold vibration. Because high-speed laryngeal imaging does not require the measurement of a stable vibratory frequency to adjust the rate of image acquisition, it can be used successfully in situations where the frequency of vibration is changing rapidly, such as during the onset and offset of voicing and in patients with aperiodic voices, tremor, or laryngeal spasms. Thus, high-speed digital imaging offers benefits over standard videostroboscopy in the diagnosis and analysis of patients with irregular vocal fold motion.
High-speed imaging of the larynx has been performed since the 1930s but remained impractical for clinical use because of cumbersome equipment and high costs. The recent development of advanced, cost-effective, and commercially available high-speed imaging systems has made it practical to use this technology in a clinical setting. It has the potential to advance the functional assessment of the pathophysiology of voice disorders, ultimately improving the ability to diagnose and manage vocal fold pathology.
Indications for High-Speed Laryngeal Imaging
Hoarseness, the clinical complaint usually investigated with laryngeal imaging, is the result of abnormalities in vocal fold vibratory function and is an indication for high-speed laryngeal imaging. Laryngeal pathology that results in hoarseness is commonly associated with an unstable frequency of vibration, leading to difficulties in using videostroboscopy for analysis of vibratory characteristics in these patients. High-speed laryngeal imaging can be used in patients in whom the strobe is unable to find a stable frequency. Characteristics of vibration usually assessed with videostroboscopy (amplitude of vibration, mucosal wave, periodicity, phase characteristics, glottic configuration, symmetry, and the presence of adynamic segments) can all be evaluated with high-speed laryngeal imaging.
Because the rate of image capture with high-speed laryngeal imaging is fixed, the number of images captured per vibratory cycle depends on the frequency of vocal fold vibration. More images are captured per cycle at low vibration frequencies, and fewer images are captured at higher frequencies of vibration. For example, at the low end of the fundamental frequency range, 150 Hz, between 13 and 14 images are captured per vibratory cycle when the capture rate is 2000 frames per second. At the higher end of the range, 250 Hz, only 8 images per cycle can be captured. In female singers, the rate of image capture may be insufficient to analyze vocal abnormalities that occur at the high end of their singing register.
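The relationship described above is a simple ratio of the capture rate to the vibration frequency. The following minimal Python sketch illustrates it; the function name `images_per_cycle` is chosen here purely for illustration and is not part of any imaging system's software.

```python
def images_per_cycle(capture_rate_fps: float, vibration_hz: float) -> float:
    """Number of images captured during one vibratory cycle.

    capture_rate_fps: fixed camera capture rate in frames per second.
    vibration_hz: frequency of vocal fold vibration in cycles per second.
    """
    if capture_rate_fps <= 0 or vibration_hz <= 0:
        raise ValueError("capture rate and frequency must be positive")
    return capture_rate_fps / vibration_hz


if __name__ == "__main__":
    # Frequencies discussed in this chapter, at a 2000 fps capture rate.
    for f0 in (150, 250, 500):
        print(f"{f0} Hz: {images_per_cycle(2000, f0):.1f} images per cycle")
```

Because the ratio falls as frequency rises, the same calculation can be used before an examination to judge whether a planned phonation task will be sampled densely enough to be interpretable.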
Supportive Research
In a research setting, high-speed laryngeal imaging has been used to learn more about normal vocal fold vibratory behaviors. The digitization of the imaging enhances the ability to accurately quantify many characteristics of vibrating vocal folds not possible with the analog imaging of videostroboscopy. Tissue characteristics as well as the influence of other forces such as aerodynamics, muscle tension, and vocal fold length have been studied.1–3 High-speed laryngeal imaging has also allowed the evaluation of normal laryngeal functioning in situations of rapid pitch change, such as during the onset and offset of voicing and during a glissando (Video Clip 58). The technology can be adapted to make objective measures of normal vibration that correlate with acoustic measures.1 This type of analysis has advanced our understanding of normal vibratory physiology and is critical to the development of successful surgical and medical techniques used to restore normal vocal fold vibratory function to patients suffering from hoarseness.
High-speed laryngeal imaging analysis of the glottal area and the amplitude of vibration for each vocal fold has been applied to the evaluation of patients with unilateral vocal fold paralysis both before and after medialization.4,5 Another area where high-speed laryngeal imaging has great potential for clinical application is in the assessment of vocal tremor and in the differentiation of spasmodic dysphonia from muscle tension dysphonia (Video Clips 20 and 59). Due to the acoustic characteristics of tremor and breaks in phonation, videostroboscopy cannot capture the vibratory pattern in these patients (Video Clips 60 and 61). High-speed laryngeal imaging is being applied to assist with quantification of vocal tremor, which has been very difficult with videostroboscopy.6 As more information is gathered regarding normal function and function in patients with these disorders, better treatment modalities may be developed.
Further research of vocal fold vibration using high-speed laryngeal imaging is under way to evaluate how various dynamics of the vocal folds as seen with high-speed imaging can be interpreted and which features of vocal fold oscillations are important. The reader is referred to Chapter 28 for a more in-depth review of the science and current research pertaining to high-speed laryngeal imaging.
High-Speed Laryngeal Imaging Technique
Clinically Available Equipment
The Kay Pentax High-Speed Video System is available in combination with a Digital Stroboscopy System. Because both technologies are incorporated into a single workstation, it is easy to switch back and forth between the two imaging modalities during a patient evaluation. The rigid endoscope used for stroboscopy can be easily attached to the high-speed camera for high-speed laryngeal imaging.
Exam
High-speed digital imaging techniques use conventional rigid endoscopes to record images of the larynx with a full view of the superior laryngeal surface (Fig. 29.1). Because of the amount of data generated at this rate of image capture, recording time is usually limited to ∼8 seconds. This, however, has been found to be a sufficient amount of time to evaluate most phonatory behaviors. The recorded images can be played back in slow motion for analysis. At a recording rate of 2000 fps, 8 to 20 images per vibratory cycle are recorded depending on the frequency of vibration. Keep in mind that videostroboscopy, on the other hand, is unable to record events that are shorter than four to five cycles in duration, with four to eight cycles occurring per single image captured.
Patient positioning for high-speed imaging is the same as for videostroboscopy. The focal length of the high-speed camera is relatively narrow, and it is helpful to try to prefocus the camera using a template while positioning the scope end about 2 inches from the template. The camera is inserted into the pharynx transorally, in the same manner as is done for videostroboscopy, and the images can be viewed on a monitor. Using a foot pedal control, the examiner can save the last 8 seconds of recording for future playback and analysis. The playback speed can be adjusted as needed.
In general, the development of an examination protocol is recommended. This ensures that the final recording is optimal for a full evaluation of all vibratory parameters. Because the onset and offset of voicing is one parameter that can only be evaluated with high-speed laryngeal imaging, it should be included in the examination protocol. In developing the imaging protocol, it must be kept in mind that with high-speed laryngeal imaging, 2000 frames are recorded for each second of vibration and review of the images is done at a rate of 9 frames per second. Therefore, ∼3.7 minutes are needed to review 1 second of voicing (2000 frames at 9 frames per second is about 222 seconds), and up to 30 minutes are needed to review the entire 8 seconds of voicing recorded in the study. Most clinicians do not have 30 minutes to review each study and generally will look at much smaller segments of the study to make assessments.
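The review burden follows directly from the capture and playback rates given above. As a rough sketch, assuming frame-by-frame playback at a constant rate (the function name `review_minutes` and its default values are illustrative, taken from the rates quoted in the text), it can be estimated as follows:

```python
def review_minutes(recorded_seconds: float,
                   capture_rate_fps: float = 2000,
                   playback_fps: float = 9) -> float:
    """Minutes needed to watch a recording frame by frame at the playback rate."""
    if playback_fps <= 0:
        raise ValueError("playback rate must be positive")
    total_frames = recorded_seconds * capture_rate_fps
    return total_frames / playback_fps / 60.0


if __name__ == "__main__":
    # Half a second, one second, and the full 8-second recording.
    for seconds in (0.5, 1, 8):
        print(f"{seconds} s recorded: about {review_minutes(seconds):.1f} min to review")
```

For the full 8-second recording the estimate approaches 30 minutes, whereas a half-second segment around voicing onset takes under 2 minutes, which is why a protocol that identifies the segments of interest in advance is practical.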
An attempt should be made to record voicing at or near the patient’s fundamental frequency; however, this is not always an easy task. Often, to bring the larynx into view for recording, the patient is asked to voice at a frequency above the fundamental, effectively elevating the larynx for viewing. It is therefore important to establish the fundamental frequency prior to the onset of recording and then to attempt viewing of the larynx at that frequency. In general, a range of 20 Hz above or below the average fundamental frequency is accepted as representative of the fundamental frequency. Because some vibratory abnormalities occur only at certain frequencies and because vibratory characteristics are affected by the frequency of vibration, it is important that recording at a high pitch and then at a lower pitch also be performed. The remainder of the examination can be tailored to the patient’s specific complaints.
Study Interpretation
There remains a paucity of published normative data regarding vocal fold vibratory characteristics as documented by high-speed laryngeal imaging. As a result, our ability to identify pathologic vocal fold function using high-speed laryngeal imaging is limited to a degree by our inability to distinguish it from normal function. To establish the range of normal function as documented with high-speed laryngeal imaging, we performed high-speed recordings in 50 healthy individuals without voice problems and compared the high-speed recordings with videostroboscopy performed in the same subjects.7 Three blinded raters then reviewed the studies and judged the following characteristics of vibration: glottal configuration, vibratory symmetry, phase closure, mucosal wave propagation, amplitude of vibration, and periodicity of vibration. The data from high-speed studies was then compared with data from the videostroboscopic studies performed in the same subject population. Intrarater and interrater reliability was calculated for both imaging modalities.
Although each judge who participated in this study had extensive clinical experience with making clinical judgments of laryngeal imaging studies, there was a substantial amount of disagreement within and between judges. Previous studies of videostroboscopy judgments with high interrater agreement generally depend on the use of specific segments of the study to establish interrater agreement.5,8,9 When clinicians are free to make assessments from any part of the study, there may be differences in interpretation based on variables including loudness, pitch, effort level, and modal register of the subject.10,11 Recent studies regarding reliability of clinicians evaluating imaging from a large patient population report reliabilities similar to those found in this study.5,12
Because each clinician who participated in this study likely chose a slightly different part of the imaging study for making their judgments of vibratory characteristics, their findings varied. Recording made near the subject’s fundamental frequency was done to minimize this variability but could not eliminate it. This finding highlights the importance of communication between clinicians regarding the findings of laryngeal imaging and the importance of using the clinical picture to help interpret the results. There is also a need to continue the development of quantitative methods of measurement using imaging studies.11,13 Strict imaging protocols with respect to frequency, phonation mode, and loudness may help to minimize variability. However, any test that requires significant patient participation and cooperation will likely exhibit some variability based on patient factors that cannot be completely eliminated.
Overall, the comparison of videostroboscopy ratings with ratings from high-speed imaging studies did not reveal any statistically significant difference between the two modalities for any of the measures except for the assessment of periodicity. Aperiodic vibratory characteristics were noted in 26% of the videostroboscopy studies and in only 2.6% of the high-speed studies (p = 0.0006). Aperiodic vibrations were more easily identified with videostroboscopy because when there was aperiodic vocal fold vibration, the strobe failed to track, causing the images to jump from place to place in the vibratory cycle. In addition, changes in frequency of vibration may not be as easily identified with high-speed imaging because the observed difference from cycle to cycle may be subtle and requires the examiner to review many cycles to assess. Using the kymography function, however, aperiodic vibrations are easily identified with high-speed imaging. The kymograph displays the movement of the vocal fold edges at a point along the anterior-posterior axis of the vocal folds. The point can be chosen by the examiner. The computer then creates a vertical display of the movement at that point over time. Several cycles of vibration can be seen simultaneously, and changes in vibration frequency can be easily identified. The kymograph function is also helpful with the evaluation of symmetry of vibration between the two vocal folds.14,15 The finding of segments of aperiodic vibration in 26% of a normal population indicates that previous estimates of the percentage of patients likely to benefit from high-speed laryngeal imaging may be low.
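The construction of a kymograph as described above can be sketched in a few lines of Python. This is an illustration only, not the algorithm used by any commercial system. It assumes that each frame is a grayscale image stored as a list of pixel rows, that the anterior-posterior axis runs vertically in the image so that a single image row crosses both vocal folds at one point along that axis, and that the open glottis appears dark. The names `kymogram`, `glottal_width`, `synthetic_frame`, and the threshold value are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows x columns, 0 = black


def kymogram(frames: Sequence[Frame], row: int) -> List[List[int]]:
    """Take the same image row from every frame and stack the rows in order.

    Each output line corresponds to one frame, so time runs down the display.
    """
    return [list(frame[row]) for frame in frames]


def glottal_width(kymo: Sequence[Sequence[int]], dark_threshold: int = 50) -> List[int]:
    """Count dark (assumed glottal gap) pixels on each line of the kymogram."""
    return [sum(1 for px in line if px < dark_threshold) for line in kymo]


def synthetic_frame(gap: int, width: int = 7, rows: int = 3) -> List[List[int]]:
    """Build a toy frame: bright tissue with a centered dark gap of `gap` pixels."""
    line = [200] * width
    start = (width - gap) // 2
    for x in range(start, start + gap):
        line[x] = 0
    return [list(line) for _ in range(rows)]


if __name__ == "__main__":
    # Two toy vibratory cycles: closed, opening, open, closing.
    frames = [synthetic_frame(g) for g in (0, 1, 3, 1, 0, 1, 3, 1)]
    print(glottal_width(kymogram(frames, row=1)))
```

In such a width trace, a regular voice shows a repeating pattern with a constant number of lines per cycle, whereas a change in the number of lines between successive closures reflects the change in vibration frequency that the examiner looks for in the kymograph display.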
In addition to high rates of aperiodicity, asymmetry of vibration was found in 25% of the population, regardless of imaging modality. Periods of vibratory aperiodicity and periods of asymmetry seen in our normal subjects indicate that these findings may not represent a vocal fold vibratory abnormality when documented in patients.
When the normal range of glottal configuration is considered, significant variability was found within this normal subject population (Table 29.1). Every type of glottal configuration was identified, although the closed configuration predominated, being found in 52% of the subjects studied. The next most common configuration was a posterior glottic gap, identified in 31% of the study population. Previous studies of glottal configuration in normal females have found a posterior glottic opening in 30% while voicing at the subject’s fundamental frequency.8,16 In addition, a posterior glottic opening was seen more frequently in younger females than in older females in our study group, and this finding is also consistent with the results of other studies.17 We also found more variability in the ratings of glottic configuration in the females when compared with the males in the study.
Differences between imaging modalities and between raters for glottic configuration indicate variability of the parameter in a given subject. For example, one subject was rated as having incomplete closure on the high-speed imaging study by both speech-language pathology raters. The otolaryngology rater evaluated the high-speed study twice, but rated the glottic configuration as closed on both of those assessments. The otolaryngologist also rated the videostroboscopy study in this individual and rated the glottic configuration as demonstrating an anterior glottic opening. Re-review of the studies revealed that the videostroboscopy clip was slightly over 2 minutes long. There was evidence of anterior-posterior and lateral-medial hyperfunction throughout the study. An anterior gap was identified for most of the study, and this was confirmed by one of the other raters. The high-speed study was variable and could have been judged as complete closure, anterior gap, or incomplete closure depending on where the judgments were made. There were 11,196 frames in the clip. The study starts with an anterior gap and then changes to complete closure, and this is inconsistent. The subject then takes a breath and starts voicing again with the anterior gap. Five hundred eighty-nine frames later, there is complete closure (with much anterior-posterior squeeze, so there may have been a gap that was not visible). Then, 522 frames later, there is definitely an anterior gap. And 119 frames later, the closure is incomplete for 2804 frames, and then goes back to an anterior gap. Once again, 483 frames later, the glottic configuration changes to complete closure, and the clip then goes back to an anterior gap until done. This type of variability within a single examination underscores the caution clinicians must use during examination interpretation.
The amplitude of vocal fold vibration and mucosal wave propagation were rated for both imaging modalities on a 100-point scale with 41 designated as “normal” for both parameters. The majority of the ratings for both mucosal wave and amplitude fell within 5 points of “normal,” irrespective of image modality, and there was no significant difference in the ratings based on image modality (Figs. 29.2 and 29.3). As expected, the ratings of vibration amplitude and mucosal wave were clustered around the designated “normal”; however, the range of ratings was skewed toward an increased amplitude of vibration, with some of the “normal subjects” demonstrating vocal fold vibration amplitudes rated to be as high as 28 points above the “normal” mark. In other words, diminished amplitude of vibration and mucosal wave likely represent pathology, as the normal population is skewed toward larger values for these two parameters.
The distribution of the percent open phase ratings in our normal subject population demonstrated two distinct patterns when comparing the two image modalities (Fig. 29.4). The videostroboscopy ratings were based on the “montage” function of the strobe system that selects 10 images from a single virtual composite “cycle” of vibration and displays them on the screen. The rater then counts the number of images with open vocal folds. This method results in estimates of the percent open phase that are multiples of 10. Consequently, the percent open phase data from videostroboscopy falls into distinct groups. Almost 7% of the studies were rated to have a 40% open phase, 25% of the studies were rated to have a 50% open phase, 16% of the studies were rated to have a 60% open phase, and 17% were rated to have a 70% open phase. Another 23% were given ratings that fell in between these integers and are the result of the rater sampling from several points in the study and averaging the percent open phase from several montages. The other 13% were given open phase ratings above or below these values. On the other hand, because high-speed imaging of the larynx has a fixed rate of image capture, the number of images per cycle varies. The measurement of the number of frames with open vocal folds per cycle results in data of a more continuous nature due to the potential smaller increments of measurement. The mean value of the percent open phase as measured from high-speed imaging was 62.3% and ranged from 44 to 100%.
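The difference in measurement granularity between the two modalities can be illustrated with a short Python sketch. The function name `percent_open` and the example frame counts are assumptions made for illustration; the only inputs are per-image judgments of whether the vocal folds are open.

```python
from typing import Iterable


def percent_open(open_flags: Iterable[bool]) -> float:
    """Percent of images within one cycle in which the vocal folds are open."""
    flags = list(open_flags)
    if not flags:
        raise ValueError("at least one image is required")
    return 100.0 * sum(flags) / len(flags)


if __name__ == "__main__":
    # Stroboscopy montage: always 10 images, so results are multiples of 10.
    montage = [True] * 6 + [False] * 4
    print(f"montage: {percent_open(montage):.1f}%")

    # High-speed imaging: the number of frames per cycle varies with frequency.
    # Hypothetical cycle of 13 frames (about 150 Hz at 2000 fps), 8 of them open.
    high_speed_cycle = [True] * 8 + [False] * 5
    print(f"high-speed: {percent_open(high_speed_cycle):.1f}%")
```

With 13 frames in a cycle, each frame changes the estimate by about 7.7 percentage points rather than 10, and because the frame count differs from one frequency to the next, the pooled high-speed values spread out into a more continuous distribution.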
Report Creation
Our patient evaluation begins with videostroboscopy and is followed with high-speed imaging. We currently include both the high-speed and videostroboscopy findings in our final report. At present, there is no mechanism to bill separately for the high-speed portion of the study, so it is integrated into our overall “laryngeal imaging” procedure. This is not difficult to do because the addition of high-speed imaging to the protocol simply requires a change of camera after the videostroboscopy is complete and a few more seconds of recording. Certainly, not every patient needs further evaluation with high-speed imaging after videostroboscopy. However, at this time, we perform high-speed imaging on every patient undergoing laryngeal imaging and have been gratified by the additional information that is gained, even in patients in whom the videostroboscopy exam was optimal for evaluation.
High-Speed Laryngeal Imaging Limitations
Currently, high-speed laryngeal imaging remains complementary to videostroboscopy in a clinical setting. Because relatively few vibratory cycles are reviewed with a high-speed imaging study, intermittent pathology may be missed. Using the technology in concert with videostroboscopy, however, provides a more complete assessment of vibratory function over a longer recording period. In addition, it provides assessment of vibratory function in real time and allows imaging during voicing onset and offset. With the kymography function, the high-speed imaging system can be used to assess changes in periodicity and symmetry over time and help with further analysis of intermittent vibratory irregularities, provided that they occur during the 8 seconds recorded during high-speed imaging.
The images recorded with older high-speed imaging systems are in black and white. If an assessment of tissue color is desired with these systems, then videostroboscopy is required. We have found, however, that it is helpful at times to visualize pathology of the vocal folds in black and white, because this allows a better assessment of the limits of pathology that was obscured by color changes on videostroboscopy. In particular, vocal fold cysts viewed with high-speed imaging are easily distinguished from polypoid changes, which may be difficult to differentiate with videostroboscopy.
It must be kept in mind that the fixed rate of image capture with high-speed imaging means that fewer images per cycle are recorded at high frequencies. At a capture rate of 2000 frames per second, frequencies greater than 500 Hz will be difficult to assess with high-speed imaging because fewer than four images per cycle will be recorded. A soprano complaining of difficulty with the high end of her singing range is an example of this situation.
Clinical Application Examples of High-Speed Imaging
We have found several instances where additional clinical information was gained with high-speed imaging of the larynx in patients with good tracking on videostroboscopy. These patients might not initially be considered for high-speed imaging because the videostroboscopy appeared easy to interpret. However, when the high-speed imaging study was reviewed, unexpected findings were appreciated.
- Early postoperative recovery: Previous studies using videostroboscopy to assess the return of the mucosal wave in patients after surgery of the vocal fold report that the mucosal wave generally begins to return 3 weeks after surgery. With high-speed imaging, we have found cases of mucosal wave recovery as early as 1 week after surgery. Videostroboscopy performed at that time revealed stiffness in the area of surgery, likely due to poor tracking in the healing vocal fold segment because it vibrated at a different frequency from the rest of the normal vocal fold (Video Clip 62). On high-speed imaging, however, clear vibrations were identified in the healing segment (Video Clip 63). Based on this finding, we have been able to allow these patients to advance more rapidly with voice rehabilitation.
- Behavior assessment: Although no further information regarding the type and degree of pathology is obtained in patients with vocal fold nodules on high-speed imaging, significant information regarding behavioral pathology can be recorded. By recording voicing onset, we have found several cases of what appears to be extreme muscle tension and supraglottic squeezing at voicing onset that was not appreciated with videostroboscopy (Video Clips 36 and 64). In addition, although no objective measures can be made, subjectively there appears to be an increase in the force with which the vocal folds contact during voicing. When viewed with high-speed imaging, the vocal folds of these patients appear to be slamming together (Video Clips 65 and 66). These observations may provide clues into the pathophysiology of nodule formation. Following these patients with serial high-speed imaging studies may provide a mechanism by which to monitor the progress of therapy. Eventually, the technology could be used to assess patients prior to surgery to determine if they have successfully eliminated the muscle tension behavior, thereby increasing the likelihood of successful outcomes with surgery.
- Nodule/polyp versus cyst: Occasionally, it is difficult to distinguish various types of vocal fold pathology along the vocal fold margin. In particular, it is important to determine if a cyst is present. This type of lesion does not respond to voice therapy and requires surgical excision for cure. Often, however, there is significant surrounding secondary pathology such as edema and erythema that obscure the margins of the cyst, making it difficult to distinguish from a nodule or polyp. We have found that by viewing this type of pathology with high-speed imaging in black-and-white, the margins of the lesions appear more distinct and that the diagnosis of a cyst can be made more definitively. As a result, the patient is spared prolonged voice therapy prior to surgical treatment.
- Aperiodicity versus stiffness: Using videostroboscopy, a part of a vocal fold may appear adynamic when, in actuality, it is vibrating but at a frequency that is irregular and/or different from the measured frequency of voicing. Because the strobe is not tracking the frequency of vibration of that part of the vocal fold, it appears stiff. With high-speed imaging, however, it is possible to see the vibrations in that segment. (See the previous discussion regarding early postoperative recovery.)
Future Developments in High-Speed Laryngeal Imaging
The continued use of high-speed laryngeal imaging will likely elucidate more clinical situations in which it is helpful, and this will encourage manufacturers to refine the technology. The addition of color to high-speed imaging has already occurred with most of the commercially available systems. Linking the images to an audio recording of the patient’s voice and further refinements in image resolution would provide additional clinical information to the study. The digitization of the imaging enhances the ability to accurately quantify many characteristics of vibrating vocal folds, and future systems are likely to offer the ability to quantitatively assess vocal fold vibration.18,19
In conclusion, high-speed imaging of the larynx offers many benefits over standard videostroboscopy in the analysis of patients with voice disorders, particularly those with irregular vocal fold motion. As the technology advances and the knowledge base with this form of laryngeal analysis expands, our ability to evaluate and treat patients will improve.
References
1. Sundberg J. Vocal fold vibration patterns and modes of phonation. Folia Phoniatr Logop 1995;47:218–228
2. Berry DA, Montequin DW, Tayama N. High-speed digital imaging of the medial surface of the vocal folds. J Acoust Soc Am 2001;110(5 Pt 1):2539–2547
3. Döllinger M, Braunschweig T, Lohscheller J, Eysholdt U, Hoppe U. Normal voice production: computation of driving parameters from endoscopic digital high speed images. Methods Inf Med 2003;42:271–276
4. Hertegård S, Larsson H, Wittenberg T. High-speed imaging: applications and development. Logoped Phoniatr Vocol 2003;28:133–139
5. Verdonck-de Leeuw IM, Festen JM, Mahieu HF. Deviant vocal fold vibration as observed during videokymography: the effect on voice quality. J Voice 2001;15:313–322
6. Larsson H, Hertegård S, Lindestad PA, Hammarberg B. Vocal fold vibrations: high-speed imaging, kymography, and acoustic analysis: a preliminary report. Laryngoscope 2000;110:2117–2122
7. Kendall KA. High-speed laryngeal imaging compared with videostroboscopy in healthy subjects. Arch Otolaryngol Head Neck Surg 2009;135:274–281
8. Gelfer MP, Bultemeyer DK. Evaluation of vocal fold vibratory patterns in normal voices. J Voice 1990;4:335–345
9. Yan Y, Ahmad K, Kunduk M, Bless D. Analysis of vocal-fold vibrations from high-speed laryngeal images using a Hilbert transform-based methodology. J Voice 2005;19:161–175
10. Sulter AM, Schutte HK, Miller DG. Standardized laryngeal videostroboscopic rating: differences between untrained and trained male and female subjects, and effects of varying sound intensity, fundamental frequency, and age. J Voice 1996;10:175–189
11. Woo P. Quantification of videostrobolaryngoscopic findings: measurements of the normal glottal cycle. Laryngoscope 1996;106(3 Pt 2 Suppl 79):1–27
12. Poburka BJ. A new stroboscopy rating form. J Voice 1999;13:403–413
13. Svec JG, Schutte HK. Videokymography: high-speed line scanning of vocal fold vibration. J Voice 1996;10:201–205
14. Wittenberg T, Tigges M, Mergell P, Eysholdt U. Functional imaging of vocal fold vibration: digital multi-slice high-speed kymography. J Voice 2000;14:422–442
15. Tigges M, Wittenberg T, Mergell P, Eysholdt U. Imaging of vocal fold vibration by digital multi-plane kymography. Comput Med Imaging Graph 1999;23:323–330
16. Pemberton C, Russell A, Priestley J, Havas T, Hooper J, Clark P. Characteristics of normal larynges under flexible fiberscopic and stroboscopic examination: an Australian perspective. J Voice 1993;7:382–389
17. Linville SE. Glottal gap configurations in two age groups of women. J Speech Hear Res 1992;35:1209–1215
18. Schuberth S, Hoppe U, Döllinger M, Lohscheller J, Eysholdt U. High-precision measurement of the vocal fold length and vibratory amplitudes. Laryngoscope 2002;112:1043–1049
19. Qiu Q, Schutte HK, Gu L, Yu Q. An automatic method to quantify the vibration properties of human vocal folds via videokymography. Folia Phoniatr Logop 2003;55:128–136