The Science of Stroboscopic Imaging
During phonation, the vocal folds usually open and close over 100 times per second and vibrate at velocities approaching 1 meter per second, making it impossible to view this activity with the unaided eye.1 Stroboscopy has become an essential component of the clinical voice evaluation because it enables the examiner to obtain a visual estimate of vocal fold vibratory function that can be recorded for later playback. This chapter explains the science that underlies stroboscopic imaging of vocal fold vibration and describes the clinical systems that are currently used to perform stroboscopic examinations in the voice clinic.
Overview of Stroboscopic Examination of Vocal Fold Vibration
In current practice, clinical stroboscopic examination involves using a video camera attached to a rigid (transoral) or flexible (transnasal) endoscope to observe (using a video monitor) and record images of the vocal folds. In the most common approach, illumination is provided by a strobe light that flashes at a rate that is synchronized with the patient’s vocal fundamental frequency during sustained vowel production to produce what appears to be a slow-motion view of vocal fold vibration.2
Figure 11.1 shows the kind of simple schematic that is typically used to illustrate how stroboscopic examination of vocal fold vibration is accomplished.3,4 The oscillating waveform along the top of Fig. 11.1 represents the true pattern of glottal opening and closing that takes place as the vocal folds vibrate. The light bulbs indicate the instants when the flashes of the strobe light occur—at a slower rate than the vocal fold vibration frequency—timed to illuminate successive points (phases) in the repeating pattern of vibration from different vibratory cycles as time progresses. The lower waveform in Fig. 11.1 represents the perceived composite stroboscopic (slow-motion) sequence, which is constructed by sequential presentation of the sampled images (represented by the arrows projecting from the upper waveform). This entire stroboscopic process is dependent on an adequately stable fundamental frequency, and the resulting sampled images form an averaged, down-sampled estimate of the true underlying tissue motion.
Origins of Stroboscopy
In the early 1800s, it was discovered that the visual perception of motion could be evoked from viewing a discrete set of images.5 Exploiting this phenomenon, several household toys of the day—the phenakistiscope, zoetrope, daedaleum, and zoopraxiscope—created striking visual illusions of motion by presenting successive images of a sampled event in rapid succession.6 Regarded as the first application of stroboscopic principles, the phenakistiscope was invented by Plateau and independently conceived as Stampfer’s stroboscope and Roget’s phantasmascope. This device could make still images of objects, such as animals and dancers in different positions, appear to move by viewing the pictures through slits on a revolving disk. Motion was perceived because the periodic interruption of the viewer’s line of sight occurred rapidly enough to produce a sequential sampling (strobing) of the individual pictures.
In the late 19th century, Oertel published the earliest application of stroboscopic principles for observing vocal fold vibrations.7–9 Oertel used a revolving disk with equally spaced holes to mechanically shutter (strobe) a light source that was reflected by a laryngeal mirror to illuminate and observe the vocal folds. Provided subjects were able to adequately match their vocal pitches to the frequency of the rotating disk, the periodic flashes of the light produced a sequence of images that was perceived as a slow-motion representation of the vocal fold vibratory pattern.
Visual Perceptions of Apparent Motion
Humans are able to perceive motion, even if no real motion exists, from the presentation of successive images if some temporal requirements of the human visual processing system are met. Wertheimer is credited with conducting seminal experiments that yielded temporal parameters necessary for the perception of apparent motion,10–12 with refinements made by later investigators on more complex stimuli.13 Wertheimer’s experiments involved the successive presentation of two geometric figures separated by varying time intervals.14 Although the specific time intervals necessary for evoking apparent motion varied depending on experimental conditions, general numeric boundaries were reported by Wertheimer based on his empirical data.
At presentation intervals shorter than 30 msec, the two figures were perceived to exist simultaneously. At intervals above 200 msec, the two figures were perceived to appear in succession. Two hundred msec, or 0.2 seconds, is often cited as the time an image persists on the retina after presentation, which has been erroneously linked to motion perception in much of the previous literature that has tried to explain how stroboscopy works for observing vocal fold vibration.7,15,16 At intermediate interval durations, optimally around 60 msec, a single figure in motion was perceived. The 60-msec interval corresponds with a presentation frequency of ∼17 images per second (1/0.060 seconds ≈ 16.7). These results define the optimal frequency (17 Hz) at which a sequence of discrete images would be perceived to exhibit apparent, continuous motion. They also formed the basis for the frame rates at which motion pictures are filmed and played back, which have settled at 25, 25, and 30 frames per second, respectively, for the phase alternating line (PAL), séquentiel couleur à mémoire (SECAM), and National Television System Committee (NTSC) standard analog video protocols.17,18
Common misconceptions that have been propagated in the laryngology literature involve the intertwining of Talbot’s law with notions about the persistence of vision in attempting to explain how stroboscopy facilitates the examination of vocal fold vibration.4,7,16,19,20 In actuality, Talbot’s law states that “if a luminary of a certain brightness is exposed intermittently, the regular intermittences being too frequent for the eye to perceive, the resultant brightness is to the actual brightness as the time of exposure to the total time of observation.”12 In his experiments on estimating the intensity of light using time-based measurements, William Henry Fox Talbot rapidly rotated a white disk with a single black sector and noted that the perceived “obscuration” (Talbot’s term for grayness) of the rotating disk was “proportional to the angle of the [black] sector.”21 Any point on the disk was intermittently illuminated (white) at periodic intervals, and the duration of illumination—the exposure time—was related to the grayness of the disk. This relationship became known as Talbot’s law. Plateau further determined that this relationship was absolute (the Talbot-Plateau law); that is, the ratio between the apparent brightness of a rapidly rotating black-and-white disk and the brightness of an all-white disk is not only proportional to, but equal to, the ratio of the exposure time to the total time.12,22 Talbot is mainly known for his seminal work in developing photographic chemical processes, particularly the calotype process that provided for the generation of multiple positive prints from a single negative.23
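The Talbot-Plateau relationship described above can be written compactly. The symbols are illustrative labels that do not appear in the original sources: B_apparent is the perceived brightness of the rapidly rotating disk, B_white is the brightness of an all-white disk, t_exposure is the time during which a point is illuminated (white) in one rotation, T is the rotation period, and θ is the angle of the black sector. The second equality assumes a uniform rotation speed.

```latex
\[
\frac{B_{\text{apparent}}}{B_{\text{white}}}
  = \frac{t_{\text{exposure}}}{T}
  = \frac{360^{\circ} - \theta}{360^{\circ}}
\]
```

Under this formulation, for instance, a 90-degree black sector would yield an apparent brightness equal to three quarters of that of the all-white disk.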
Subsequent experiments by others using painted disks and intermittently interrupted light sources have established that, under usual circumstances, the rate of strobed illumination (strobe frequency) should be greater than about 50 Hz to be perceived as flicker free (ie, having no perceived variation in light intensity or object illumination).24–29 This frequency requirement is satisfied in laryngeal stroboscopy with the unaided eye, which employs strobe frequencies that are based on human fundamental frequencies well above 50 Hz. It follows that the visual perception of apparent motion (at frequencies above 17 Hz) is already created once the flicker-free threshold of 50 Hz is achieved. In addition, the idea that persistence of vision induces apparent motion in sequentially presented images has been shown to be a logical fallacy.12,30
Principles of Stroboscopic Sampling
Once the requirements of the visual system are met to induce apparent motion, the task turns to selecting the images that will be displayed to create this motion. Stroboscopic sampling can be used to create the optical illusion of slowing down and better revealing (or even freezing) an underlying pattern of rapid motion, such as the vocal fold vibratory pattern. Figure 11.2 illustrates a simple model of periodic motion to mimic the repeating vibratory pattern of the vocal folds. A circle is periodically rotated around its center and completes one revolution (360 degrees) in a given duration of time. For example, setting the fundamental frequency of rotation to 100 revolutions, or periods, per second would make each of the cycles (periods) 10 msec in duration. Each row in Fig. 11.2 (from left to right) depicts one complete cycle of rotation, so with 10 rows, a total of 10 complete revolutions are displayed. Each column shows samples of the rotating circle taken at equal intervals of time, or phases, within each cycle. In this case, the samples are taken at 36-degree intervals, and each sampling interval is 1 msec in duration.
Stroboscopy can enable two views of periodic motion—it can appear to freeze the motion at a selected point (phase) in the repeating pattern or it can create an apparent slow-motion view of the repeating pattern (ie, display the entire period or cycle). The first view, freezing of the motion, is accomplished by matching the frequency of the strobe imaging to the fundamental frequency (repetition rate) of the motion. In Fig. 11.2, this principle is demonstrated by the blue squares sampling the circular motion once per revolution at the phase of 288 degrees. Freezing the motion at other phases can be accomplished by adding time delays to this stroboscopic sampling. For example, a delay of 1 msec would shift the sampling later in time by 36 degrees, resulting in a freezing of the motion at the 324-degree phase ([1 msec]/[10 msec] × 360 degrees = 36 degrees, and 36 degrees + 288 degrees = 324 degrees).
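The delay arithmetic for the freeze view can be sketched in a few lines of Python. This is only an illustration of the calculation in the text; the function name and its parameters are hypothetical and do not correspond to any clinical system.

```python
def frozen_phase_deg(base_phase_deg, delay_ms, period_ms):
    """Phase (in degrees) at which periodic motion appears frozen
    after a time delay is added to once-per-cycle strobe sampling."""
    # Convert the delay into a fraction of one cycle, expressed in degrees.
    shift_deg = delay_ms * 360.0 / period_ms
    # Wrap around so the result stays within one 360-degree cycle.
    return (base_phase_deg + shift_deg) % 360.0


# Example from the text: 10-msec period, sampling at 288 degrees, 1-msec delay.
print(frozen_phase_deg(288.0, 1.0, 10.0))  # 324.0
```

A delay equal to the full period (10 msec in this example) returns the sampling to the original phase, which is why only delays shorter than one period are meaningful.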
The second stroboscopic view creates an apparent slow-motion presentation of the underlying periodic movement by sampling successive phases of the movement across repeated cycles (the strobe effect). The method can be related to the beating that occurs between two auditory tones that are close in frequency, where the beat frequency is equal to the absolute difference between the frequencies of the two tones. By analogy, the number of cycles of slow-motion movement produced per second by the strobe effect is equal to the difference (beat frequency) between the fundamental frequency of the movement and the frequency at which the strobe images are being acquired (strobe frequency).31 This phenomenon is illustrated by the red squares in Fig. 11.2, which shows a sequence of individual images being taken at successively later phases in each of the 10 repeated cycles of rotation.
Assuming a rotation period of 10 msec (ie, each cycle of vibration takes 10 msec to complete, producing a fundamental frequency of 100 Hz), the strobe effect requires sampling every 11 msec (ie, a 1-msec delay relative to the period) to progressively advance to the phase at which each subsequent image is taken. The strobe frequency must be lower (having a longer period) than the rotation frequency to sample successive phases of the cycle. If the strobe frequency were higher than the rotation frequency, the circle would appear to be rotating in reverse, a usually undesirable effect referred to as time aliasing. An 11-msec period corresponds with a strobe frequency of about 91 Hz (the reciprocal of 11 msec). The absolute difference between the strobe frequency (91 Hz) and the rotation frequency of the circles (100 Hz) is about 9 Hz, which is the beat frequency, or number of composite cycles actually presented per second. Thus, the actual rotation speed of 100 cycles per second would appear as 9 cycles per second during stroboscopy. The effective true sampling rate for this example, which would allow within-period sampling, would be 1000 Hz (calculated from 10 samples per period multiplied by the rotation frequency of 100 Hz).
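The numbers in this example can be reproduced with a short Python sketch. It assumes perfectly periodic motion and an ideal strobe, and the function names are hypothetical labels chosen for illustration.

```python
def strobe_summary(f0_hz, delay_ms):
    """Return (strobe frequency, beat frequency, effective sampling rate) in Hz
    for a strobe whose period exceeds the motion period by delay_ms."""
    period_ms = 1000.0 / f0_hz
    strobe_period_ms = period_ms + delay_ms
    strobe_hz = 1000.0 / strobe_period_ms
    # Number of slow-motion cycles displayed per second (beat frequency).
    beat_hz = abs(f0_hz - strobe_hz)
    # Number of distinct phases sampled within one composite cycle.
    samples_per_cycle = period_ms / delay_ms
    effective_rate_hz = samples_per_cycle * f0_hz
    return strobe_hz, beat_hz, effective_rate_hz


def sampled_phases_deg(f0_hz, delay_ms, n_flashes):
    """Phase of the motion (in degrees) illuminated by each successive flash."""
    period_ms = 1000.0 / f0_hz
    strobe_period_ms = period_ms + delay_ms
    phases = []
    for k in range(n_flashes):
        flash_time_ms = k * strobe_period_ms
        phases.append((flash_time_ms % period_ms) * 360.0 / period_ms)
    return phases


strobe_hz, beat_hz, effective_hz = strobe_summary(100.0, 1.0)
print(round(strobe_hz, 1), round(beat_hz, 1), round(effective_hz))
print(sampled_phases_deg(100.0, 1.0, 10))
```

With a 100-Hz rotation and a 1-msec delay, the flashes land at phases that advance by 36 degrees each time, matching the sequence of red squares described for Fig. 11.2.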
Stroboscopic Examination of Vocal Fold Vibration
Here it must be noted that there are two fundamentally different methods of stroboscopic examination of vocal fold vibration: real-time viewing by direct observation and real-time viewing and recording by video-based technologies. Up to this point, the principles of stroboscopy have assumed that the rapid motion of an object was to be viewed via direct observation with the eye. Consequently, as long as the strobe frequency is above 50 Hz, the conditions necessary for a flicker-free sequence of images and the perception of continuous, apparent motion will be satisfied. As already described, the earliest use of stroboscopy for direct observation of vocal fold vibration was accomplished using a laryngeal mirror and a shuttered light source.9 As current clinical systems use video-based technology, it is necessary to have some understanding of how these systems integrate the video capture process with stroboscopic imaging.2,32,33
The video recording process often follows the NTSC standard that sets the capture rate at ∼30 interlaced frames per second, with each frame comprising two fields that are captured at distinct times (actual frame and field rates are 30/1.001 Hz and 60/1.001 Hz, respectively).17 Alternatively, the video capture process might follow one of two other international standards—PAL and SECAM—that also employ 2:1 interlacing but set the video frame and field rates at 25 Hz and 50 Hz, respectively.18 A field consists of the set of either even-numbered or odd-numbered horizontal lines that are used to capture or encode the image. Thus, in the NTSC standard, every 1/60th of a second, half of the horizontal lines that comprise the image are captured and displayed on the monitor, and, in the subsequent 1/60th of a second, the other half of the horizontal lines that comprise the image are recorded and displayed. Therefore, a full image, or frame, is created every 1/30th of a second from the simultaneous display of two adjacent fields.
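The field and frame timings quoted above follow directly from the field rates. The sketch below tabulates them; the dictionary and function names are hypothetical, and the values are the nominal rates given in the text rather than measurements from any particular device.

```python
# Field rates in Hz, as stated in the text (NTSC uses the 1.001 divisor).
FIELD_RATE_HZ = {
    "NTSC": 60.0 / 1.001,
    "PAL": 50.0,
    "SECAM": 50.0,
}


def video_timing(standard):
    """Return field and frame rates (Hz) and durations (msec) for a standard."""
    field_hz = FIELD_RATE_HZ[standard]
    frame_hz = field_hz / 2.0  # 2:1 interlacing: two fields per frame
    return {
        "field_hz": field_hz,
        "frame_hz": frame_hz,
        "field_ms": 1000.0 / field_hz,
        "frame_ms": 1000.0 / frame_hz,
    }


for name in FIELD_RATE_HZ:
    timing = video_timing(name)
    print(name, {key: round(value, 3) for key, value in timing.items()})
```

The NTSC field duration comes out slightly longer than 1/60th of a second, which is why the text describes the rates as approximately 60 fields and 30 frames per second.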
As with most digital camera technologies, a scene is captured by optically focusing light onto the image sensor of the camera, which then averages the scene information over the exposure time to create a static image. In the NTSC standard, the exposure time is maximized so that the camera sensor averages the scene information over the entire field duration (1/60th of a second). Any movement that occurs during the exposure time results in motion blur in the resulting field. To obtain nonblurred video recordings of stroboscopic images, the strobe light must be controlled so that it only flashes once (produces one strobed image) during the exposure time in each field. The video camera captures and plays back half the illuminated image every 1/60th of a second. The flicker due to the alternating pattern of horizontal lines is generally imperceptible because the field rate (60 Hz) exceeds the critical flicker frequency of 50 Hz. In addition, the perception of apparent motion is always achieved because the video rate (30 frames per second) is in the range at which discrete images are perceived as a continuous, moving sequence.
Thus, in a given system, the field and frame rates are fixed according to the NTSC, PAL, or SECAM protocol. The rate of the strobe light, however, can be modified and controlled independently from the video capture rate. Early videostroboscopy systems by Atmos and by Brüel and Kjær set the strobe rate to a specified beat frequency to illuminate the vocal folds once per cycle.33 A direct consequence of these strobe rates was that more than one strobe flash could occur during each video frame. For example, with the NTSC video field rate at ∼60 Hz and a vocal fold vibratory frequency at 130 Hz, there could be two or three strobe flashes (once per vocal fold cycle) occurring during each video field. The captured vocal fold image within each video field would exhibit multiple-exposure artifacts, such as multiple edges and inconsistent field-to-field illumination, because the camera sensor would integrate spatial information across two or three illuminated images. Additional artifacts would be evident when viewing a composite (still) video frame combining two of the already degraded video field images. The degree of image degradation was directly linked to the fundamental frequency of vibration; that is, stroboscopic recordings of humans with higher fundamental frequencies suffered from increased artifacts.
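The multiple-exposure problem can be illustrated by counting how many once-per-cycle flashes fall within each video field. The sketch below is a simplified model, not a description of the early systems themselves: it assumes a perfectly steady fundamental frequency, a nominal 60-Hz field rate, and a first flash that coincides with the start of the first field. The function name is hypothetical.

```python
def flashes_per_field(f0_hz, field_rate_hz, n_fields):
    """Count strobe flashes (one per vocal fold cycle) landing in each field.

    Integer arithmetic is used so that flashes falling exactly on a field
    boundary are assigned consistently.
    """
    counts = [0] * n_fields
    k = 0
    while True:
        # Flash k occurs at time k / f0_hz; its field index is floor(t * field rate).
        field_index = (k * field_rate_hz) // f0_hz
        if field_index >= n_fields:
            break
        counts[field_index] += 1
        k += 1
    return counts


counts = flashes_per_field(130, 60, 60)  # one second of video at 130 Hz
print(sorted(set(counts)))  # distinct per-field flash counts
print(sum(counts))          # total flashes in one second
```

Raising the fundamental frequency in this model increases the number of flashes integrated within each field, consistent with the observation that recordings at higher fundamental frequencies showed more artifacts.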
Current Clinical Stroboscopy Systems
To counteract the undesirable image degradation exhibited by earlier models, in 1992, Kay Elemetrics (now KayPENTAX) introduced the first laryngeal stroboscopy system that controlled the triggering of the strobe light so that it only flashed once per video field, thereby eliminating artifacts due to multiple exposures during each field. Modern clinical stroboscopy systems automatically derive an estimate of the patient’s voice signal during phonation with a neck sensor (usually a contact microphone or electroglottograph) and use this signal as the basis for controlling the timings of the strobe light to produce high-quality stroboscopic images. Thus, the stability of the patient’s voice is critical for generating an accurate slow-motion view of vocal function.
In current systems, the repetition rate of the slow-motion vocal fold vibratory pattern can be modified from 0.5 to 2 cycles per second.2,32 For example, with the slow-motion rate set to 1.5 cycles per second (the fast mode in the KayPENTAX system) and the (NTSC) video rate fixed at ∼60 fields per second, it would take 40 fields (60/1.5), or 20 video frames, to capture one complete cycle of vibration. To emphasize, the number of images captured to represent one vocal fold cycle would be the same, regardless of the fundamental frequency of vocal fold vibration. The number of video fields per cycle defines the phase interval, in degrees, between successive strobe flashes in a complete 360-degree cycle. Thus for 40 fields per cycle, the phase interval would be 9 degrees (360 degrees/40). Consequently, the phase interval between each video frame would be 18 degrees. The voice signal from the neck sensor is used to trigger the flash of the strobe light at the appropriate phase interval for each successive video field. Once the strobe fires during a video field, the system waits for the next video field to start before triggering the strobe at the next desired phase in the voice signal. Some clinical stroboscopic examination systems provide a foot pedal for switching the recording/playback mode between slow-motion and freeze-frame modes of operation.2 The freeze-frame mode is accomplished by setting the slow-motion rate to zero, which sets the phase interval parameter to 0 degrees. Instead of capturing successive phases within a vocal fold cycle, the same phase is obtained across vocal fold cycles for each video field.
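The relationship between the slow-motion rate and the phase interval can be sketched as follows. This is an illustration of the arithmetic only; the function name is hypothetical, the field rate is taken as a nominal 60 Hz, and the actual triggering logic of commercial systems is not described in enough detail here to reproduce.

```python
def strobe_phase_plan(slow_motion_hz, field_rate_hz=60.0, n_fields=5):
    """Return (fields per cycle, phase step per field in degrees,
    target phases for the first n_fields fields)."""
    # Each field advances the target phase by this many degrees.
    phase_step_deg = 360.0 * slow_motion_hz / field_rate_hz
    if slow_motion_hz > 0:
        fields_per_cycle = field_rate_hz / slow_motion_hz
    else:
        # Freeze-frame mode: the same phase is sampled in every field.
        fields_per_cycle = float("inf")
    targets = [(i * phase_step_deg) % 360.0 for i in range(n_fields)]
    return fields_per_cycle, phase_step_deg, targets


print(strobe_phase_plan(1.5))  # fast slow-motion mode from the text
print(strobe_phase_plan(0.0))  # freeze-frame mode
```

Note that the fundamental frequency does not appear anywhere in this calculation, which reflects the point made above: the number of images per displayed cycle depends only on the slow-motion rate and the video field rate.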
There are two main types of videostroboscopic systems that differ in their methods used to capture stroboscopic images. The most common method is the use of a flashing strobe light as the illumination source, as in the KayPENTAX (Lincoln Park, NJ) Rhino-Laryngeal Stroboscope.2 An alternative method makes use of a constant light source and performs stroboscopic sampling by electronic shuttering of the camera. An example of this type of system is the JEDMED (St Louis, MO) StroboCAM II.32
Modern clinical stroboscopic examination systems make use of some common components, including endoscopes and systems for image display and recording that are based on standard video protocols (as described earlier), in addition to specialized light sources and camera control technology.2,32 Stroboscopy can be performed using any type of endoscope that transmits sufficient light to the camera sensor during the strobe process. For some time, this has been most easily accomplished with the standard transoral rigid endoscope (telescope) and, more recently, has been made possible with flexible transnasal videoscopes (containing a miniature-chip camera in the scope’s tip) that attach to both the strobe light source and the camera to allow for illumination and imaging of vocal fold motion. Stroboscopy is also possible through older, flexible fiberoptic transnasal endoscopes, but the poorer image quality due to insufficient lighting compromises the utility of the strobe exam. The continued reliance on video-based technologies has resulted in the creation of the term videostroboscopy to describe clinical systems and the generated examinations. (Additional terms have been strung together to produce more expansive labels like laryngovideostroboscopy.) These days, most display and recording systems are essentially digitally based video systems.
Example of Stroboscopic Sampling Using High-Speed Videoendoscopy as a Reference
This section uses reference images from high-speed videoendoscopy to provide a final comprehensive illustration of the principles that underlie the clinical use of stroboscopy to examine vocal fold vibratory function. Laryngeal high-speed videoendoscopy, with color video capture rates up to 10,000 frames per second, provides a much more accurate sampling of vocal fold tissue motion than that of stroboscopic imaging.3 In addition, the digital cameras used for high-speed imaging capture the entire scene in each frame, in contrast with the video-based interlacing scheme inherent in clinical videostroboscopy.
Figure 11.4 shows the high-speed video data that were obtained from a subject without vocal pathology at a rate of 6250 frames per second (0.16 msec per frame) using state-of-the-art digital color camera technology.3 Two transoral rigid endoscopes were simultaneously positioned to view the vocal folds while the subject produced a sustained vowel that approximated the /ae/ vowel. One endoscope provided continuous illumination for capturing video images from the attached high-speed camera. The second endoscope was connected to the light source of a clinical stroboscopy system that delivered strobe flashes that were triggered by the voice signal obtained from a neck-mounted contact microphone.2 The fundamental frequency during the phonatory segment was estimated to be ∼236 Hz (period = 4.23 msec), yielding 26.44 high-speed video frames per vocal fold vibratory cycle ([4.23 msec per period]/[0.16 msec per frame] = 26.44 frames per cycle).
A total of 477 frames (76.32 msec) from the high-speed recording of vowel phonation is shown in Figure 11.4. Each row consists of one period of vocal fold vibration, which mimics the organization of the schematic in Figure 11.2 that was used to demonstrate stroboscopic sampling. Note that Figure 11.4 compensates for a non-integer number of frames per period by alternating between displaying 26 frames in odd-numbered rows and 27 frames in even-numbered rows. The solid lines (yellow and orange) demarcate the number of high-speed images captured during the duration of NTSC video fields. Because one light source was continuously on, the brighter images in the figure indicate times when the stroboscopic light source was triggered to flash. The brighter images are meant to mimic the images that would have been captured with only a videostroboscopic system. This type of display gives a clear sense of how stroboscopy provides an undersampled and averaged estimate of the underlying temporal details in vocal fold tissue motion during phonation. Note that, as expected, the strobe light is triggered to flash only once during each video field and that three cycles are skipped between flashes.
In this example, the slow-motion rate of the strobe system was set to 1.5 Hz, resulting in the display of 1.5 periods of vibration per second (a common setting used on clinical strobe units). The strobe rate must be close to 60 Hz to provide exposure (exactly one strobe flash) for each video field. The number of vocal fold cycles between successive flashes can be estimated by dividing the fundamental frequency (236 Hz) by the video field rate; in this example, 236/60 ≈ 3.9, indicating that the strobe would flash approximately once every 4 cycles, or periods, of vocal fold vibration.
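The arithmetic behind this high-speed example can be checked with a brief Python sketch. The function and variable names are hypothetical, the field rate is taken as a nominal 60 Hz, and the count of 18 displayed rows is an assumption inferred from the 477-frame total and the alternating 26- and 27-frame rows. Because the sketch uses the unrounded period, it gives about 26.5 frames per cycle rather than the 26.44 obtained in the text from the rounded 4.23-msec period.

```python
def high_speed_example(frame_rate_hz=6250.0, f0_hz=236.0,
                       field_rate_hz=60.0, n_rows=18):
    """Recompute the quantities quoted for the high-speed recording."""
    frame_ms = 1000.0 / frame_rate_hz
    # Odd-numbered rows show 26 frames, even-numbered rows show 27.
    row_lengths = [26 if row % 2 == 1 else 27 for row in range(1, n_rows + 1)]
    total_frames = sum(row_lengths)
    return {
        "frame_ms": frame_ms,
        "period_ms": 1000.0 / f0_hz,
        "frames_per_cycle": frame_rate_hz / f0_hz,
        "frames_per_field": frame_rate_hz / field_rate_hz,
        "cycles_per_flash": f0_hz / field_rate_hz,
        "total_frames": total_frames,
        "total_ms": total_frames * frame_ms,
    }


for key, value in high_speed_example().items():
    print(key, round(value, 2))
```

Rounding the cycles-per-flash value to the nearest integer gives 4, in agreement with the three skipped cycles between flashes noted for Figure 11.4.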
An illuminating account of the history of studies on flicker, apparent motion, and the propagation of the fallacy of persistence of vision can be found in an article by Galifret, “Visual Persistence and Cinema?”12 Clear explanations of what happens when the strobe does not accurately track the phonatory pitch can be found in a chapter by Cranen and de Jong entitled “Laryngostroboscopy” in the book Voice Quality Measurement.31
Stroboscopic recordings are an effective tool for assessing vocal fold vibratory patterns, facilitating real-time video and audio playback. Video clips on the DVD accompanying this book display stroboscopic video sequences created by the KayPENTAX Rhino-Laryngeal Stroboscope2 and the Vision Research Phantom v7.3 color high-speed camera3 (Video Clips 10 to 14).
1. Schuster M, Lohscheller J, Kummer P, Eysholdt U, Hoppe U. Laser projection in high-speed glottography for high-precision measurements of laryngeal dimensions and dynamics. Eur Arch Otorhinolaryngol 2005;262:477–481
2. KayPENTAX. Instruction Manual: Stroboscopy Systems and Components. Lincoln Park, NJ: KayPENTAX; 2008
3. Deliyski DD, Petrushev PP, Bonilha HS, Gerlach TT, Martin-Harris B, Hillman RE. Clinical implementation of laryngeal high-speed videoendoscopy: challenges and evolution. Folia Phoniatr Logop 2008;60:33–44
5. Roget PM. Explanation of an optical deception in the appearance of the spokes of a wheel seen through vertical apertures. Philos Trans R Soc Lond 1825;115:131–140
6. Wade NJ. Philosophical instruments and toys: optical devices extending the art of seeing. J Hist Neurosci 2004;13:102–124
7. Wendler J. Stroboscopy. J Voice 1992;6:149–154
8. Zeitels SM. Premalignant epithelium and microinvasive cancer of the vocal fold: the evolution of phonomicrosurgical management. Laryngoscope 1995;105:1–51
9. Oertel M. Das Laryngo-stroboskop und die laryngostroboskopische Untersuchung. Arch Laryng Rhinol 1895;3:1–16
10. Wertheimer M. Experimentelle Studien über das Sehen von Bewegung. Z Psychol Z Angew Psychol 1912;61:161–265
11. Sekuler R. Motion perception: a modern view of Wertheimer’s 1912 monograph. Perception 1996;25:1243–1258
12. Galifret Y. Visual persistence and cinema? C R Biol 2006;329:369–385
13. Burr DC, Ross J, Morrone MC. Smooth and sampled motion. Vision Res 1986;26:643–652
14. Wertheimer M. Experimental studies on the seeing of motion. In: Shipley T, ed. Classics in Psychology. New York: Philosophical Library; 1912:1032–1089
15. Ferry ES. Persistence of vision. Am J Sci 1892;44:192–207
16. Yanagisawa E, Yanagisawa K. Stroboscopic videolaryngoscopy: a comparison of fiberscopic and telescopic documentation. Ann Otol Rhinol Laryngol 1993;102:255–265
17. The Society of Motion Picture and Television Engineers. SMPTE 170M-2004; Television-Composite Analog Video Signal-NTSC for Studio Applications (Revision of SMPTE 170M-1999). SMPTE; 2004
18. International Telecommunication Union. Characteristics of Composite Video Signals for Conventional Analogue Television Systems (Recommendation ITU-R BT.1700). ITU; 2005
19. von Leden H. The electronic synchron-stroboscope: its value for the practicing laryngologist. Ann Otol Rhinol Laryngol 1961;70:881–893
20. Colton RH, Casper JK, Leonard RJ. Understanding Voice Problems: A Physiological Perspective for Diagnosis and Treatment. Baltimore, MD: Lippincott Williams & Wilkins; 2006
21. Talbot HF. Experiments on light. Philos Mag Ser 3 1834;5:321–334
22. Plateau J. Sur un principe de photométrie. Mém Acad R Belg 1835;2:52–59
23. Keller K, Kampfer H, Matijec R, et al. Photography. In: Ullmann’s Encyclopedia of Industrial Chemistry. New York, NY: Wiley-VCH; 2005
24. Porter TC. Contributions to the study of ‘flicker.’ Proc R Soc Lond 1898;63:347–356
25. Porter TC. Contributions to the study of flicker. Paper II. Proc R Soc Lond 1902;70:313–329
26. Porter TC. Contributions to the study of flicker. Paper III. Proc R Soc Lond Ser A 1912;86:495–513
27. Hecht S, Verrijp CD. Intermittent stimulation by light: III. The relation between intensity and critical fusion frequency for different retinal locations. J Gen Physiol 1933;17:251–268
28. Hecht S, Verrijp CD. The influence of intensity, color and retinal location on the fusion frequency of intermittent illumination. Proc Natl Acad Sci U S A 1933;19:522–535
29. Hecht S, Shlaer S, Verrijp CD. Intermittent stimulation by light: II. The measurement of critical fusion frequency for the human eye. J Gen Physiol 1933;17:237–249
30. Anderson J, Anderson B. The myth of persistence of vision revisited. J Film Video 1993;45:3–12
31. Cranen B, de Jong F. Laryngostroboscopy. In: Kent RD, Ball MJ, eds. Voice Quality Measurement. San Diego, CA: Singular Publishing Group; 2000:257–267
32. JEDMED. StroboCAM II. St Louis, MO: JEDMED; 2008
33. Nagashima H, Tuda K, Marui M. Larynx stroboscope for photography. US Patent No. 4,232,685;1980