Speech Physiology Measurement and Analysis




Much of the knowledge about speech production has been acquired through measurement. This chapter is dedicated to measurement and analysis of speech production physiology as it relates to the function of the breathing apparatus, laryngeal apparatus, velopharyngeal-nasal apparatus, and pharyngeal-oral apparatus. Acoustic measurement and analysis of speech are covered in Chapter 10. This chapter concludes with a section on health care professionals who may use speech physiology measurements in their provision of clinical services.


Three types of measurements are the focus of this section. They are spirometry (to obtain measures of lung volume and airflow), chest wall surface tracking (to obtain measures of lung volume, airflow, and chest wall shape), and manometry (to obtain measures of pressure). Some of these measurements are made during speech production, but some are not. Those that are included here are commonly encountered in clinical settings and provide information that may be relevant to understanding the nature of a speech breathing disorder.

Spirometry

Spirometry can be used to measure lung volume and airflow. Recall that lung volume is the volume (size) of the air inside the pulmonary apparatus (lungs and lower airways). Lung volume is important for speech production because it has implications for how much speech can be produced on a single breath. Airflow in this context is lung volume change over time and it, too, has implications for how much speech can be produced on a breath.

A wet (or water) spirometer, shown in Figure 6–1, is an old-fashioned instrument that can be used to measure lung volume and airflow. Invented in the 1800s, the wet spirometer includes a chamber containing water and a bell that floats inside the chamber. Air moving into the bell causes it to rise and air moving out of the bell causes it to fall, the height of the bell being directly proportional to the volume of air in the spirometer. A mouthpiece can be used to couple the person to the spirometer (as shown in Figure 6–1), as long as the nose is occluded and the lips are sealed around the mouthpiece; a facemask that covers the mouth and nose is another coupling option. A pen fixed to the bell may provide a record of volume change on paper attached to a drum, or a potentiometer driven by movement of the bell may provide an electrical signal for display on a screen. As long as the spirometer has a drum that rotates at a known speed, the tracing can also be used to calculate airflow (change in volume over time). This tracing is called a spirogram (see also Figure 2–16). Wet spirometers are instructive because their operation is easy to understand. However, they are not in wide use today.

Most of today’s spirometers incorporate a pneumotachometer, a device that measures instantaneous airflow (pneumo = relating to the lungs or air; tachometer = instrument that measures speed). A pneumotachometer, such as that illustrated in Figure 6–2, senses airflow in both directions by recording the air pressure difference across a resistive screen (a wire mesh screen, much like a window screen). When air flows across the screen, there is a pressure drop from one side (the side of the screen closest to the airflow source) to the other side. The higher the airflow, the greater the pressure drop across the screen. This pressure drop is sensed by a differential air pressure transducer (a transducer that subtracts one pressure from another) and provides an electrical signal proportional to the airflow. This airflow signal can be “added up” over time by a device called an integrator. The integrated (summed) signal provides a measure of volume. An analogy to help understand the difference between measurements of airflow and volume is that an airflow measurement is like reading an automobile’s speedometer and a volume measurement (by airflow integration) is like reading the automobile’s odometer.
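The speedometer-odometer relationship can be made concrete with a short numerical sketch. The Python example below is illustrative only: the function name, the sampling rate, and the flow values are assumptions made for the example, and the trapezoidal rule stands in for whatever integration method a particular instrument actually uses.

```python
def integrate_airflow(flow_lps, sample_rate_hz):
    """Return cumulative volume (liters) from airflow samples (liters per second).

    Assumption: samples are evenly spaced in time at sample_rate_hz.
    """
    dt = 1.0 / sample_rate_hz
    volume = [0.0]
    for previous, current in zip(flow_lps, flow_lps[1:]):
        # Trapezoidal rule: mean of two adjacent samples times the time step.
        volume.append(volume[-1] + 0.5 * (previous + current) * dt)
    return volume


# Hypothetical data: a steady flow of 0.2 L/s held for 2 s, sampled at 100 Hz.
flow = [0.2] * 201
volume = integrate_airflow(flow, sample_rate_hz=100)
print(round(volume[-1], 3))  # total volume moved, in liters
```

A steady flow of 0.2 liter per second for 2 seconds sums to 0.4 liter, just as a steady speed multiplied by travel time gives distance traveled.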

Spirometers can be used to measure most subdivisions of the lung volume, as described in Chapter 2 and illustrated in Figure 2–16. The exceptions are the residual volume and any lung capacities that include the residual volume (functional residual capacity and total lung capacity). This is because, by definition, residual volume cannot be expired voluntarily. (The residual volume can be estimated using indirect measurement approaches, but these are beyond the scope of this book.) One measure not mentioned in Chapter 2, but one often encountered in clinical settings, is the forced vital capacity (FVC). This is like a vital capacity (VC), except that it is obtained by expiring as quickly and as fully as possible following a full inspiration. Thus, an FVC differs from a VC only by the speed with which the maneuver is executed. In healthy people, the VC and FVC are comparable. However, the FVC may be much smaller than the VC in someone who has compliant (floppy) airways that collapse in the face of high airflow (thereby trapping much of the air so that it cannot be expired).

Spirometers can also be used to measure ventilation. Recall that ventilation is defined as the movement of air into or out of the pulmonary apparatus and is usually quantified over a given period, such as a minute. In fact, the term minute ventilation is commonly used in clinical practice. Minute ventilation is calculated by adding the volumes inspired (or expired) on each breath over the course of a minute.
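As a rough illustration of this calculation, the sketch below sums per-breath volumes and scales the total to one minute. The function name and the breath volumes are hypothetical, and the scaling step is an added convenience for recordings that are shorter or longer than a minute.

```python
def minute_ventilation(breath_volumes_l, duration_s):
    """Return ventilation in liters per minute.

    breath_volumes_l: volume inspired (or expired) on each breath, in liters.
    duration_s: length of the recording, in seconds (assumed to be > 0).
    """
    total_volume = sum(breath_volumes_l)
    # Scale to one minute so that recordings of any length can be compared.
    return total_volume * 60.0 / duration_s


# Hypothetical example: 12 breaths of 0.5 L each during a 60-second recording.
print(minute_ventilation([0.5] * 12, duration_s=60))  # 6.0
```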

Sometimes it is of interest to examine the relationship between volume and airflow by generating what are called flow-volume loops. In this case, the pneumotachometer form of spirometer is used to provide direct measures of airflow. A flow-volume loop, such as might be generated by a healthy person, is shown in Figure 6–3. To generate a flow-volume loop, the person is instructed to first breathe out all the air possible, seal the lips around the spirometer mouthpiece, and inspire as quickly and as fully as possible (from residual volume) and then expire as forcefully and as fully as possible (back to residual volume). The trajectory shown in Figure 6–3 indicates that inspiratory flow increases and then decreases during the inspiration phase of the maneuver and that the peak (fastest) expiratory flow occurs at the very beginning of expiration and tapers off from there. The magnitude and shape of the flow-volume loop can be interpreted in ways that help detect and characterize certain disease states.

Spirometers (either wet spirometers or those with pneumotachometers) can be used to measure lung volume and lung volume change (airflow) during speech production; however, the coupling part of the measurement system can be a limiting factor. If a mouthpiece is used (such as that shown in Figure 6–1), measures can only be obtained during productions such as sustained vowels. In contrast, when using a facemask that covers both the mouth and nose, it is possible to produce a wide variety of speech sounds and sound combinations. It is critical that the facemask be sealed airtight around the nose, cheeks, and chin and that the pneumotachometer be sensitive enough to pick up the very high airflows that are produced during running speech production (e.g., during the release of a voiceless stop-plosive). It is also important to recognize that the presence of a facemask may restrict upper airway articulatory movements during speech production.

Chest Wall Surface Tracking

Lung volume and lung volume change (airflow) can be measured in ways other than using a spirometer, one of which involves the use of instrumentation that tracks movement of the chest wall surface. Chest wall surface tracking also provides a way to measure chest wall shape.

Two chest wall surface tracking instruments, respiratory magnetometers and respiratory inductance plethysmographs, are shown in Figure 6–4. Respiratory magnetometers operate on the principle that the rib cage wall and abdominal wall each displace volume as they move and each usually behaves with a single degree of freedom with respect to their movement (meaning that all points on their surface move together). The magnetometers include pairs of electromagnetic coils consisting of generator and sensor (front and back) mates that are used to transduce anteroposterior diameter changes of the rib cage wall and abdominal wall. As the coils move away from each other (as they might during inspiration), the amplitude of the signal “seen” by the sensor coil decreases; as the coils move toward each other (as they might during expiration), the signal amplitude increases. Respiratory inductance plethysmographs operate in a similar manner, except that they include broad elastic bands with embedded electrical wires that sense the average cross-sectional areas of the rib cage wall and abdominal wall. As the bands expand and contract (as they might during inspiration and expiration) the signal outputs change accordingly.

Lung volume and lung volume change can be measured using either respiratory magnetometers or respiratory inductance plethysmographs. To do so, one need merely sum and calibrate the output signals from the electromagnetic coils or the two elastic sensing bands against a known volume (as measured by a spirometer) to obtain a measure of lung volume change from movements of the body surface. The lung volume tracings from summed rib cage and abdominal wall signals look like the tracings found in a spirogram (such as those shown in Figures 2–16 and 6–1).
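One way to picture the "sum and calibrate" step is as a search for two gains, one for the rib cage signal and one for the abdominal signal, so that their weighted sum matches the spirometer volume. The sketch below uses an ordinary least-squares fit for this purpose. This is an assumption made for illustration, not necessarily the calibration procedure used with any particular instrument, and all names and data values are hypothetical.

```python
def fit_chest_wall_gains(rib_cage, abdomen, spirometer_volume):
    """Least-squares gains so that rc_gain*rib_cage + ab_gain*abdomen ~ volume.

    All three inputs are equal-length lists of simultaneous samples,
    each expressed as a change from a common baseline.
    """
    s_rr = sum(r * r for r in rib_cage)
    s_aa = sum(a * a for a in abdomen)
    s_ra = sum(r * a for r, a in zip(rib_cage, abdomen))
    s_rv = sum(r * v for r, v in zip(rib_cage, spirometer_volume))
    s_av = sum(a * v for a, v in zip(abdomen, spirometer_volume))
    det = s_rr * s_aa - s_ra * s_ra
    if det == 0:
        raise ValueError("signals are collinear; the gains cannot be separated")
    # Solve the 2 x 2 normal equations by Cramer's rule.
    rc_gain = (s_rv * s_aa - s_av * s_ra) / det
    ab_gain = (s_av * s_rr - s_rv * s_ra) / det
    return rc_gain, ab_gain


# Hypothetical calibration data in arbitrary sensor units and liters.
rib_cage = [0.0, 1.0, 2.0, 1.0, 0.5]
abdomen = [0.0, 0.5, 0.5, 1.5, 1.0]
spirometer = [0.8 * r + 0.4 * a for r, a in zip(rib_cage, abdomen)]

rc_gain, ab_gain = fit_chest_wall_gains(rib_cage, abdomen, spirometer)
lung_volume = [rc_gain * r + ab_gain * a for r, a in zip(rib_cage, abdomen)]
```

Note that the fit only works if the rib cage and abdomen contribute in varying proportions during the calibration recording; if the two signals always move in lockstep, their separate gains cannot be determined.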

Chest wall surface tracking may be the best way to measure lung volume, lung volume change, and ventilation during speech production because the speaker’s face is unencumbered by a facemask and speech production can proceed relatively naturally. In addition, if the output signals are properly calibrated and monitored, it is possible to determine not just how much volume is being expired, but also exactly where within the vital capacity speech is being produced. As an example, it would be possible to determine that a person was speaking primarily within the inspiratory capacity with occasional encroachments into the expiratory reserve volume during the production of long breath groups.

Chest wall surface tracking has the additional advantage of allowing for visualization of chest wall shape and changes in shape. This is possible when the rib cage wall signal is displayed against the abdominal wall signal in an xy plot (such as that shown in Figure 2–20). This form of display is very powerful because it allows the examiner to make inferences about potential muscular mechanisms underlying the movements of the chest wall (see Figure 2–22). Chest wall surface tracking has been used to investigate speech breathing in healthy individuals (e.g., Hixon, Goldman, & Mead, 1973) and individuals with speech breathing disorders (e.g., Hoit, Banzett, Brown, & Loring, 1990; Solomon & Hixon, 1993) as well as the effects of behavioral interventions on speech breathing behavior (e.g., Darling-White & Huber, 2017).

Where’s the Border?

Borders may or may not be important. Ride your Harley-Davidson motorcycle around the monument at the Four Corners junction from Arizona through New Mexico through Colorado to Utah (for the geographically challenged, this is a continuous left turn) without wearing a helmet and there will be times you’re breaking the law and times you’re not. You need to know each state’s law and identify each border to know your status. But the border between the rib cage wall and the abdominal wall is another story. When the two structures move, it’s hard to identify where their two edges meet. The respiratory magnetometers discussed in the text make it so that you don’t have to fret such delineation. Magnetometers have the advantage that their coils can be placed at the center of the surfaces being monitored and far away from their edges. Where’s the border? Who cares? It’s not important to know when using respiratory magnetometers.

Manometry

A pressure-measuring device is called a manometer. Manometers come in many forms and are designed to measure many different types of pressure. The pressure of interest in the present context is alveolar pressure (the pressure inside the alveolar air sacs), a pressure of great relevance to speech production. Alveolar pressure cannot be measured directly in humans; however, it can be estimated from oral pressure under specified conditions. These conditions include that the (a) velopharynx is closed and that the lips are sealed airtight, as any air leak will give a falsely low estimate of alveolar pressure, (b) flow of air between the alveoli and the oral cavity is zero (or nearly zero), and (c) pressure measurement is not influenced by impounding pressure in the oral cavity by adjustments of the cheeks and other oral structures. Three types of manometers that have been used to measure oral pressure are discussed here: U-tube manometers, air-gauge manometers, and pressure transducers (mechanical-electrical manometers).

A U-tube manometer is depicted in Figure 6–5. The manometer includes a U-shaped tube containing water, a connecting tube coupled to one arm of the tube, and a calibration scale (in centimeters). To measure pressure, one need only blow into the connecting tube and the water will be displaced by a distance that corresponds to the pressure exerted. In the figure, the manometer shows a 30-centimeter displacement; therefore, the pressure exerted is 30 centimeters of water (cmH2O). U-tube manometers are instructive for understanding the concept of pressure as expressed in “centimeters of water.”
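To relate "centimeters of water" to the SI unit of pressure, the height of the water column can be converted to pascals using pressure = density × gravity × height. The short sketch below does this under the conventional assumptions of a water density of 1000 kg/m³ and standard gravity of 9.80665 m/s², which give about 98.07 pascals per centimeter of water; the function name is hypothetical.

```python
WATER_DENSITY_KG_M3 = 1000.0    # conventional value for water
STANDARD_GRAVITY_M_S2 = 9.80665  # standard acceleration due to gravity


def cmh2o_to_pascals(height_cm):
    """Convert a water-column height in centimeters to pressure in pascals."""
    height_m = height_cm / 100.0
    return WATER_DENSITY_KG_M3 * STANDARD_GRAVITY_M_S2 * height_m


# The 30-centimeter displacement described for Figure 6-5.
print(round(cmh2o_to_pascals(30), 1))  # roughly 2942 pascals
```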

Air-gauge manometers, such as the one shown in Figure 6–6, are small and easy to use and are often found in clinical settings. The manometer in the figure has a needle that moves clockwise around the face of the gauge to indicate the pressure being measured. This manometer could be used to measure maximum expiratory pressure (by having the client blow into the tube as forcefully as possible) and maximum inspiratory pressure (by having the client suck from the tube as forcefully as possible). Present-day manometers are often digital, unlike the analog gauge shown in the figure.

Pressure transducers can be thought of as mechanical-electrical manometers. The transducer depicted in Figure 6–7 has a metal diaphragm housed within it which deforms when a pressure is applied to one side of it. This deformation is converted into an electrical signal, the amplitude of which varies with the degree of deformation. When the electrical signal is calibrated against a known pressure (by using a U-tube manometer, for example), the transducer provides an accurate measure of the pressure differential across the diaphragm. Of the three types of manometer described here, pressure transducers are by far the best for obtaining measures of oral pressure (as an estimate of alveolar pressure) during speech production.

Measuring oral pressure to estimate alveolar pressure during speech production is somewhat more complicated than measuring oral pressure during non-speech activities such as blowing. However, it is possible to do when using a pressure transducer system and a particular type of speech sample (Hertegård, Gauffin, & Lindestad, 1995; Netsell & Hixon, 1978). As illustrated in Figure 6–7, a small polyethylene pressure tube is connected to a pressure transducer and the other end of the tube is placed at one corner of the mouth just behind the front teeth so that its end is perpendicular to the flow of air out of the mouth.

The key is to capitalize on a period during speech production when oral pressure and alveolar pressure are essentially equal. This occurs during the closed phase of a voiceless stop-plosive sound (e.g., /p/) when the oral and velopharyngeal valves are sealed airtight and the laryngeal valve is open. At the moment of peak pressure, the pressure recorded in the oral cavity is essentially equal to the pressure in the alveoli. When an utterance sequence includes interspersed voiceless stop-plosives (such as /pipipipipipipi/), the peak oral pressures measured during the consonants can be interconnected to reveal the underlying alveolar pressure contour. This is done by constructing a contour from successive linear interpolations between the peak pressures of adjacent consonants, as illustrated in the lower panel of Figure 6–7.
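The interpolation step can be sketched in a few lines of Python. The example assumes that the peak oral pressures and the times at which they occur have already been identified; the function name and the data values are hypothetical. Times that fall before the first peak or after the last peak are left undefined (None) rather than extrapolated.

```python
def alveolar_pressure_contour(peak_times_s, peak_pressures_cmh2o, query_times_s):
    """Estimate alveolar pressure by linear interpolation between /p/ peaks.

    peak_times_s must be in increasing order and contain at least two peaks.
    Returns one value per query time, or None outside the range of the peaks.
    """
    if len(peak_times_s) < 2:
        raise ValueError("at least two pressure peaks are needed")
    contour = []
    for t in query_times_s:
        estimate = None
        for i in range(len(peak_times_s) - 1):
            t0, t1 = peak_times_s[i], peak_times_s[i + 1]
            if t0 <= t <= t1:
                p0, p1 = peak_pressures_cmh2o[i], peak_pressures_cmh2o[i + 1]
                # Straight line between the two neighboring peaks.
                estimate = p0 + (p1 - p0) * (t - t0) / (t1 - t0)
                break
        contour.append(estimate)
    return contour


# Hypothetical peaks from three successive /p/ closures.
contour = alveolar_pressure_contour(
    peak_times_s=[0.0, 0.5, 1.0],
    peak_pressures_cmh2o=[8.0, 7.0, 6.5],
    query_times_s=[0.25, 0.75, 1.2],
)
print(contour)  # [7.5, 6.75, None]
```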


Laryngeal physiology can be measured and analyzed using a variety of approaches. This section considers three of them: endoscopy, electroglottography, and aeromechanical observations.

Endoscopy

Visualization of the larynx is one of the most important tools available for determining its status and quantifying its actions. Such visualization entails inserting a viewing device through either the oral or nasal cavities. This method, called endoscopy, includes some form of illumination and some form of optical device that gathers the laryngeal image.

As illustrated in the upper panel of Figure 6–8, visualization via the oral route is most often done with a device called a rigid endoscope that is positioned along the upper surface of the tongue (with the tongue tip pulled forward and out of the way) and into the oropharynx. The image can be viewed through an eyepiece or recorded by one of several different optical recording systems. The rigid endoscope provides an excellent image of the larynx (examples of which are included in Figures 3–26 and 3–27). Nevertheless, it has limitations, one of which is that the client might find the positioning of the device to be uncomfortable and another is that its positioning interferes with movements of pharyngeal-oral structures so that only vowel-like productions can be examined.

As illustrated in the lower panel of Figure 6–8, visualization via the nasal route is accomplished by inserting a flexible endoscope through one side of the nose over the upper surface of the velum and into the pharynx. The flexible endoscope has positional controls so that its distal tip can be oriented to obtain an unobstructed view of the vocal folds. The image can be viewed through an eyepiece or recorded on an optical recording system. A major advantage of the flexible endoscope is that it does not encumber pharyngeal-oral structures so that the behavior of the larynx can be examined during a wide range of speech production activities. Nevertheless, its image is of lower quality than that obtained with the larger rigid endoscope.

Endoscopes that use a constant light source are well suited for imaging stationary structures or structures that move relatively slowly. However, the rapid movements of vocal fold vibration cannot be resolved with the naked eye, no matter how good the light source. One way to “slow” these movements is by the optical illusion created with a flashing-light stroboscope. Brief flashes of light illuminate the vocal folds, with each flash being advanced slightly in time with each vibratory cycle such that the phase difference between the vocal fold cycle and the flash cycle progressively increases (Baken & Orlikoff, 2000). This creates the illusion of a slowly moving vocal fold vibration that is actually a composite of the sampling of many successive cycles (Hirano & Bless, 1993). The terms videoendoscopy and videostroboscopy are sometimes used to differentiate the recording of laryngeal images using a constant light source versus using a stroboscopic light source (Deliyski, Hillman, & Mehta, 2015).
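The stroboscopic illusion follows from simple arithmetic: when the flash rate is slightly lower than the fundamental frequency, the vocal folds appear to vibrate at the difference between the two rates. The sketch below illustrates this relationship; the function name and the example frequencies are assumptions chosen for illustration.

```python
def apparent_vibration_rate_hz(fundamental_hz, flash_rate_hz):
    """Apparent (slow-motion) vibration rate seen under stroboscopic light.

    Assumption: the flash rate is close to, but not equal to, the
    fundamental frequency, so each flash lands slightly later in the cycle.
    """
    return abs(fundamental_hz - flash_rate_hz)


# Hypothetical example: 200-Hz vibration lit by flashes at 198.5 Hz.
slow_rate = apparent_vibration_rate_hz(200.0, 198.5)
print(slow_rate)  # 1.5 apparent cycles per second

# Number of real cycles sampled to build one apparent cycle.
print(round(200.0 / slow_rate))
```

In this example, each apparent cycle is assembled from more than one hundred real cycles, which is why the method works best when vibration is nearly periodic.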

A more recent application of endoscopy is high-speed videoendoscopy, which can record many images per vocal fold vibratory cycle. A typical rate of 2000 to 8000 frames per second captures multiple images per cycle, often 20 or more. For example, a fundamental frequency of 200 Hz (with a period, the duration of one vocal fold vibratory cycle, of 0.005 second or 5 milliseconds) sampled at 2000 frames per second yields an image every 0.5 millisecond, or 10 images per cycle; sampled at 4000 frames per second, it yields 20 images per cycle. This allows for a detailed analysis of each cycle and offers a more complete picture of vocal fold vibration than does the use of stroboscopy, particularly for clients with voice disorders (Patel, Dailey, & Bless, 2008).
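The relationship among frame rate, fundamental frequency, and the number of images per cycle is a matter of division, as the following sketch shows. The function names are hypothetical.

```python
def frames_per_cycle(frame_rate_fps, fundamental_hz):
    """Number of images captured during one vocal fold vibratory cycle."""
    return frame_rate_fps / fundamental_hz


def frame_interval_ms(frame_rate_fps):
    """Time between successive images, in milliseconds."""
    return 1000.0 / frame_rate_fps


for rate in (2000, 4000, 8000):
    # For a 200-Hz voice, the period is 5 ms.
    print(rate, frame_interval_ms(rate), frames_per_cycle(rate, 200))
```

For a 200-Hz voice, rates of 2000, 4000, and 8000 frames per second give 10, 20, and 40 images per cycle, respectively. A higher fundamental frequency yields fewer images per cycle at the same frame rate.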

Methods have been developed to quantify the images obtained through the use of videoendoscopy and videostroboscopy and protocols have been proposed to interpret those quantified images in ways that are relevant to clinical practice (Colton & Casper, 1996; Hirano & Bless, 1993; Kendall & Leonard, 2010; Poburka, Patel, & Bless, 2017). Quantification is usually in the form of ratings of structural integrity, non-vibratory movements, and vocal fold vibration (Patel et al., in press). For example, ratings may be obtained for the appearance of the vocal fold edges, extent of vocal fold abduction and adduction, regularity and amplitude of vocal fold vibration, and glottal shape. High-speed videoendoscopy images can also be quantified for clinical applications by using a rating form (Poburka et al., 2017; Yamauchi et al., 2012). However, high-speed videoendoscopy is more frequently used in research applications, with many different automated analyses available to describe specific aspects of vibration. One example involves converting the images into kymographs, tracings that represent movements of a single point at the same transverse level on each of the two vocal folds (Svec & Schutte, 1996).

Electroglottography

Electroglottography is a noninvasive method for estimating the area of contact between the vocal folds (Fourcin, 1974) that capitalizes on the electrical conduction properties of laryngeal tissues. As shown in Figure 6–9, use of an electroglottograph involves the placement of electrodes on both sides of the neck, positioned over the left and right alae of the thyroid cartilage. A weak high-frequency electrical current flows between these two electrodes and a determination is made as to the impedance (opposition) offered to that current flow by laryngeal structures.

Tissues in the vocal folds are good electrical conductors, whereas the air between the vocal folds (when a glottis exists) is an extremely poor electrical conductor. Therefore, the electrical impedance across the larynx rises when the laryngeal airway opens and falls when the vocal folds come into increasingly more extensive contact (Baken & Orlikoff, 2000). Changes in impedance as measured through the use of electroglottography can reflect both slow laryngeal adjustments, such as those associated with abduction and adduction of the vocal folds, and rapid changes in vocal fold contact area, such as those associated with vocal fold vibration (Childers, Hicks, Moore, Eskenazi, & Lalwani, 1990; Childers, Smith, & Moore, 1984; Titze, 1990).

The lower part of Figure 6–9 contains an electroglottographic tracing representing one vocal fold vibratory cycle. This tracing is interpreted to represent the time course of changes in vocal fold contact area over the cycle (Childers & Krishnamurthy, 1985; Colton & Conture, 1990; Lecluse, Brocaar, & Verschuure, 1975; Orlikoff, 1998). Also shown in the figure is a typical analysis procedure applied to electroglottographic data in which the portion of the cycle during which the vocal folds are considered to be approximated (labeled “closed” in the figure) and the portion during which they are not (labeled “open” in the figure) are demarked and measured.
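The closed and open portions of a cycle can be expressed as a proportion of the cycle period. The sketch below estimates the closed proportion (sometimes called a contact quotient) by counting the samples that lie above a criterion level set at a fraction of the cycle's peak-to-peak amplitude. The criterion-level approach, the 35% threshold, the signal polarity (larger values meaning more contact), and the data values are all assumptions made for illustration and are not taken from the figure.

```python
def contact_quotient(egg_cycle, threshold_fraction=0.35):
    """Proportion of one cycle during which the vocal folds are judged closed.

    egg_cycle: evenly spaced samples spanning exactly one vibratory cycle,
    with larger values assumed to mean greater vocal fold contact.
    threshold_fraction: hypothetical criterion level, as a fraction of the
    peak-to-peak amplitude of the cycle.
    """
    lowest, highest = min(egg_cycle), max(egg_cycle)
    if highest == lowest:
        raise ValueError("the cycle has no amplitude variation")
    level = lowest + threshold_fraction * (highest - lowest)
    closed_samples = sum(1 for sample in egg_cycle if sample >= level)
    return closed_samples / len(egg_cycle)


# Hypothetical single cycle, normalized from 0 (least contact) to 1 (most).
cycle = [0.0, 0.2, 0.6, 1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.0]
print(contact_quotient(cycle))  # 0.5
```

The result depends on the criterion level chosen, which is one reason such measures need to be interpreted cautiously.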

There are other, less commonly used applications of electroglottography. One such application involves the interpretation of certain perturbations in the electroglottographic signal to identify the onset and offset of vocal fold abduction and adduction during speech production (Rothenberg, 2009; Rothenberg & Mahshie, 1988). Another application is to track the vertical position of the larynx within the neck (Rothenberg, 1992), such as during voice therapy (Wistbacka, Sundberg, & Simberg, 2016). This application requires two pairs of electrodes, one pair placed near the lower part of the thyroid cartilage and the other placed near the upper part of the cartilage. Vertical movement of the larynx is reflected in changes in the relative amplitudes of the signals generated by the upper versus lower electrode pairs.

Electroglottography has come to be used by a variety of disciplines concerned with understanding laryngeal behavior during voice production and other activities. The popularity of this method is based on its simplicity and noninvasive nature and because it can be used as both an evaluation tool and a feedback device for management (Baken & Orlikoff, 2000; Kitzing, 1990; Motta, Cesari, Iengo, & Motta, 1990; Smith & Childers, 1983). It can also be used in conjunction with other measurement methods discussed in this section because it does not interfere with the way those methods operate. Although electroglottography is easy to use, there are limitations on what physiological inferences can be made from its signal. For example, discrepancies have been found between the period of presumed vocal fold approximation during the vibratory cycle as determined by electroglottography and the period of approximation as determined from simultaneous endoscopic high-speed videokymographic tracings (e.g., Herbst, Schutte, Bowling, & Svec, 2017).

Aeromechanical Observations

Aeromechanical observations can provide information about the status of the laryngeal airway. Three commonly used aeromechanical observations are airflow through the larynx, calculation of resistance to airflow provided by the larynx, and determination of phonation threshold pressure.

Airflow through the larynx is usually measured with a pneumotachometer, such as that shown in Figure 6–2, located near the airway opening. Under certain conditions, such as during vowel production, the average (mass) airflow through the larynx (translaryngeal airflow) can be estimated at the airway opening because airflow through the larynx is continuous with that at the airway opening. Thus, this average airflow is an indicator of the average openness of the larynx (providing driving pressure does not change) and the extent to which it allows air to pass between the trachea and pharynx (Isshiki & von Leden, 1964). When measuring average airflow, the pneumotachometer system is usually filtered so that only slow (low frequency) airflow events are recorded.

During voice production, the flow of air through the larynx has superimposed on it a rapidly varying component associated with vocal fold vibration. This high-frequency component of the airflow is of interest because of what it reveals about the nature of vocal fold function in generating the sound source for speech (Isshiki, 1985). A common way to acquire a measurement of this cycle-to-cycle airflow waveform is to record airflow changes at the airway opening with a specially designed facemask in which pneumotachometer screens are actually built into its walls. This mask, called a circumferentially vented mask and shown in Figure 6–10, greatly improves the sensitivity of the airflow measuring device and enables the faithful recording of the extremely rapid airflow changes associated with vocal fold vibration. This high-frequency airflow signal, when recorded at the airway opening, is not a perfect reflection of the airflow signal generated at the larynx because it has been modified by acoustic effects of the vocal tract (pharyngeal-oral airway). Thus, the airflow signal is subjected to what is called inverse filtering, a filtering algorithm that is designed to minimize the effect of the acoustic properties of the downstream airway on the final airflow waveform (Rothenberg, 1973; also see Chapter 8, Figure 8–3).

Figure 6–11 presents a comparison of average airflow and cycle-to-cycle airflow. Tracings of low-pass filtered (average) airflow (red) are superimposed on tracings of the fast cycle-to-cycle airflow events associated with voice production (black). The top set of tracings shows the full 0.2 second of vocal fold vibration and the bottom set shows zoomed-in images of the last few cycles. The highest peaks in the airflow tracings are responsible for creating the abrupt air pressure changes that become sound.
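The distinction between average airflow and cycle-to-cycle airflow can be illustrated with a very simple low-pass filter. The sketch below uses a moving average, which is an assumption chosen for clarity rather than the filter used in any particular airflow system; the function name and the sample values are hypothetical.

```python
def moving_average(samples, window):
    """Smooth a signal by averaging each run of `window` adjacent samples.

    Returns len(samples) - window + 1 values. A window that spans a whole
    number of vibratory cycles removes the cycle-to-cycle component.
    """
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


# Hypothetical airflow (L/s): rapid alternation around a 0.2 L/s average.
airflow = [0.3, 0.1] * 5
average_airflow = moving_average(airflow, window=2)
print([round(value, 3) for value in average_airflow])
```

In this example the rapid alternation disappears and only the steady 0.2 liter per second component remains, which corresponds to the average airflow.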

Airflow through the larynx is influenced not only by the openness of the larynx, but also by the forcefulness with which air is being driven through the airway (D’Antonio, Netsell, & Lotz, 1988). Thus, the best indicator of the general status of the laryngeal airway itself is an estimate of the resistance offered by the larynx to the flow of air through it (van den Berg, Zantema, & Doornenbal, 1957). This requires knowledge of both the air pressure driving the airflow and the resultant airflow.

The most commonly used clinical method for estimating laryngeal airway resistance during voice production is the method of Smitheran and Hixon (1981). This method records oral air pressure and airway-opening airflow to calculate laryngeal airway resistance. As shown in Figure 6–12, measurements are taken at moments that enable estimates to be made of the air pressure difference across the larynx and the airflow through it during vowel productions. Resistance is calculated by dividing the air pressure difference (estimated tracheal air pressure minus estimated pharyngeal air pressure) by the translaryngeal airflow (estimated from the airflow at the airway opening). Resistance values are typically expressed in cmH2O/LPS (centimeters of water/liters per second) and can range from very low (wide open airway) to infinite (airtight closure of the airway). Such resistance values reflect the degree of opening of the laryngeal airway during voice production (Holmberg, Hillman, & Perkell, 1988, 1999; Leeper & Graves, 1984; Smitheran & Hixon, 1981).
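The resistance calculation described above can be written out as a short worked example. The function name and the numerical values are hypothetical. Zero airflow is treated as infinite resistance, in keeping with the description of an airtight closure, and the pharyngeal pressure is set to zero (atmospheric) in the example on the assumption that the airway above the larynx is open during vowel production.

```python
def laryngeal_airway_resistance(tracheal_cmh2o, pharyngeal_cmh2o, airflow_lps):
    """Return laryngeal airway resistance in cmH2O/LPS.

    Resistance = (tracheal pressure - pharyngeal pressure) / airflow.
    """
    if airflow_lps == 0:
        # No airflow despite a driving pressure: airtight closure.
        return float("inf")
    return (tracheal_cmh2o - pharyngeal_cmh2o) / airflow_lps


# Hypothetical values: 5 cmH2O estimated tracheal pressure,
# atmospheric pharyngeal pressure, and 0.125 L/s translaryngeal airflow.
print(laryngeal_airway_resistance(5.0, 0.0, 0.125))  # 40.0
```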

Phonation threshold pressure is another aeromechanical measure that can provide information about laryngeal function, or more specifically, vocal fold function. Phonation threshold pressure is defined as the minimum tracheal pressure required to initiate vocal fold vibration and is understood to reflect the status of the vocal folds (viscosity and thickness) and their distance from one another (glottal width) (Titze, 1988). Although there are invasive ways to measure phonation threshold pressure, the most common way to estimate it is by using the noninvasive approach depicted in Figure 6–7, with the client producing the /p/-vowel syllable strings in the quietest voice possible (Verdolini-Marston, Titze, & Druker, 1990). The lower the peak oral pressures during /p/ productions (estimated tracheal pressure), while still maintaining voicing during the vowel segments, the lower the phonation threshold pressure. And the lower the phonation threshold pressure, the healthier vocal fold function is judged to be. Although this measure is relatively easy to obtain, it is not without its limitations. For example, it is common for the velopharynx to open during soft speech production, which results in a lowering of oral pressure, thereby making it a poor estimate of tracheal pressure (Fisher & Swank, 1997). A useful review of phonation threshold pressure and its potential for clinical application is provided by Plexico, Sandage, and Faver (2012).
