Welcome to Preclinical Speech Science: Anatomy, Physiology, Acoustics, and Perception, Third Edition. Two preliminaries are offered here. One is a discussion of the focus of the book, the other a discussion of the domains of preclinical speech science and preclinical hearing science.


Preclinical Speech Science: Anatomy, Physiology, Acoustics, and Perception is designed as an introduction to the fundamentals of speech and hearing science that are important to aspiring and practicing clinicians. The text is suitable for courses that cover the anatomy and physiology of speech production and swallowing, the anatomy and physiology of the hearing mechanism and auditory psychophysics, the acoustics and perception of speech, and general neuroanatomy and neurophysiology and their relevance for speech and hearing. It also includes sidetracks of clinical and historical interest, considerations of the scientific bases of clinical protocols and methodologies, and discussions of clinical personnel involved in the evaluation and management of disorders of speaking, hearing, and swallowing. This book provides up-to-date coverage of the science of speech and hearing. It is user friendly for beginning students, yet integrative and translational for graduate students and practicing speech-language pathologists and audiologists. It is an outgrowth of the three authors’ many years of teaching experience with several thousand undergraduate and graduate students.

The illustrations, done by the extremely talented artist Maury Aaseng, are a key feature of this book. These original illustrations, largely in full color, are supplemented by a small number of illustrations from other sources. The original illustrations were carefully chosen and drafted to convey only salient features, an approach in line with the written text.


The domain of preclinical speech science is portrayed in Figure 1–1. This domain encompasses speech production, speech acoustics, speech perception, and swallowing. Within this domain, consideration is given to levels of observation, subsystems of speech production and swallowing, and applications of data.

Levels of Observation

Speech production and swallowing are processes. They result in acoustic products (more so for speech than swallowing) and perceptual experiences. These processes, products, and experiences involve different levels of observation. Six such levels are represented in Figure 1–1: (a) neural, (b) muscular, (c) structural, (d) aeromechanical, (e) acoustic, and (f) perceptual. These levels of observation are not completely separate entities but have important interactions. These interactions are not shown in the figure but are discussed in subsequent chapters.

The neural level of observation encompasses nervous system events during speech production and swallowing. These include all events that qualify as motor planning and execution and all forms of afferent and sensory information that influence the ongoing control of speech production and swallowing. The neural level of observation pertains to the parts of the brain, spinal cord, and cranial and spinal nerves important to speech production and swallowing and to all underlying neural mechanisms, some voluntary and some automatic, some that involve awareness, and some that do not. Neural data are often derived from physical or metabolic imaging methods that reflect patterns of activation of different regions of the brain. Activation at the neural level can also be inferred from events associated with other (downstream) levels of observation.

The muscular level of observation is concerned with the influence of muscle forces on speech production and swallowing. Muscle forces are responsible for powering these two processes. Muscles are effectors that respond to control signals from the nervous system. The muscular events of speech production and swallowing are manifested in mechanical pulls and are often indexed at the periphery through the electrical activities associated with muscle contractions. Inferences about muscle activities are also made from measurements of the forces or movements generated by different parts of the speech production apparatus and swallowing apparatus. Nevertheless, there are ambiguities introduced when attempting to infer individual muscle activities from forces or movements because forces and movements are usually accomplished by groups of muscles working together. Such inferences, if they can be made at all, require a detailed knowledge of anatomy and physiology.

The structural level of observation deals with anatomical structures and movements of the speech production apparatus and swallowing apparatus. This level of observation is concerned not only with the many muscular and non-muscular structures that make up the speech apparatus, including bone, muscle, ligaments, and membranes, but also with the displacements, velocities, and accelerations/decelerations of structures and how they are timed in relation to the movements of other structures. Certain structural observations can be made with the naked eye, whereas others are hidden from view or are too rapid to be followed with the naked eye and require the use of instrumental monitoring. To the person on the street, the structural level of observation is public evidence of speech production and swallowing. Speech reading (lip reading) has its roots at this level of observation.

The structural movements of speech production and swallowing give rise to an aeromechanical level of observation. It is at this level that air comes into play. Movements of structures impart energy to the air by compressing and decompressing it and causing it to flow from one region to another. The raw airstream generated in association with the aeromechanical level is modified by structures of the speech production apparatus and swallowing apparatus that lie along various passageways. The products of the aeromechanical level are complex, rapid, and nearly continuous changes in air pressures, airflows, and air volumes. These products are usually “invisible,” especially for swallowing. However, those who speak and smoke at the same time or who speak in subfreezing temperatures often provide the observer with the opportunity to visualize certain aeromechanical events.

The acoustic level of observation is fully within the public domain. Although certain aspects of swallowing may be accompanied by sounds, primacy at this level pertains to the generation of speech sounds. The raw material of the acoustic level is the sonorous, buzzlike, hisslike, and poplike sounds that result from the speaker’s valving of the airstream in different ways and at different locations within the speech production apparatus. This raw material is filtered and conditioned by its passage through the apparatus and radiates from the mouth or nose, or both, in the form of very fast and nearly continuous air pressure changes experienced as sound waves. These sound waves propagate from the speaker’s mouth and can be coded in terms of frequency, sound pressure level, and time and are what constitute speech, the acoustic representation of spoken language. The acoustic level is important in face-to-face communication and in the use of telephones, radios, televisions, hearing aids and cochlear implants, and various forms of recording. It is this level that makes it possible to communicate effectively around corners, through obstacles, in the dark, and over long distances.

The perceptual level of observation has somewhat different manifestations for speech and swallowing. For speech, auditory analysis of the speech (acoustic) signal allows the listener to recognize phonetic cues that are consistent with the listener’s knowledge of the sound system of a language. The speaker is also a perceiver of her own speech acoustic signal, using it to check that the signal she intended is the one she produced. Visual information is another source of information for the perception of speech. Listeners, even those with normal hearing, are known to combine acoustic and visual information for the most effective perception of speech. In contrast, swallowing relies less on auditory and visual information, but is highly dependent on the more subconscious experiences of kinesthesia and proprioception (awareness of position and movement characteristics of body structures, such as the tongue and jaw). Swallowing is also guided by touch and pressure sensations (as in awareness of contact of the tongue with the hard palate), which originate in sensory receptors embedded in the skin and muscles. Taste, which is detected by specialized taste receptors on the tongue and other oral structures, and consistency of food, which is detected by tactile receptors in the pharyngeal-oral component of the speech apparatus, can also serve as perceptual information for swallowing. Of course, cognitive processes contribute to the perceptual level of observation for both speech and swallowing. Cognitive processes in speaking, swallowing, and hearing are not treated in detail in this text.

Subsystems of Speech Production and Swallowing

The activities of speech production and swallowing share many of the same structural and functional components. These components can be divided, somewhat arbitrarily, into subsystems. Speech production subsystems may differ when chosen by a linguist versus a speech scientist versus a speech-language pathologist; and swallowing subsystems may differ when chosen by a swallowing scientist versus a gastroenterologist versus a speech-language pathologist. For the purposes of this book, four subsystems are used for speech production and swallowing. As illustrated in Figure 1–1, these include the (a) breathing apparatus, (b) laryngeal apparatus, (c) velopharyngeal-nasal apparatus, and (d) pharyngeal-oral apparatus. The functional significance of each of the four subsystems differs between speech production and swallowing, but each subsystem is critically important to its respective behaviors and each manifests clinical signs that can reveal abnormality.

The breathing apparatus is defined in the present context to include structures below the larynx within the neck and torso. These are, most importantly, the pulmonary apparatus (pulmonary airways and lungs) and chest wall apparatus (rib cage wall, diaphragm, abdominal wall, and abdominal content). During speech production, the breathing apparatus provides the necessary driving forces while simultaneously serving the functions of ventilation and gas exchange. During swallowing, the breathing apparatus engages in a period of apnea (breath holding) to protect the pulmonary airways and lungs from the intrusion of unwanted substances (food and liquid). The breathing apparatus is the largest of the subsystems and its role in speech production and swallowing is fundamentally important.

The laryngeal apparatus lies between the trachea (windpipe) and the pharynx (throat) and adjusts the coupling between the two. At times, the laryngeal airway is open to allow air to move in and out of the breathing apparatus, whereas at other times it is adjusted to obstruct or constrict the airway. Very rapid to-and-fro movements of the vocal folds within the larynx create voiced sounds and give the laryngeal apparatus its colloquial label “voice box.” The larynx can also produce noisy sounds, like whisper. During swallowing, the laryngeal apparatus is active in closing the laryngeal airway to protect the pulmonary airways. Food and liquid are then able to pass over and around the larynx and into the esophagus on their way to the stomach.

The velopharyngeal-nasal apparatus consists of the upper pharynx, velum, nasal cavities, and outer nose. It is important to include the nasal portion of this subsystem because it can have a significant influence on the aeromechanical and acoustic levels of the speech production process. When breathing through the nose, the velopharyngeal-nasal airway is open. When speaking, the size of the velopharyngeal port varies, depending on the nature of the speech produced. For example, consonant sounds that require high oral air pressure are typically associated with airtight closure of the velopharyngeal port, whereas nasal consonants are produced with an open velopharyngeal port. Function of the velopharyngeal-nasal apparatus during swallowing is concerned mainly with keeping the velopharynx sealed airtight. This prevents the passage of food and liquid into the nasal cavities while substances are moved backward and downward through the oropharynx.

The pharyngeal-oral apparatus comprises the middle and lower pharynx, oral cavity, and oral vestibule. During running speech production, the apparatus is typically open during inspiration and makes different adjustments for consonant and vowel productions during expiration, including the generation of transient, voiceless, and voiced sounds and the filtering of those sounds. During swallowing, the pharyngeal-oral apparatus prepares food and liquid and propels it to the esophagus.

Applications of Data

There are many applications of data obtained about speech production and swallowing. These applications depend on who selects and defines the data and what the goals are for collecting and analyzing them. Figure 1–1 shows four important applications of data: (a) understanding mechanism, (b) evaluation, (c) management, and (d) forensics.

One application of data is the understanding of mechanism. This use provides the foundational bases for knowing how speech is produced and how swallowing is performed. Such foundational bases are important for their heuristic value in elucidating fundamental processes and principles and for differentiating normal from abnormal.

Another application of data is their use in evaluation. This use is usually practical in nature and involves quantitative determinations of the status and functional capabilities of an individual’s speech production, speech, and swallowing. Evaluation first enables a determination as to whether or not abnormality exists. If abnormality does exist, then appropriate evaluation may contribute to: (a) making a diagnosis, (b) developing a rational, effective, and efficient management plan, (c) monitoring progress during the course of management, and (d) providing a reasonable prognosis as to the extent and speed of improvement to be expected. For example, a specific use of subsystems analysis in the evaluation of speech production is the determination of how individual subsystems contribute to deficits in speech intelligibility. Two individuals may have equivalent intelligibility problems as determined by formal tests but have different subsystems “explanations” for their deficits. The careful evaluation of subsystems performance can point to which parts of the speech production apparatus may be most responsible for speech intelligibility deficits and how those parts should be addressed in management. The subsystems approach to evaluation cannot be applied effectively without solid knowledge of normal structures and functions, as described in this text.

A third application of data is management. Different management strategies may be based on any of the six levels of observation and include any of the four subsystems of speech production and swallowing. Strategies may include adjusting individual variables or combinations of variables, staging the order of different interventions, and providing feedback about speech production and swallowing processes, products, and experiences. Management data provide information about outcomes and whether or not interventions are effective, efficient, and long lasting. Management data can also be used to compare and contrast different interventions to arrive at optimal choices.

The remaining application of data is their use in forensics. This application is concerned with scientific facts and expert opinion as they relate to legal issues. The speech scientist and speech-language pathologist are sometimes called on to give legal depositions or to testify in courts of law in a variety of forensic contexts. Forensic uses of data may include issues pertaining to speaker identification, speaker status under the influence of drugs or alcohol, and speaker intent at deceit, among others. Forensic uses of data may also relate to personal injury claims or malpractice claims. These may involve speech production, speech, or swallowing alone, or in different combinations, and may include adversarial depositions and testimonies of other experts. Under such circumstances, the status and capabilities of the individuals claiming personal injury or malpractice may be considered from the perspective of underlying mechanism, evaluation, and management.


The domain of preclinical hearing science is portrayed in Figure 1–2. This domain encompasses audition, which serves the purpose of hearing and recognizing environmental sounds, music, speech acoustic signals, and electronically transmitted signals (as in the case of hearing aids and cochlear implants). As in the domain of preclinical speech science, consideration is given to levels of observation, subsystems, and applications of data.

Levels of Observation

Figure 1–2 shows levels of observation for audition. They include: (a) acoustic (pressure waves), (b) aeromechanical, (c) structural, (d) muscular, (e) mechanosensory, and (f) neural. This is consistent with the idea of speech production as the output and audition as the input of the speech communication process.

Aug 28, 2021 | Posted by in OTOLARYNGOLOGY | Comments Off on Introduction
