The Voice Diagnostic: Initial Considerations, Case History, and Perceptual Evaluation

Organization of the Clinical Voice Evaluation

In the following four chapters, we will describe the basic process for assessment and diagnosis of voice as carried out by the speech–language pathologist/voice therapist/vocologist. We refer to the assessment process as the voice diagnostic protocol, which comprises three primary areas: (1) preliminary information and case history, (2) perceptual analysis of voice, and (3) instrumental analysis of vocal function. 1 The processes and analyses described in this chapter focus on relatively low-cost methods associated with the collection, analysis, and interpretation of preliminary information, case history, and auditory-perceptual analysis of voice. In clinical practice, these procedures will be augmented by acoustic methods used to obtain measurements related to characteristics such as the pitch, loudness, and quality of voice, aerodynamic methods such as the direct measurement of pressure and flow, and laryngeal visualization via endoscopy. Acoustic, aerodynamic, and endoscopic methods are described in the following chapters.

A complete voice diagnostic protocol will lead to accurate diagnosis and subsequent treatment planning when it is composed of multidimensional assessments. The clinician may question why we need multiple procedures such as acoustic analyses or laryngeal visualization if we have clearly perceived a characteristic such as breathiness in the voice during a perceptual assessment. Alternatively, if the physician or clinician has visualized the vocal folds, why do we need any other analyses? In answer to these questions, the clinician must understand that structure does not always dictate function or vice versa. It is very possible that a patient may have an essentially normal-looking larynx and associated structures and yet have a very severe dysphonia (e.g., spasmodic dysphonia). In turn, it is also possible for a patient to have a significant structural or physiological change to the laryngeal structures and still have a relatively functional voice (e.g., a patient who has a unilateral vocal fold paralysis in which the vocal fold is fixated at the midline of the glottis). In addition, it must always be remembered that a vast number of different voice disorders can have very similar perceptual, acoustic, and aerodynamic characteristics. Perceptual and indirect acoustic/aerodynamic tasks alone would therefore not be able to clearly define and describe the underlying condition responsible for the change in voice. Therefore, a complete profile of the voice is generated from a voice diagnostic protocol that incorporates at least four distinct areas, as illustrated in ▶ Fig. 3.1: (1) collection of preliminary and case history information; (2) perceptual analyses, including both the auditory-perceptual judgment of the clinician and the self-perception of the patient; (3) indirect measures of the acoustic and aerodynamic characteristics of voice; and (4) direct visualization of the larynx and phonatory structures.
Direct evaluation indicates that the evaluation involves seeing the structure or actual activity (e.g., visualization of apparent vocal fold vibration during a laryngeal stroboscopic exam) or measurement of some characteristic of an activity in very close proximity to its source (e.g., measurement of airflow via a pneumotachograph). Indirect evaluation involves the analysis of some by-product of an activity (e.g., evaluation of the perceptual or acoustic characteristics of the voice signal produced by vocal fold vibration; estimation of airflow using a patient’s vital capacity and maximum phonation time data).
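
The indirect airflow estimate mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the estimate is formed as vital capacity divided by maximum phonation time (a ratio often called the phonation quotient). The function name and the patient values are hypothetical and are used for illustration only; they are not normative data.

```python
def phonation_quotient(vital_capacity_ml: float, max_phonation_time_s: float) -> float:
    """Estimate mean airflow (mL/s) as vital capacity / maximum phonation time.

    This is an indirect estimate: it is derived from by-products of phonation,
    not from airflow measured at the source (e.g., with a pneumotachograph).
    """
    if vital_capacity_ml <= 0 or max_phonation_time_s <= 0:
        raise ValueError("vital capacity and phonation time must be positive")
    return vital_capacity_ml / max_phonation_time_s


# Hypothetical values for illustration only (not normative data).
estimate = phonation_quotient(vital_capacity_ml=3000.0, max_phonation_time_s=20.0)
print(f"Estimated mean airflow: {estimate:.0f} mL/s")  # 3000 / 20 = 150 mL/s
```

Because both inputs depend on patient effort and instruction, an estimate of this kind is best read alongside direct aerodynamic measurement when that is available.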

Fig. 3.1 Diagram of the interrelationship between case history, perceptual evaluation, indirect measures of voice, and laryngeal visualization used in the voice diagnostic protocol. The result of the protocol will be a complete and comprehensive profile of the patient’s vocal function.

3.3 Assessment versus Evaluation versus Diagnosis

An assessment may be described as a general process of gathering data to evaluate an examinee. The clinician takes information from various sources, including the case history interview, test data, and other measurements, and pulls it all together into a cohesive whole. The assessment is a systematic method of obtaining information from tests and other sources, used to draw inferences about our patient. 2 In contrast, the various tests and procedures that are included in the assessment are our methods of evaluation, with our tests being evaluative procedures in which a sample of an examinee’s behavior may be obtained, evaluated, measured, and scored using a standardized process. 2 Finally, once our assessment (composed of all of the evaluative procedures or tests) has been completed, we make a statement or conclusion about the testing and other information gathering that was part of the overall assessment process: this is our diagnosis. Whether the assessment includes tests that ask a specific question or obtain a focused measurable characteristic of the patient’s voice, the diagnostic process involves gathering information and modifying the probability of a diagnosis in the light of that information. 3 “Diagnosis requires placing measurements and other observational data into context and perspective in order to decide whether a problem exists and to differentiate one problem from others which may have similar performance aspects.” 4
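
The idea of modifying the probability of a diagnosis in the light of new information can be made concrete with a small calculation. The sketch below assumes one common formalization, the odds form of Bayes' theorem, in which post-test odds equal pre-test odds multiplied by the likelihood ratio of a finding. This is our illustrative choice and is not necessarily the method intended by the cited source. The function name, the starting probability, and the likelihood ratios are all hypothetical.

```python
def update_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Return the post-test probability after one finding (odds form of Bayes' theorem)."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical example: an initial clinical hypothesis held at 20% probability.
p = 0.20
# A finding that supports the hypothesis (assumed likelihood ratio of 4).
p = update_probability(p, likelihood_ratio=4.0)
print(f"After supporting finding: {p:.2f}")
# A finding that argues against it (assumed likelihood ratio of 0.5).
p = update_probability(p, likelihood_ratio=0.5)
print(f"After contrary finding: {p:.2f}")
```

Each finding is applied to the probability left by the previous one, which mirrors how case history, perceptual, instrumental, and visual information successively refine the working diagnosis. Chaining updates in this way assumes the findings are independent of one another, which will not always hold in practice.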

The term diagnostic has been specifically used to indicate that the outcome of this process will be based on a synthesis and application of information from diverse areas all dealing with aspects of voice function, such as anatomy and physiology, acoustics, perception, psychometrics, and knowledge of norms and testing techniques. 5 In particular, the final diagnosis should also be differential in nature. Differential diagnosis takes into account all significant variables contributing to the disorder and attempts to differentiate the presenting problem from related or dissimilar problems. 6 As such, the voice diagnostic protocol is the modus operandi by which information gathered from the case history and from perceptual, acoustic, aerodynamic, and endoscopic evaluation is synthesized to produce a differential diagnosis.

3.3.1 Considerations Before Entering into the Voice Diagnostic

There are a number of basic but essential parameters that must be defined and understood by the voice clinician before entering into the diagnostic. First, what do we mean by “typical/normal” versus “disordered” voice? Fex states that “Normal voice quality is a conception based on subjective opinion, may vary with different cultures, and certainly is difficult to define; a vast number of people are supposed to have normal but nevertheless individually differentiated voice” (p. 155). 7 Defining “normal” or “typical” voice is difficult since there are a number of characteristics (age, sex, racial type, body size/type, etc.) that influence normal voice type and quality. Through our life experience, we are exposed to the vast range of normal/typical voices and, thereby, develop an internal gauge by which normal/typical is judged. With this in mind, we may state that a voice is normal/typical if it does not deviate substantially from our internal gauge of parameters such as pitch, loudness, quality, and duration. 1 This definition predicates that the clinician has a good understanding of normal/typical variations in voice associated with the effects of the aforementioned parameters of age, sex, racial type, etc.

Second, when a voice is perceived as deviating from normal/typical expectations, it may be characterized as being disordered or dysphonic. The term dysphonia literally means “abnormal/difficult/impaired voice.” Of course, there is a wide variation in types of dysphonia; so, we must consider what the primary voice disorder types are by which we may categorize the dysphonic voice. In a previous chapter, we have discussed categories such as functional/behavioral versus organic. In addition, the subsequent pages provide specific terminology for describing commonly observed characteristics of pitch, loudness, and voice quality.

Finally, identifying a dysphonia and categorizing it as a particular type is not enough; we must also describe the severity of the disorder/dysphonia. Severity is, perhaps, one of the most (if not the most) important factors in determining why the patient has presented himself or herself before us and in determining if our treatment(s) have had any effect on the patient. When gauging the severity of the presenting voice disorder, we must recognize that the voice impairment (1) is multidimensional in nature (i.e., composed of numerous characteristics that may be weighted differently by different judges 8, 9) and (2) may impact the speaker’s functional ability to communicate and obtain employment. 9 Several ways in which the severity of the presenting dysphonia can be communicated will be provided.

3.3.2 What Do We Do During a Voice Diagnostic?

As shown in ▶ Fig. 3.1, a number of different evaluation procedures are necessary to completely assess vocal function and arrive at a logical diagnostic decision. Based on time available to the clinician, insurance company regulations, and other factors, it may not be possible to complete all evaluation procedures in a single visit. In our experience, we have typically separated the diagnostic into two sets of procedures that combine to form the complete evaluation (▶ Table 3.1). These sets include (1) an initial exam which includes gathering preliminary information and conducting case history, perceptual evaluation of the voice, and laryngeal visualization and (2) a second examination in which behavioral voice evaluation and laryngeal function study are completed including acoustic and aerodynamic analyses (indirect measures of voice). In this second set of evaluation procedures, a series of trial diagnostic therapy tasks may also be conducted depending on the presenting case. The order of these two sets of evaluation procedures may be reversed depending on the order of referral (e.g., in some settings, the laryngeal visualization examination may occur after the voice evaluation and laryngeal function analysis). In this chapter, we will focus on gathering important preevaluation information, case history format and key components, and perceptual evaluation of the voice. Subsequent chapters will describe the evaluation procedures included in laryngeal visualization, acoustic, and aerodynamic analyses.

Table 3.1 The voice diagnostic protocol. By necessity, the evaluation procedures incorporated into the voice diagnostic protocol are often administered over multiple sessions. The procedures in set I versus set II are those recommended by the authors, but may be varied at the choice of the voice clinician.

Evaluation set I

  • Gather preliminary information (chart review)

  • Case history interview

  • Perceptual evaluation of voice (including auditory-perceptual evaluation and ratings of patient self-perception)

  • Laryngeal visualization (stroboscopy)

Evaluation set II

  • Voice recordings and acoustic analyses for objective measures related to pitch, loudness, and quality

  • Aerodynamic measures that relate to respiratory phonatory capacity and control

  • Trial diagnostic therapy tasks (if necessary)

3.3.3 Preevaluation Information

Before meeting with the patient, the clinician should attempt to gather pertinent background information that may be beneficial in determining the possible etiology and contributing factors to the patient’s possible disorder, as well as review information on previous evaluation(s) and past or current treatment methods that the patient may have received. Immediate access to information about the prospective patient may be affected by the setting in which one works, with the range of initial information extending from simple referral slips with limited information about the patient’s condition to extensive information obtained from medical charts and reports. A review of this preliminary information may inform initial suspicions regarding possible etiological factors, which often helps the clinician to develop a clinical hypothesis regarding a potential voice disorder (e.g., behavioral in nature due to vocal abuse/misuse vs. those associated with an underlying disease process). This hypothesis may be either accepted or rejected following the completion of a voice diagnostic profile. Complete consideration of possible etiological factors is essential, as attempting to treat a patient’s current voice symptoms without a hypothesis of why the voice disorder developed may result in temporary voice change but will be ineffective in reducing the probability of disorder recurrence.

In certain cases, the clinician may already have access to chart history or to a completed case history form. Review of this information should be focused on evidence pertinent to the cause or maintenance of a voice or speech disorder. If the clinician cannot clearly answer why a question is being asked or why a piece of information is being included in a description of the patient’s history, then the question or piece of information should not be included. The following are examples of information that are typically reviewed:

  • Previous diagnoses including postnasal drainage, history of reflux, hypertension, and respiratory disorder.

  • Any systemic disorder that may affect the respiratory and/or laryngeal mechanisms such as neurological disease and cancer of various forms should be noted.

  • Significant injuries to the head, neck, or chest regions, as well as any history of surgeries in these regions, should also be identified.

  • Surgical procedures that may have involved intubation (the passage of a tube through the nose or mouth into the trachea for maintenance of the airway during anesthesia).

  • The presence of hearing deficits that may be related to vocal dysfunction.

  • The chart history should also be reviewed for the previous physician’s examination note, as well as the results of previous tests and procedures that involve the phonatory mechanism such as modified barium swallow and endoscopy.

It is essential that the clinician maintain an open mind regarding the possible etiological information gathered from the patient’s chart history. While background information facilitates initial hypotheses regarding the patient’s condition and underlying deficits, the clinician must also be wary of bias toward a particular clinical hypothesis before the actual collection of patient signs (i.e., phenomena observed by the clinician) and symptoms (i.e., patient descriptions that may not be verifiable) has taken place in the diagnostic session. 1 There are various forms of bias that may predispose the clinician to select a particular diagnosis regardless of the actual data observed. Croskerry described several forms of cognitive dispositions to respond (CDRs) that may lead to diagnostic error 10:

  • Anchoring bias—locking on to a diagnosis too early and failing to adjust to new information.

  • Availability bias—thinking that a similar recent presentation is happening in the present situation.

  • Confirmation bias—looking for evidence to support a preconceived opinion rather than looking for information to prove oneself wrong. This often leads to premature closure in which the clinician has “jumped” to a conclusion regarding the patient’s condition.

  • Diagnosis momentum—accepting a previous diagnosis without sufficient skepticism.

  • Premature closure—similar to “confirmation bias” but “jumping to a conclusion.”

The clinician must balance the possibilities presented by background information with the realities of the actual diagnostic session. The background information should spur the clinician on to develop clinical hypotheses, carry out any necessary research regarding the patient’s condition before evaluating the patient, and prepare any special tests that may need to be carried out (e.g., although we are focusing on voice evaluation, voice disorders may coexist with other communicative deficits that may also need to be evaluated). 1 However, the clinician must always be wary of clinical bias and be prepared for the possibility that the patient may have characteristics that are quite different from what the background information has led us to believe.

3.4 Key Content of the Case History

The diagnostic session typically starts with the gathering of background information and significant information regarding the patient’s possible voice problem(s) in the case history interview. The following key areas should be explored with the patient 1:

  • The nature of the problem.

  • Development of the problem.

  • Variability versus consistency.

  • Description of voice use.

  • Effects of dysphonia on the patient.

  • Health status and possible causes.

It is our view that, rather than carrying out a “form-filling” exercise, it is best to incorporate these issues into a conversation with the patient/caregiver. A brief conversation that covers most, if not all, of the subsequent areas of questioning not only gathers focused case information in an efficient manner (in our experience, generally within 5–10 minutes at most) but also allows for extended conversation in which the clinician has multiple opportunities to observe the pitch, loudness, quality, and durational characteristics of the patient’s voice, as well as observe the patient themselves (e.g., signs of stress/anxiety, presence of tremors) in a typical conversational setting.

3.4.1 The Nature of the Problem

The starting point of the voice diagnostic interview is often described as an assessment of the nature of the problem (i.e., what is the problem and what are its characteristics). The patient’s description of the characteristics of his or her voice problem(s) allows for a description of the disorder as the patient perceives it. 11 In describing the nature of the problem, the patient may describe the voice problem in a way that corresponds closely to the perceptions of the clinician (e.g., “My voice has had a raspy sound to it for the last few weeks”; “My voice cuts out.”). However, it is also common that the patient’s description of the nature of the problem is at odds with previously gathered case history information and/or the perceptions of the clinician. These discrepancies may be due to factors such as (1) patients’ misunderstanding of their problems (i.e., reflecting a possible lack of awareness of their disorder), (2) an inability of the patient to deal realistically with their voice deficits, (3) intermittent voice problems with variable characteristics that may not be in evidence at the time of the interview, or (4) a voice problem that has changed considerably since first documented via previous evaluation. 1, 11

We suggest that the clinician start the diagnostic interview with a general open-ended question such as “What can I do for you today?” or “Can you tell me why you are here today?” Beginning the interview with this form of question provides the patient with the opportunity to describe the possible voice problem(s) in his or her own words, independent of any clinician bias that may be present. The patient quite often will relate important information that is not present in the patient chart history, and therefore, it is important that we allow the patient the opportunity and time to describe the problems they have been experiencing. In addition, since the use of an open-ended question requires the patient to provide a more extended response, the clinician is provided with the opportunity to observe some of the perceptual characteristics of the voice. Auditory-perceptual judgments regarding the patient’s voice characteristics are often the first signs (i.e., observable and verifiable characteristics) collected by the clinician during the voice diagnostic. As the patient is speaking, the clinician is also provided with the opportunity to visualize features that may be accompanying the patient’s speech/voice characteristics which can provide insight into possible underlying causes or contributory factors (e.g., excessive tension in the paralaryngeal region, limited oral movements, rigidity in the mandibular region, or tremors). In addition, attention to the patient’s communication patterns can inform the development of hypotheses regarding personality traits and the relationship of those traits to the perceived voice characteristics and patient symptoms (e.g., introverted vs. extroverted personality).

While it is hoped that the patient will freely impart information regarding their voice problem(s), some patients will need to be cued toward revealing the nature of their deficits. As an example, the patient may respond to the clinician’s introductory open-ended question of “Can you please tell me why you are here today?” with a very vague “I have been having trouble speaking lately.” The clinician may then have to lead the patient with a question that cues certain necessary responses such as, “Do you have trouble with the way your voice sounds, the ability to form sounds with your lips and tongue, or in finding the right words to say?” Hopefully, this cued question will elicit a more focused response from the patient such as “The way my voice sounds,” followed by a further clinician cue such as “Can you describe for me how your voice sounds when you are having trouble speaking?” resulting in the patient stating, “It sounds hoarse.” In this example, the clinician guided the patient to a more descriptive and informative response that verifies the probable presence of a voice disorder. If the patient is still unable to provide a reasonable description of the problems he or she has been experiencing, the clinician may also ask if the way the voice sounds today (i.e., during the interview) is the way the voice sounds when he or she is having the voice problems, or ask if the patient can demonstrate the disordered voice. The clinician should also be prepared to demonstrate various voice types themselves (e.g., breathiness and strain) as a means of eliciting the possible nature of the voice dysfunction from the patient.

In addition to the nature of the patient’s specific voice problem, the clinician should be aware that other related symptoms may provide important insight into the eventual diagnosis. Several neurological and stress-related symptoms may be associated with voice problems as etiological factors or as contributing factors. Examples include the presence of possible dysphagia, nasal regurgitation of food and liquids, weakness (either bilateral or unilateral) in other parts of the body, neurologically related speech and language deficits, characteristics of increased musculoskeletal tension, increased fatigue, frequent heartburn, and dryness in the mouth and throat. 1

3.4.2 Development of the Problem

After initial questioning about the nature of the voice problem, a logical transition is to investigate the development of the problem through questions such as “How long have you had this problem?” or “When did you first notice your voice problem(s)?” The onset of the disorder may be characterized as (1) long duration, gradual onset versus (2) sudden onset. Long duration, gradual onset disorders generally do not have a specific date or episode of onset that the patient can recall. Various types of voice disorders (organic and functional) may develop over weeks, months, or years, including those associated with conditions such as vocal abuse and progressive neurological diseases. 12 In certain cases, patients who have had a slow, gradual onset to their voice problem may show less concern about their voice and/or less overall effect on their daily life because they have learned to cope with and compensate for their deficits. Because gradually developing dysphonias are often observed in patients with long duration, habituated vocal abuse, these patients may have a poorer overall prognosis for voice/behavior change. 13

In contrast to gradual onset disorders, those disorders with acute, sudden onset are often more disturbing to the patient, and often pose a severe disruption in both the ability to carry out daily activities and, possibly, overall health status. 13 With sudden onset dysphonias, the patient may be able to describe the date and circumstances of onset in great detail. A variety of conditions with potential negative effects on voice may develop over a very short time (1–2 days or less), including:

  • Severe laryngitis.

  • Neurological insult (e.g., cerebrovascular accident, closed head injury).

  • Laryngeal trauma (external [e.g., blunt trauma to the neck region] or internal [e.g., sudden trauma to the vocal fold mucosa from a singular shouting/screaming episode]).

  • Iatrogenic surgical injury (e.g., vocal fold paresis resulting from thyroid surgery; vocal fold ulceration resulting from intubation).

In addition, sudden voice change in the absence of signs or symptoms suggestive of organic pathological condition is often a key component in the diagnosis of psychogenic dysphonias (e.g., conversion reaction). 12

3.4.3 Variability versus Consistency

Information regarding disordered voice characteristics that have been variable or have shown degrees of fluctuation over time versus those that have been relatively consistent may be an important factor in discerning voice disorder type. Disordered voices that periodically return to normal or near-normal characteristics may be functional in nature and may be due to underlying variations in vocal fold swelling and stiffness. In those conditions in which dysphonia fluctuates, it is important to question the patient regarding the conditions that are associated with either positive or negative voice change (environmental effects; effects of vocational, social, recreational situations; specific periods of the day associated with poor vs. improved voice). 1 In addition, factors such as personal habits (smoking, alcohol use), work conditions, or medical conditions may affect the variability of a voice disorder. Periods of emotional stress (personal, familial, work related) may also cause the voice to worsen. Highly variable voice characteristics have been associated with a variety of dysphonic conditions 1,​ 5,​ 12,​ 13:

  • Hyperfunctional patients often report improved voice function earlier in the day, with increasing dysphonia with increased voice use.

  • Voice quality that is worse in the morning versus later in the day may be a symptom of postnasal drip (PND) or laryngopharyngeal reflux (LPR). An accompanying symptom of PND and/or LPR is excessive throat clearing first thing in the morning.

  • Patients with functional voice disorders associated with psychosomatic conditions often report considerable variability in their vocal function.

  • Dysphonias related to vocal fold swelling and stiffness will often be worse as the day progresses and improve with reduced voice use.

In contrast to those with variable voice characteristics, many disorders that have underlying neurological dysfunction or definitive changes in vocal fold structure (e.g., mass lesions) generally result in fairly consistent dysphonias that do not show considerable spontaneous improvement. 11 Of course, there will be exceptions to this view, such as with the case of myasthenia gravis (a deficit of neural transmission affecting the myoneural junction) which may be highly variable over time, with patients showing progressive weakness with muscle use, followed by periods of improvement after rest.

3.4.4 Description of Voice Use

Many voice disorders arise from the manner in which the patient uses the phonatory mechanism (i.e., functional dysphonias). Therefore, it is essential that a comprehensive description of how the patient uses his or her voice in various situations is obtained by the voice therapist. The identification of the patient’s potential vocal abuse, misuse, and overuse in various vocational, social, and recreational settings is essential, with these various situations often the cause of many functional voice problems. 12,​ 13 The clinician must determine the daily vocational and recreational vocal demands on the patient and determine whether the patient must use voice under adverse conditions. The development of voice disorders has been associated with several types of voice use and/or setting, including

  • Work settings that require the patient to use his or her voice for their livelihood (i.e., professional voice users) may potentially lead to the development of voice disorders, particularly if these patients have not had any professional voice training. 12

  • Social settings such as gatherings at parties or in bars may present a vocally abusive environment (voice production in excessive background noise) with potential to elicit phonotrauma.

  • Hyperfunctional phonation that is potentially damaging to the vocal fold mucosa may accompany strenuous exercise and sporting events for both spectator and participant. 12

  • Singing in various situations (choir, theater, recreational musician) presents a potentially phonotraumatic condition (e.g., high-intensity voice in the presence of high levels of background noise; possibly under adverse conditions such as singing in bars in a smoky atmosphere, poor monitors, etc.) for many vocalists, particularly if they have not had professional voice training. 1

Because it is not always possible for the clinician to directly observe the patient in all the aforementioned situations, the patient may be encouraged to describe and possibly demonstrate the voice use in these various settings for the clinician. 13

3.4.5 Effects of Dysphonia on the Patient

Regardless of the clinician-perceived severity of the patient’s dysphonia, a voice disorder that the patient feels affects their vocation or draws negative reactions from others will be of more concern to the patient than a voice disorder that does not adversely affect daily life. We have seen patients with very mild dysphonias who have described potentially drastic effects on their daily lives (e.g., teachers, singers), as well as patients with perceptually severe dysphonias who have demonstrated very little concern (particularly once any potentially life-threatening disorder has been ruled out). Our observations are consistent with Colton and Casper who stated that, “the severity of the (patient’s) reaction is not always proportional to the severity of the voice problem” (p. 190). 12 For those who do describe a debilitating effect on their daily lives, feelings of stress, anxiety, and development of possible negative psychological outlook may arise because of the dysphonia or from the reactions of others. 1,​ 11 These negative psychological reactions can have their own exacerbating effect on any presenting voice disorder.

The perceived effect of a dysphonia on the patient also has implications for prognosis. In many cases, those who describe a particularly adverse effect on their lifestyles may be more motivated to participate in and follow through with treatment goals and recommendations. In contrast, those patients who demonstrate little concern or awareness of their particular voice disorder are often less motivated to follow through with treatment recommendations and may have a poorer prognosis for improvement. When asked about the effects of the dysphonia, some patients may be apprehensive about discussing what may be a potentially humiliating effect on their lifestyle. However, the clinician should ensure that expressions of denial regarding effects of the dysphonia are fully explored. 11,​ 12

3.4.6 Health Status and Possible Causes

Since voice characteristics reflect not only the emotional state and personality of the patient but also the overall physical status, the patient’s current health history, as well as any particularly pertinent information from the preevaluation chart review, should be assessed to determine any possible relationship to the presenting voice disorder. 12 A brief review of current health history should be a part of the case history interview, as complete details of the patient’s health history may either be absent from the patient’s chart details or may have changed since their most recent examination. The voice clinician is particularly interested in health history in the following areas 1:

  • Current health status (including recent illness, injuries, and surgical or other medical procedures).

  • Injuries or trauma to the upper chest, head, and/or neck region.

  • Neurological problems.

  • Respiratory deficits.

  • Allergy-related problems.

  • History of frequent upper respiratory tract infections.

  • Surgeries involving the head, neck, or upper chest region, as well as any procedures in which the patient may have been intubated.

  • Smoking, alcohol use, illicit drug use.

  • Ingestion of caffeinated beverages (coffee, tea, soft drinks).

  • Use of prescription and over-the-counter medications (e.g., antihistamines, decongestants, diuretics).

  • Endocrine imbalances.

  • Previous occurrences of voice dysfunction or “loss” of the voice.

  • Hydration status: the degree of internal hydration may affect the viscosity of mucus secretions and aid in lubrication of the vocal fold cover. 14 A well-lubricated cover protects the vocal fold during the vibratory cycle and aids in heat dissipation. Ingestion of six 8-ounce glasses of water or fruit juice per day is generally recommended.

  • Psychological issues with a particular focus on sources of stress and anxiety.

In this stage of our discussion with the patient, we should also ask (if it has not already been stated) what the patient believes may have been the possible cause(s) of the voice problem(s). A comparison of possible causes described by referral sources (e.g., physician, previous speech–language pathologist) versus those possible causes described by the patient may reflect on the knowledge and insight the patient has regarding his or her voice problem. Differing views on the possible cause of a voice disorder between the patient, referral sources, and family members may reflect (1) an inability to adequately understand what may have been previously explained to the patient, (2) the patient’s inability to recognize possible underlying causes of the voice problem, or (3) an inability to accept and cope with the problem. 1 Differences between the patient’s perceptions of what has caused the voice problem versus opinions offered by referral sources may have to be addressed if the patient is to gain awareness and recognition of the underlying cause(s) of their dysphonia and develop a positive prognosis for improvement in voice therapy. 11

3.4.7 Auditory-Perceptual Evaluation of Voice

As we are leading the patient through our case history examination and documenting the aforementioned key areas of questioning, it is essential that we are also listening to and documenting the perceptual characteristics of the patient’s voice. The term perceptual evaluation is recommended by the Voice Committee of the International Association of Logopedics and Phoniatrics (IALP) and entails a comparison between the characteristics of the voice of the speaker and those that are considered as normal or typical for the listener. 15 Since we are focused on the audible characteristics of the voice, many expand this term to auditory-perceptual evaluation. Clinicians and researchers believe that this form of perceptual evaluation is an essential component of voice assessment, diagnosis, and treatment for the following reasons 16:

  1. Perceptual evaluation methods are available to all clinicians and may provide a global measure of vocal performance.

  2. The perceived characteristics of the voice are quite often the reason that a patient has presented themselves or has been referred for the voice diagnostic in the first place.

  3. The perceived characteristics of the voice present a primary gauge by which therapy recommendations will be made and success in therapy will be evaluated.

  4. Measures and observations obtained via acoustic, aerodynamic, and laryngeal visualization will be related and often interpreted in light of the perceptual characteristics of the voice.

Previously, we had stated that a voice is normal/typical if it does not deviate substantially from our internal gauge of parameters such as pitch, loudness, quality, and duration. 1 If the voice that we are listening to deviates from this definition, it is abnormal/atypical. In the next section, we will describe some of the methods by which various voice types and their severity are documented and describe in more detail commonly observed typical and atypical auditory-perceptual voice characteristics.

Methods for Rating the Type and Severity of Voice Disruption

Because disordered voice is multidimensional (i.e., composed of numerous characteristics including pitch, loudness, duration, and quality), it is necessary to communicate what category or type of voice disruptions we are observing, as well as the severity of the observed disruption. Of these two components (category/type of vocal disruption vs. severity of vocal disruption), accurate descriptions of severity can be the most difficult for many clinicians. When we judge the severity of a disorder, we are recognizing that the condition may exist along a continuum. This continuum extends in growing proportions from an absence or minor amount of the observed deviant voice characteristic to an extreme amount. The lower end of this continuum should be acknowledged as being a “minimal” level, because even normal voice signals are not necessarily perfect. On the other end of the continuum, an extreme level of voice deviation has a significant effect on patient and listener alike and often prevents phonatory function.

When documenting and communicating the perceptual characteristics of the voice, the clinician should select a method of describing and rating voice characteristics that is complex enough to portray the multidimensional nature of voice quality deviation and yet be understandable enough that (1) clinicians may easily learn and use the system with limited training and (2) results may be communicated easily among colleagues and other professionals. Because the perceptual characteristics of the voice may differ depending on the context in which the voice sample is elicited, it is recommended that the clinician describe voice characteristics in both sustained vowel (e.g., repetitions of the vowel /a/) and standardized speech and/or reading contexts (e.g., portions of “The Rainbow Passage”; counting). These contexts will augment the perceptual descriptions of voice obtained during the case history conversation. Here are three commonly used methods for documenting the perceptual evaluation of the voice:

Categorical Ratings

In this method, voice samples are assigned to discrete categories such as mild, moderate, severe. The following severity terminology attempts to incorporate a number of the possible diverse effects of dysphonia 1:


Mild

While the listener experienced in the perceptual characteristics of the disordered voice would consider the voice abnormal/atypical, the lay listener may consider the voice to be only unusual in nature. The voice characteristic is not distracting, and the ability to effectively communicate is not affected. The dysphonia has a minimal effect on phonation.

Moderate

Dysphonia is more prominent, and both trained and untrained listeners would consider the voice abnormal. There may be intermittent periods in which the voice characteristic is highly distracting. The ability to effectively communicate is noticeably affected under certain conditions (e.g., noisy environments). The dysphonia may occasionally cause substantial disruption to phonation (i.e., phonation ceases or becomes highly effortful).

Severe

Both trained and untrained listeners would consider the voice extremely abnormal. The voice characteristic is highly distracting. The ability to effectively communicate is consistently affected. The dysphonia may cause phonation to be mainly absent (i.e., aphonic) or extremely effortful.

Using these categories of severity, the clinician may state (e.g., in the impressions section of their diagnostic report) whether the voice is typical or dysphonic and, if dysphonic, the severity and type of dysphonia present. As examples, a report may state that “The patient presents with dysphonia characterized by mild breathiness” or, alternatively, “The patient presents with moderate strain,” where the presence of dysphonia is assumed.

Equal-Appearing Interval Scales

The severity of a perceived voice characteristic is assigned a number (the range of numbers varies), with the higher numbers representing increased severity and perceived deviation from the normal/typical voice. The assumption of these scales is that they are linear, with each interval representing an equal increment in the characteristic being measured or described (e.g., the difference/distance between an adjoining set of values [e.g., 0–1] is the same as any other adjoining set of values [e.g., 1–2]). A commonly used example of this type of equal-appearing interval (EAI) scale is the GRBAS scale (Table 3.2). The GRBAS scale was developed by the Japanese Society of Logopedics and Phoniatrics. It assigns a score of 0, 1, 2, or 3 to the overall grade of hoarseness and to each of roughness, breathiness, asthenia, and strain, where 0 is normal, 1 is a slight degree, 2 is a medium degree, and 3 is a high degree. 17 This scale has also been extended to the GRBASI (or GIRBAS) by adding a rating for instability to reflect fluctuation of voice quality over time. 18

Table 3.2 Description of the parameters of the GRBASI scale




Grade: a summary rating of the severity of dysphonia as a whole (i.e., an overall impression of the voice)

Roughness: coarse, gravelly, low-pitched noise. Related to irregularity of vocal fold vibration

Breathiness: airy, whispery voice due to audible detection of airflow through the glottis. Related to hypoadduction of the vocal folds

Asthenia: weak voice

Strain: impression of effortful, hyperfunctional voice. Related to excessive muscular effort and glottal/supraglottal constriction

Instability: fluctuation/variation in vocal characteristics such as pitch, loudness, and quality

Notes: Each parameter is rated on a 0 to 3 scale where 0 is normal, 1 is a slight degree, 2 is a medium degree, and 3 is a high degree.
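For clinicians who keep ratings in electronic form, the scale in Table 3.2 can be sketched as a small record type. The following is only an illustrative sketch in Python: the class name `GRBASIRating`, its methods, and the compact summary string are our own assumptions for the example and are not part of the published scale.

```python
from dataclasses import dataclass, fields

# Degree labels taken from the notes to Table 3.2.
GRBASI_LABELS = {0: "normal", 1: "slight", 2: "medium", 3: "high"}


@dataclass(frozen=True)
class GRBASIRating:
    """Hypothetical container for one GRBASI rating (each parameter 0 to 3)."""

    grade: int
    roughness: int
    breathiness: int
    asthenia: int
    strain: int
    instability: int

    def __post_init__(self):
        # Reject anything that is not one of the four whole-number scale points.
        for f in fields(self):
            value = getattr(self, f.name)
            if type(value) is not int or value not in GRBASI_LABELS:
                raise ValueError(f"{f.name} must be 0, 1, 2, or 3; got {value!r}")

    def summary(self) -> str:
        # Compact notation built from each parameter's initial letter and score.
        return "".join(f"{f.name[0].upper()}{getattr(self, f.name)}" for f in fields(self))

    def describe(self) -> dict:
        # Map each parameter to its verbal degree label.
        return {f.name: GRBASI_LABELS[getattr(self, f.name)] for f in fields(self)}


# Example with made-up scores for an illustrative patient.
rating = GRBASIRating(grade=2, roughness=1, breathiness=2, asthenia=0, strain=1, instability=0)
print(rating.summary())
print(rating.describe()["breathiness"])
```

Restricting each field to the four scale points reflects the equal-appearing interval design: intermediate values such as 1.5 are not defined by the scale.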

Visual Analog Scales

Instead of scaling the voice by use of specified incremental levels of voice disruption (as in EAI scales), visual analog scales (VASs) provide the judge with an undifferentiated line on which a mark is placed to indicate the level of voice severity or deviation. Generally, only the extremes of the line are labeled (e.g., minimal vs. extreme). The use of this type of scaling procedure may be helpful in reducing bias in the rating process.

An application of the VAS method is found in the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) scale. The CAPE-V was developed to help standardize clinical auditory-perceptual assessment of voice and to describe the severity of perceptual attributes in a manner that would facilitate communication among clinicians. 19 The CAPE-V elicits sustained vowels as well as connected speech productions in both sentence reading and spontaneous speech. This tool provides specific sentence contexts developed to assess different elements of vocal quality:

  • Sentence 1 (“The blue spot is on the key again”) may be used to examine the coarticulatory influence of vowels /a/, /i/, and /u/ on voice quality.

  • Sentence 2 (“How hard did he hit him?”) assesses soft glottal attacks and voiceless to voiced transitions.

  • Sentence 3 (“We were away a year ago”) is an all-voiced sentence that allows for the observation of the ability to maintain voicing in a variable context, as well as for the observation of possible voice stoppages/spasms.

  • Sentence 4 (“We eat eggs every Easter”) includes vowel-initiated words that may elicit hard glottal attacks.

  • Sentence 5 (“My mama makes lemon jam”) includes numerous nasal consonants important in certain types of voice therapy (e.g., resonant voice therapy) and may also be used to assess the presence of hyponasal resonance imbalance.

  • Sentence 6 (“Peter will keep at the peak”) provides a useful context for assessing pressure consonant production and possible hypernasal resonance imbalance or nasal air emission.

Voice characteristics are also assessed in conversational speech.

The CAPE-V involves rating overall severity of dysphonia, roughness, breathiness, strain, pitch, and loudness, each on a 100-mm VAS. While the CAPE-V incorporates a VAS, it is actually a hybrid scale since it also includes indicators for mildly deviant (MI), moderately deviant (MO), and severely deviant (SE) ratings. The clinician is asked to observe and describe possible task-dependent differences in vocal performance between the three CAPE-V contexts (sustained vowels, elicited sentences, and conversational speech). A summary of the key content of the CAPE-V is provided in ▶ Fig. 3.2, and the form may be downloaded from


Fig. 3.2 Structure of the CAPE-V perceptual assessment. The examiner listens to elicited sustained vowels, six sentences (read or repeated by the patient), and conversational speech before rating six perceptual domains on a 100-mm visual analog scale. Additional comments on voicing behaviors (e.g., diplophonia, glottal fry, and falsetto) may be added to the assessment. A CAPE-V form can be downloaded from, and may also be obtained from Kempster et al. 19
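Because each CAPE-V domain is marked on a 100-mm line, a rating can be recorded as the distance of the clinician's mark from the left end of the line. The sketch below, in Python, illustrates that arithmetic under our own assumptions: the function names are hypothetical, and the rescaling step for a line that did not print at exactly 100 mm is an illustrative convenience rather than a documented CAPE-V procedure.

```python
# The six perceptual domains rated on the CAPE-V, as listed in the text.
CAPE_V_DOMAINS = (
    "overall severity",
    "roughness",
    "breathiness",
    "strain",
    "pitch",
    "loudness",
)


def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert a mark's distance from the left end of the line to a 0-100 score."""
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark must lie on the line")
    # Rescale so that a line that printed shorter or longer than 100 mm
    # still yields a score on the 0-100 range (an assumption of this sketch).
    return round(100.0 * mark_mm / line_length_mm, 1)


def score_form(marks_mm: dict, line_length_mm: float = 100.0) -> dict:
    """Score all six domains; raises if any domain has no mark."""
    missing = [d for d in CAPE_V_DOMAINS if d not in marks_mm]
    if missing:
        raise ValueError(f"missing marks for: {', '.join(missing)}")
    return {d: vas_score(marks_mm[d], line_length_mm) for d in CAPE_V_DOMAINS}


# Made-up measurements (in mm) for illustration only.
example_marks = {
    "overall severity": 35,
    "roughness": 20,
    "breathiness": 42,
    "strain": 10,
    "pitch": 5,
    "loudness": 15,
}
print(score_form(example_marks))
```

The sketch deliberately does not translate scores into the MI, MO, and SE indicators, since their positions along the line are read from the form itself.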

3.4.8 Commonly Used Terminology to Describe Voice Characteristics

The human voice (and for that matter, any sound) can be described in terms of four perceptual characteristics: pitch, loudness, duration, and quality. These same terms are the key perceptual characteristics of the voice that may become disrupted or which will deviate from our normal/typical expectations when a patient has a disordered voice.


Pitch

Most people can identify a “high” note versus a “low” note when listening to a musical or singing passage, and this “high” or “low” distinction is based on the perception of the pitch of the voice or instrument that they are listening to. 1 Pitch is the auditory perception of the fundamental rate of vibration of some sound-producing source. Because the pitch of the voice is a direct result of changes in factors such as (1) elasticity, (2) mass, and (3) the length of the vocal folds, the voice clinician may be able to develop hypotheses regarding the underlying structural condition of the vocal folds from the perception of the patient’s vocal pitch and his or her knowledge of normal/typical pitch levels. The typical sound wave measurement that corresponds to pitch is the measurement of frequency. In contrast to pitch, frequency is the objective measurement of the fundamental rate of vibration (i.e., a measurement of the number of cycles of vibration per second). The assessment of pitch and frequency is an essential aspect of the voice evaluation.

Perceptual analysis of pitch generally focuses on descriptions of (1) habitual pitch level, (2) pitch variability and stability, and (3) total pitch range. The following terminologies have been used to describe these various aspects of vocal pitch:

Habitual Pitch

The habitual pitch has been defined as the pitch that the patient uses most often in everyday speech 11 and is the pitch level around which normal pitch inflections/variations occur. 20 Boone and McFarlane 13 state that the habitual pitch corresponds to the modal pitch level (i.e., the most frequently used pitch), whereas Case 20 indicates that habitual pitch is synonymous with the average pitch level.

Colton and Casper indicate that the assessment of the habitual pitch level should focus on whether the pitch level is appropriate for the patient’s age and gender. 12 The determination of “normal” habitual pitch level is clearly related to knowledge of several key patient characteristics (age and gender, body size/type, and race) other than the characteristics of the voice itself. 6 When gauging the normality of pitch, it appears that we make a mental/internal comparison between the perceived pitch level and our expectations (gained through experience) for the speaker’s age, gender, etc. When the speaker’s pitch level does not fall within our expected range, an abnormal pitch level is perceived. In addition, listener confusion as to the speaker’s age and gender may also occur.

Various rating scales are available for rating habitual pitch level. As an example, Awan described a seven-point equal-interval scale with the central point of the scale rated as “0” (i.e., no substantial difference from normal expectations), with ratings of high pitch (positive numbers) and low pitch (negative numbers) rated on either side of normal (▶ Fig. 3.3). 1 Definitions for mild, moderate, and severe are those presented previously in this chapter.


Fig. 3.3 Example scale for the auditory-perceptual rating of habitual pitch.
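The seven-point scale described above can be expressed as a simple lookup. The Python sketch below assumes that the scale points run from -3 to +3 and that magnitudes of 1, 2, and 3 correspond to mild, moderate, and severe; that numeric assignment and the function name are assumptions made for illustration, because the figure itself defines the actual layout.

```python
# Assumed mapping of rating magnitude to the severity terms defined earlier
# in the chapter; the figure, not this sketch, is the authoritative layout.
SEVERITY = {1: "mild", 2: "moderate", 3: "severe"}


def describe_habitual_pitch(rating: int) -> str:
    """Turn a rating from -3 (low) to +3 (high) into a short report phrase."""
    if type(rating) is not int or not -3 <= rating <= 3:
        raise ValueError("rating must be a whole number from -3 to 3")
    if rating == 0:
        # Central point: no substantial difference from normal expectations.
        return "within normal expectations"
    direction = "high" if rating > 0 else "low"
    return f"{SEVERITY[abs(rating)]}ly {direction} pitch"


for value in (-2, 0, 3):
    print(value, "->", describe_habitual_pitch(value))
```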

Alternatively, habitual pitch level may be rated using the pitch scale of the CAPE-V form, with the nature of the abnormality (e.g., abnormally high vs. low) described by the clinician and the severity of the disruption identified on a 100-mm VAS, with normal pitch level rated toward the extreme left end of the scale and increasing levels of pitch disruption rated toward the right end of the scale. 19

Pitch Variability and Stability

In addition to the habitual pitch level, the voice clinician should make determinations of the patient’s capability to vary/change pitch levels during speech, as well as the patient’s ability to control the vocal pitch. Normal continuous speech production may incorporate a relatively wide range of pitch variation (a range of approximately 4–10 musical notes/semitones). 21 These pitch variations make up the normal intonation patterns of speech. Intonation patterns include pitch variations that are used linguistically to vary the meaning of utterances. 12
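When the lowest and highest frequencies of a speech sample are available from acoustic analysis, the pitch variation range can be expressed in semitones using the standard relationship of 12 semitones per doubling of frequency. The Python sketch below illustrates that conversion; the function names are hypothetical, and the 4 to 10 semitone bounds are simply the approximate figures quoted above, not a diagnostic cutoff.

```python
import math


def semitone_interval(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the interval between two frequencies, in semitones."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave (a doubling of frequency) spans 12 semitones.
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def within_quoted_speech_range(semitones: float, low: float = 4.0, high: float = 10.0) -> bool:
    """Check a range against the approximate 4-10 semitone figure quoted in the text."""
    return low <= semitones <= high


# Made-up frequencies for illustration: a speaker ranging from 110 Hz to 165 Hz.
span = semitone_interval(110.0, 165.0)
print(round(span, 2), within_quoted_speech_range(span))
```

Working in semitones rather than hertz makes ranges comparable across speakers with different habitual pitch levels, because the semitone scale depends on frequency ratios rather than absolute differences.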

Increases in pitch level may be used to indicate a particularly informative linguistic unit within the utterance or an interrogative/questioning statement; decreases in pitch often accompany unstressed syllable production and the end of declarative statements. The use of pitch variability/intonation patterns allows for more interesting and expressive communication. Continuous speech production that lacks pitch variability (referred to as monopitch—a “single” pitch level) may be perceived as dull and uninteresting. If pitch variability (either too little or too much) draws attention to itself, it may be abnormal. 5 In addition, there is evidence that reduced pitch variability and intonation patterns may decrease the intelligibility of the utterance, because the rise and fall of the normal pitch contour direct the attention of the listener to the content words of an utterance. 22

An important aspect of pitch variability that must be attended to by the voice clinician is the presence of pitch instability or the lack of control of vocal pitch. This type of pitch instability is relatively easily perceived by the clinician (macroscopic) and should not be confused with the relatively microscopic cycle-to-cycle pitch instabilities measured in jitter (see section “Vocal Quality in Normal versus Disordered Individuals” and Chapter 4 of this book). Pitch instability may be perceived as a “shakiness” in the voice or observed as unexpected pitch changes/variations during conditions in which we would expect pitch to remain relatively stable. Three commonly observed forms of pitch instability are pitch breaks, diplophonia, and vocal tremor.

In pitch breaks, it is most common to observe a rapid pitch change/break upward. A pitch break may occur if the speaker is using an inappropriately low pitch during speech or if a patient attempts to sustain a high-pitch modal register phonation that rapidly shifts into falsetto; it may also occur during sudden hard-onset loud voice productions (e.g., an abrupt shout). 5

Diplophonia is a condition in which the listener may perceive the simultaneous production of two pitches in the voice signal. Diplophonia is attributed to the differential vibration of two different sound sources with different mass/length/tension characteristics. 11 Monsen observed that the diplophonic voice was characterized by alternating pitch periods rather than simultaneous production of different pitches and attributed diplophonia to irregularities in glottal vibration pattern in which alternating glottal periods were slightly different in period or shape. 23 Although diplophonia is being discussed as a pitch deviation, it should be noted that this characteristic often gives the impression of roughness in the voice.

Vocal tremor is a form of pitch variability defined as a rhythmic variation in pitch (and often loudness) during conditions in which steadiness of pitch would be expected. This characteristic is most evident during the production of sustained vowels.

Awan described a seven-point equal-interval scale with the central point of the scale rated as “0” (i.e., no substantial difference from normal expectations), with ratings of increased pitch variability (positive numbers) and reduced pitch variability (negative numbers) rated on either side of normal (▶ Fig. 3.4). 1 Definitions for mild, moderate, and severe are those presented previously in this chapter. The clinician should note the type of sample that the description of pitch variability pertains to. It is recommended that ratings of pitch variability as used during intonation be obtained from continuous speech samples. In contrast, the ability to maintain a steady pitch production is best observed during a sustained vowel sample.


Fig. 3.4 Example scale for the auditory-perceptual rating of pitch variability.


Feb 25, 2020 | Posted by in OTOLARYNGOLOGY | Comments Off on The Voice Diagnostic: Initial Considerations, Case History, and Perceptual Evaluation