of Voice Disorders


Structural dysphonia


Regulatory dysphonia


 Hormonal dysphonia


 Neurogenic (central, peripheral) dysphonia


 Malregulative dysphonia


Structural-regulatory dysphonia




The terms ‘structural’ and ‘regulatory’ allude only to aetiology, and all symptomatic connotations such as ‘hyper-’, ‘hypo-’ or ‘muscle tension’ (e.g. hyperfunctional dysphonia) are avoided. It is better not to burden aetiology with superfluous symptomatologies.



4.2.4 Conclusion


The reduction of nonstructural phonatory disorders to the phrase ‘functional dysphonia’ does not convey information that sufficiently describes the aetiology. Instead, it merely defines what something is not, narrowing it to a description of a ‘non-organic’ disorder. In contrast, ‘regulatory’ concretely describes disrupted regulation, be it neurogenic or hormonal. The term ‘malregulative’ is a way of describing faulty regulation or ‘lapses’ of the nervous system in the absence of central or peripheral changes. We therefore propose replacing the aetiologically imprecise and ambiguous term ‘functional dysphonia’ with the term ‘malregulative dysphonia’.


4.3 Parameters of Voice Production Relevant for Clinical Investigation



Ulrich Eysholdt

Describing the functional ability of an organ according to the extremes of its capability is very common within medicine, for example, describing the maximal and minimal angles of stretch/bend of the knee. This extremum principle is also used in voice diagnostics; it does not apply, however, to all diagnostic categories. A healthy voice must be able successfully to perform the communication tasks that are placed upon it. It must therefore be possible to vary the pitch and intensity of the voice sufficiently, the voice must sound acceptable and these tasks must be performed reliably over time.


4.3.1 Fundamental Frequency, Mean Fundamental Frequency of the Speaking Voice and Vocal Frequency Range


A voiced sound consists of a fundamental frequency and a series of harmonics (or overtones) the frequencies of which are integer multiples of the fundamental. Frequency is a physical quantity (dimension: Hz = 1 cycle/s) which corresponds to the psychoacoustic perception of pitch. The concept of a voice being ‘high’ or ‘low’ is widely used, even colloquially.


The fundamental frequency of a voice can be perceived and auditorily determined in spoken language in the same way as from a sung or sustained tone. When a patient is allowed to speak freely, he automatically uses the pitch that is most comfortable for him. With only little experience, the investigator can easily identify the fundamental frequency to within a semitone by comparing the pitch to that produced by a tone-generator. The more strongly a patient is affected by the emotional content of what they say, the more tension there is in their entire body, including in their voice. The mean fundamental frequency of the speaking voice correspondingly increases.


Determining the mean fundamental frequency of the speaking voice with technology is complex and vulnerable to technical error. Because of this, we restrict ourselves to estimating the pitch perceptually, which in general is equal to the mean fundamental frequency of the speaking voice. Measuring the mean fundamental frequency of running speech requires considerable software effort; for instrumental measurement, the patient is therefore asked only to sustain a tone that most closely corresponds to his natural speaking voice.


Numerical determination of the fundamental frequency from a voice signal is prone to errors which, if they go unnoticed, can lead to inconsistencies within the findings, such as certifying that a man has a female voice. While calculating the power spectrum is unproblematic when using the fast Fourier transform (FFT), the identification of the harmonic series is sometimes erroneous owing to an octave error. This error is usually upwards: simple algorithms incorrectly identify a prominent first harmonic as the fundamental, whereas subharmonics almost never exist. Plausibility checks by the investigator are therefore essential.
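One way such a plausibility check could look is sketched below. It compares a computed fundamental frequency with the investigator's perceptual estimate and flags results that lie roughly a whole number of octaves apart. The function names and the tolerance of one semitone are assumptions made for illustration, not part of any measurement software.

```python
import math


def semitone_distance(f_a_hz: float, f_b_hz: float) -> float:
    """Signed distance from f_b to f_a in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_a_hz / f_b_hz)


def looks_like_octave_error(f0_computed_hz: float,
                            f0_perceived_hz: float,
                            tolerance_semitones: float = 1.0) -> bool:
    """Flag a computed F0 lying about one or more octaves from the perceived pitch.

    The tolerance is an illustrative assumption, not a clinical standard.
    """
    distance = abs(semitone_distance(f0_computed_hz, f0_perceived_hz))
    nearest_octaves = round(distance / 12.0)
    if nearest_octaves < 1:
        return False
    return abs(distance - 12.0 * nearest_octaves) <= tolerance_semitones


# Hypothetical example: software reports 240 Hz, the investigator hears about 120 Hz.
suspicious = looks_like_octave_error(240.0, 120.0)
```

A flagged value does not prove an error; it only indicates that the recording and the analysis settings deserve a second look.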


The fundamental frequency is particularly dependent on the physical dimensions of the vibrating structures and resonance cavities and is correlated with body size, age and sex. Babies’ voices have a fundamental frequency of approximately 400 Hz, whereas the voices of children and juveniles are 280–350 Hz. Differences according to sex are apparent following the hormone-dependent voice change during puberty (the mutation). The voice of an adult woman is approximately 200–250 Hz and that of a man 100–140 Hz. Women’s voices tend to sink in pitch following menopause and older men’s voices increase slightly. As a physiological reference, the so-called indifference level can be used. This is a pitch that is generated by a well-balanced action of all muscles involved with an optimal relation of muscular effort and vocal output. It is located at the transition area from the lower to the middle third of the pitch range. To avoid overloading the speaker’s vocal apparatus through a permanent upward deviation, the mean speaking pitch should be kept at this level or, at least, be brought back to it regularly.


The term vocal frequency range or pitch range denotes the interval between the fundamentals of the highest and lowest frequencies ($f_{\mathrm{max}}$, $f_{\mathrm{min}}$) that a person can sing. The range is usually described with musical terminology and covers just less than two octaves in men and women. An octave encompasses 12 semitones and corresponds in physical terms to a doubling of frequency. The conversion between these units can be performed by means of any pocket calculator, using the following formulae:




$$ \mathrm{Interval}\ \mathrm{in}\ \mathrm{Octaves}={\log}_2\left(\frac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right)=\frac{\log_{10}\left(\frac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right)}{\log_{10}2}=3.322\times {\log}_{10}\left(\frac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right) $$

(4.1)





$$ \mathrm{Interval}\ \mathrm{in}\ \mathrm{Semitones}=39.864\times {\log}_{10}\left(\frac{f_{\mathrm{max}}}{f_{\mathrm{min}}}\right) $$

(4.2)
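The two conversions in Eqs. (4.1) and (4.2) can be sketched in a few lines of Python. The function names and the example frequencies (110 Hz and 440 Hz) are chosen here purely for illustration.

```python
import math

SEMITONES_PER_OCTAVE = 12


def interval_in_octaves(f_max_hz: float, f_min_hz: float) -> float:
    """Eq. (4.1): base-2 logarithm of the frequency ratio."""
    if f_max_hz <= 0 or f_min_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(f_max_hz / f_min_hz)


def interval_in_semitones(f_max_hz: float, f_min_hz: float) -> float:
    """Eq. (4.2): 12 semitones per octave, i.e. about 39.86 * log10(ratio)."""
    return SEMITONES_PER_OCTAVE * interval_in_octaves(f_max_hz, f_min_hz)


# Illustrative range: lowest tone 110 Hz, highest tone 440 Hz.
octaves = interval_in_octaves(440.0, 110.0)      # two doublings of frequency
semitones = interval_in_semitones(440.0, 110.0)  # 12 semitones per octave
```

Because a frequency ratio of 4 corresponds to two doublings, this example yields two octaves or 24 semitones.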

4.3.2 Dynamic Range and Voice Range Profile


The average rate at which acoustic energy contained in the radiating sound of the voice passes through a unit area is called intensity, measured in watts per square centimetre. For practical reasons in terms of comparing intensities, the intensity level preferred for measurement is the logarithmic dB(A) scale (see Sect. 1.3). Such measurement enables analysis of the extremely wide range of intensity in relation to the normal threshold of hearing. A calibrated measuring station featuring a commercially available sound-level meter is required, and the manufacturer’s standardised measurement conditions must be followed. The type of microphone used and the distance between the microphone and the patient’s mouth are of critical importance. The patient must be clearly instructed and well-motivated in order to produce sounds at the limits of their vocal ability.


A minimal amount of energy is required in order to evoke vibration in the vocal folds. A lower amount of energy does not result in vocal sound. This minimal energy level is known as the phonation threshold ($L_{\mathrm{min}}$) and corresponds to an intensity level of 35–40 dB(A) for the normal voice, though this varies individually.


At the other end of the intensity scale, the upper limit of phonation ($L_{\mathrm{max}}$) of the untrained voice is 95–100 dB(A). Both limits $L_{\mathrm{max}}(f)$ and $L_{\mathrm{min}}(f)$ are frequency-dependent and increase with higher frequency. The difference between these limits




$$ R(f)={L}_{\mathrm{max}}(f)-{L}_{\mathrm{min}}(f) $$

(4.3)
is the frequency-dependent dynamic range.

A diagram of intensity level against frequency shows the upper and lower limits of phonation, inside which is the area known as the voice range profile. A normal voice range profile looks like a diagonal ellipse. A large voice range profile is suggestive of a powerful voice and a small one suggests a weak voice. Generating and evaluating a voice range profile is highly dependent on experience, and the patient must have a very clear understanding of their task.


The limits of a normal voice range profile can be taken as:



  • Frequency range ≥1.5 octaves



  • Dynamic range ≥30 dB(A)



  • Smooth edges without significant distortion



  • Mean fundamental frequency of the speaking voice is approximately at the transition between the lower and the middle third of the frequency range (on the frequency axis)
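The first two limits in the list above can be checked mechanically once the profile is available as discrete points. The following is a minimal sketch under two assumptions that are not specified in the text: the data layout (`VrpPoint`) is hypothetical, and the dynamic range criterion is applied to the largest value of R(f) from Eq. (4.3). The measurement values are invented for illustration.

```python
import math
from typing import NamedTuple, Sequence


class VrpPoint(NamedTuple):
    """One measured frequency of a voice range profile (hypothetical layout)."""
    frequency_hz: float
    l_min_db: float  # phonation threshold L_min(f) in dB(A)
    l_max_db: float  # upper limit of phonation L_max(f) in dB(A)


def frequency_range_octaves(points: Sequence[VrpPoint]) -> float:
    """Interval between highest and lowest frequency, in octaves (Eq. 4.1)."""
    freqs = [p.frequency_hz for p in points]
    return math.log2(max(freqs) / min(freqs))


def largest_dynamic_range_db(points: Sequence[VrpPoint]) -> float:
    """Largest R(f) = L_max(f) - L_min(f) over all measured frequencies."""
    return max(p.l_max_db - p.l_min_db for p in points)


def meets_basic_limits(points: Sequence[VrpPoint]) -> bool:
    """Check the two numerical limits: >= 1.5 octaves and >= 30 dB(A)."""
    return (frequency_range_octaves(points) >= 1.5
            and largest_dynamic_range_db(points) >= 30.0)


# Invented example profile with three measured frequencies.
profile = [
    VrpPoint(110.0, 45.0, 75.0),
    VrpPoint(220.0, 50.0, 95.0),
    VrpPoint(440.0, 60.0, 100.0),
]
profile_ok = meets_basic_limits(profile)
```

The remaining criteria (smooth edges and the position of the speaking pitch) still require inspection of the diagram by the investigator.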


The terminology in this field is somewhat unclear and redundant, so for the avoidance of error, the following must be emphasised: the vocal frequency range is the interval between the highest and lowest frequencies of the voice, not the extent of the voice range profile.


A voice range profile cannot successfully be evaluated quantitatively, despite many attempts to achieve this. Measuring the area of the voice range profile is not necessarily sensible because (a) both axes of the voice range profile are logarithmic and (b) the edges of the voice range profile are indicated only by discrete points between which interpolation must be used. The specific type of interpolation used (linear or spline interpolation) influences the results as much as does the density of the data points. Measuring the extent of the voice range profile is undermined by fractal geometry, wherein the range of a real area is dependent upon the particular scale being used. Approximating the shape by using model calculations is possible in individual cases but has no clinical value. Instead of quantitative evaluation, the main value of the voice range profile is providing useful information on basic features of a voice at a glance.


4.3.3 Acoustic Parameters


The computational power of a small mobile phone processor is enough to extract an entire series of features from a speech signal almost in real time. A method of digital signal analysis that has been substantially adapted to voice investigations is used. The most-developed software, ‘Praat’ (a Dutch expression meaning ‘talk’), comes from the Netherlands and is a free-to-download open-source product (http://praat.en.softonic.com/). Commercial alternatives are expensive but have practical advantages in clinical use because of their simplified user interfaces.


The patient should sing a sustained tone lasting at least 3 s into a microphone (‘sustained phonation’). From the microphone signal the stable vibratory part is cut out (‘windowing’) and presented as an almost-periodic signal, which can be analysed with respect to two groups of parameters: perturbation and breathiness.


4.3.3.1 Perturbation Parameters


Perturbation parameters quantitatively describe how periodic the signal is. If $N$ is the number of cycles contained in the signal and $T_i$ is the duration of the $i$th cycle, then the average period is




$$ \overline{T}=\frac{1}{N}\sum \limits_{i=1}^N{T}_i\kern0.875em \left[\mathrm{s}\right] $$

(4.4)

The magnitude of the difference between an individual cycle and the mean




$$ {\varDelta}_i=\mid {T}_i-\overline{T}\mid \kern1.125em \left[\mathrm{s}\right] $$

(4.5)
is averaged ($\overline{\varDelta}\ge 0$). The jitter (J) is defined as




$$ J=100\ \frac{\overline{\varDelta}}{\overline{T}}\kern0.875em \left[\%\right] $$

(4.6)
(Hollien et al. 1971)

Jitter expresses the mean deviation of the periods from the mean period as a percentage.


There are many other similar formulae expressing jitter besides that of Hollien et al. (1971), and each provides different normal values. The appropriate norms can be found in the software manuals.


A perfectly periodic voice signal has a constant cycle duration $T_1=T_2=\dots =T_{N-1}=T_N=\overline{T}$; consequently $\overline{\varDelta}=0$ and Jitter = 0%. But such an ideal voice signal never occurs. Real voice signals are only almost periodic and have jitter values greater than 0%.


Analogously, the reproducibility of individual volume peaks is termed shimmer. The overall amplitude is averaged, and the deviation of individual amplitudes from the average is calculated and then itself averaged and expressed as a percentage.


Normal and pathological values are:

| | Normal (%) | Neither-nor (%) | Pathological (%) | Not evaluable (%) |
|---|---|---|---|---|
| Jitter | <0.6 | 0.6–1.0 | >1.0 | ≥5 |
| Shimmer | <2.5 | 2.5–4.0 | >4.0 | ≥25 |


There are a great number of roughness parameters besides jitter and shimmer, all of which are based on these two parameters. They all have specific uses but cannot be considered in this overview.
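The jitter definition of Eqs. (4.4) to (4.6) and the analogous shimmer calculation can be sketched as follows. This is a minimal illustration, assuming that cycle durations and peak amplitudes have already been extracted from the windowed signal. The function names and example values are hypothetical, and other software uses other formulae with other norms.

```python
from statistics import mean
from typing import Sequence


def perturbation_percent(values: Sequence[float]) -> float:
    """Mean absolute deviation from the mean, as a percentage of the mean.

    With cycle durations T_i this is the jitter of Eq. (4.6); with peak
    amplitudes it is the shimmer as described in the text.
    """
    if not values:
        raise ValueError("at least one cycle is required")
    average = mean(values)                                  # Eq. (4.4)
    mean_deviation = mean(abs(v - average) for v in values)  # Eq. (4.5), averaged
    return 100.0 * mean_deviation / average


# Limits taken from the table of normal and pathological values:
# (normal below, pathological above, not evaluable from), all in percent.
JITTER_LIMITS = (0.6, 1.0, 5.0)
SHIMMER_LIMITS = (2.5, 4.0, 25.0)


def classify(value_percent: float, limits: tuple) -> str:
    normal_below, pathological_above, not_evaluable_from = limits
    if value_percent >= not_evaluable_from:
        return "not evaluable"
    if value_percent > pathological_above:
        return "pathological"
    if value_percent < normal_below:
        return "normal"
    return "neither-nor"


# Invented example: four cycles of about 10 ms and their peak amplitudes.
periods_s = [0.0100, 0.0101, 0.0099, 0.0100]
amplitudes = [1.00, 1.04, 0.96, 1.00]
jitter = perturbation_percent(periods_s)
shimmer = perturbation_percent(amplitudes)
jitter_class = classify(jitter, JITTER_LIMITS)
shimmer_class = classify(shimmer, SHIMMER_LIMITS)
```

In this invented example the jitter is about 0.5% and the shimmer about 2%, so both values fall into the normal column of the table.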


4.3.3.2 Breathiness Parameters (Noise Measurements)


Calculating the parameters of breathiness is more complex than calculating those of perturbation. After transforming the time signal into a power spectrum via FFT, it is split into two components, one harmonic and the other nonharmonic. Typical parameters include:



  • Harmonics-to-noise ratio (HNR)



  • Normalised noise energy (NNE)



  • Glottal-to-noise excitation ratio (GNE)


There is no standardised calculation algorithm for these parameters. The related normal values therefore depend strongly upon the measurement software used. Clinical evaluation of these measures makes sense only when:



  • The individual laboratory follows a standardised approach



  • The results can be compared with standard values that have been generated with the same procedure


4.3.4 Maximum Phonation Time


The maximum phonation time (MPT) is the most important clinical parameter of the aerodynamics of the voice. As suggested by its name, MPT is the maximum time for which a patient can sing and hold a tone. As this procedure requires cooperation of the patient, compliance with standardised conditions is necessary in order to obtain a reliable measurement: the patient must practise once before the actual measurement, so that the investigator is certain that the patient has understood the instructions. For the measurement itself, the patient must breathe three times ‘normally’ and then inhale once to their maximum limit before they begin singing. A target tone is unnecessary; it is best if the patient uses a comfortable pitch. Usually it corresponds to the mean fundamental frequency of their speaking voice.


Real measured MPT values can be divided into two categories and one less reliable interim area:

| | Normal | Neither-nor | Abnormal |
|---|---|---|---|
| MPT (s) | >15 | 10–15 | <10 |


The diagnostic selectivity of MPT is therefore not especially good.


MPT is closely correlated with the vital capacity (VC) of the lungs. Although the VC is sex-dependent (VC is almost 800 mL lower in women than in men), both sexes have a similar MPT. Because women have smaller larynxes than men and correspondingly small flow diameters, they require less air for phonation. This lower air consumption compensates for the lower female VC, meaning that MPT is ultimately similar between the sexes.


The glottal flow rate (GFR) is the volume flow of air expired through the glottis during phonation. Its average can be calculated from the parameters MPT and VC from the formula:




$$ \mathrm{GFR}=\frac{\mathrm{VC}}{\mathrm{MPT}}\kern0.875em \left[\frac{\mathrm{mL}}{\mathrm{s}}\right] $$

(4.7)
Because the VC is sex-dependent but MPT is not, the GFR is differentiated by sex. Values for women are 170–220 mL/s and values for men are 250–300 mL/s.
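Equation (4.7) and the MPT categories given earlier can be illustrated with a short sketch. The function names and the example values (a vital capacity of 4000 mL and an MPT of 16 s) are assumptions chosen only to show the arithmetic.

```python
def glottal_flow_rate_ml_per_s(vital_capacity_ml: float, mpt_s: float) -> float:
    """Eq. (4.7): average glottal flow rate GFR = VC / MPT in mL/s."""
    if mpt_s <= 0:
        raise ValueError("MPT must be positive")
    return vital_capacity_ml / mpt_s


def classify_mpt(mpt_s: float) -> str:
    """Categories from the MPT table: >15 s normal, <10 s abnormal."""
    if mpt_s > 15:
        return "normal"
    if mpt_s < 10:
        return "abnormal"
    return "neither-nor"


# Hypothetical patient: VC = 4000 mL, MPT = 16 s.
gfr = glottal_flow_rate_ml_per_s(4000.0, 16.0)  # 4000 / 16 = 250.0 mL/s
mpt_class = classify_mpt(16.0)
```

A value of 250 mL/s lies at the lower end of the range quoted for men; as the text notes, MPT alone has limited diagnostic selectivity, so such numbers are only a rough orientation.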

4.4 Types of Phonation



Ulrich Eysholdt

4.4.1 Definition


Phonation is the process of voluntary sound generation by an airstream through the larynx (from Greek ϕωνή = voice). Involuntary laryngeal sound is not under consideration here. Phonation is possible in both directions of the respiratory airstream, inspiratory and expiratory. The scope of this article is restricted to the expiratory airstream only. In physical terms, the larynx is a tube with a rigid outer wall and a soft tissue cover inside. In earlier stages of evolution, the larynx worked as a valve in order to separate respiration from swallowing. Later in evolution, the voice function was added and reached its highest level in human beings. Phonation makes use of the soft tissue of the endolarynx, which consists mainly of muscles. They are able to change their position, shape and viscoelasticity (stiffness) very rapidly. In that way the muscles vary the cross-sectional area of the larynx as well as the wall impedance and influence the expiratory airstream. Each constriction of this tube is a possible source of sound generation. The narrowest opening, even in its most opened state, is the glottis, the area between the vocal folds (Greek γλώττα = tongue, as in ancient times people suspected the tongue to be the source of voice and speech). However, sound production may also take place more cranially (supraglottal). Subglottal sound production is possible in pathological constriction of the tracheo-laryngeal section (stenosis) but does not contribute to the normal or disordered voice and is not considered here. As a rough classification, the produced sound can be categorised as ‘voiced’ or ‘unvoiced’ (Titze 1994). A phonatory sound is called ‘voiced’ when the harmonic components are predominant and form a series of overtones. ‘Unvoiced’ sound production shows no harmonic components; the power spectrum is continuous and corresponds to a noise, sometimes without any frequency dependence.


Voiced articulation refers to vowels and some consonants, while unvoiced articulation is related only to certain consonants. There is no sharp border between ‘voiced’ and ‘unvoiced’, neither perceptually nor analytically.


4.4.2 Sound Sources within the Larynx


An expiratory airstream passing the glottis produces sound from three different sources:



  • Volume-induced



  • Eddy-induced



  • Tissue-induced


The volume-induced sound is generated by the vibrating vocal folds: in the glottal plane, they regularly interrupt the continuous airstream, thus forming a sequence of air pulses in the supraglottal space. This is the most dominant sound component during normal phonation. The equidistant pulses have a fundamental frequency and a series of harmonic frequencies.


The eddy-induced sound results from turbulence near the mucosal surface. Between the superficial mucus and the mainstream, there is a boundary layer of turbulent air, which varies in structure and thickness and leads to a noise component. In normal phonation, eddy-induced sound contributes only little to the voice. By means of digital signal analysis and synthesis, it is possible to eliminate the eddy-induced voice component completely. However, a human listener perceives such computerised clean voice as unnatural and too clear. Obviously, a minimum noise component is necessary for a voice to be accepted as natural. When, on the other hand, the eddy-induced sound is increased, it may even dominate the volume-induced components. In this case the voice sounds altered, rough or breathy. Human auditory perception is very sensitive to such voice alterations.


Any vibrating mechanical structure produces a sound of itself, like a vibrating string or a vibrating wall. The vibrating structures within the larynx are the vocal folds, which generate tissue-induced sound components. Depending on the vibration, the tissue-induced components may be harmonic or not. In general they contribute low energy only, but compared with the noise components, they seem to be necessary for a voice to be perceived as natural. Under pathological conditions, i.e. when tissue may not vibrate, as when being stiffened by a scar, tissue-induced sound vanishes completely (Eysholdt 2014).


4.4.3 Glottal Phonation


Variables that change the phonation in the glottal plane are:



  • The glottal opening



  • The muscular tension of the endolaryngeal soft tissue



  • The subglottal air pressure


The vocal folds form the glottis and receive their shape mainly from the vocalis muscle (M. vocalis). While the vocal muscle is fixed to the thyroid cartilage at its ventral end, it can be moved and positioned by the arytenoid cartilage at its dorsal end. During respiration, the vocal folds are abducted, and during phonation, they are adducted. The adduction and abduction movements are performed symmetrically by the tilting and rotating arytenoid cartilages. These movements are called respiratory mobility.


The opening width is the most important variable for regulating the glottal airstream. There is a continuous transition between maximum opening and closure:

| | Most open/abducted | ../images/307062_1_En_4_Chapter/307062_1_En_4_Figa_HTML.gif | Most closed/adducted |
|---|---|---|---|
| Sound/phonation type | Voiceless/whispering | Breathy/modal/creaky | Glottal stop |


Although there are audible differences, there is no clear perceptual boundary between the neighbouring phonation types. The overlap from one type to the next prevents the phonation type from being used as a sharp diagnostic criterion. Different phonation types make use of the same or similar laryngeal mechanisms. Additionally, in European languages the different phonation types transfer emotions, not meaning. For the European voice clinician, the fact that other languages of the world use these differences to articulate meaningful speech sounds, and the way in which they do so, is of only minor importance (Laver 1980).


The voiceless condition with open glottis and relaxed vocal muscles corresponds to breathing: no vibrations are excited. When the glottis remains open and the muscles are stiffened, the resulting sound production is called whispering. The eddy-induced components increase, while the volume-induced sound is low. The eddy-induced components can then be used as the primary signal for articulation; the whispering phonation type is used to express secrecy or confidentiality (Laver 1980).


In general, economic sound production needs adduction of the vocal folds and tension of the vocal muscles (or either of them). The transition from whispering to the normal speaking voice is a breathy phonation, which is the result of a (too) low muscular tonus or a (too) large glottal gap. The volume-induced sound components increase, while the eddy-induced components slightly decrease. The downregulation of the muscular tonus leads to a lower tissue stiffness, thus increasing the vibrating mass and consequently decreasing the fundamental frequency.


Breathy voice differs from a whisper because of the weaker medial compression, less muscular tension and the smaller degree of voicing effort.



Case Study 4.1


A pensioner (72 years old) has suffered from a hoarse and weak voice for several years. His voice is worse in the evening. Voice therapy was not successful. Except for high blood pressure, he has no other diseases. The video shows bowing of the vocal folds due to atrophy of the vocalis muscle. The closed phase pattern is characterised by an oval gap. The voice sound during the examination is breathy and partially aphonic, and only at higher frequencies in falsetto register do the vocal folds vibrate. Closure at higher volume is better but still not complete. The habitual pitch of the speaking voice is at about G3 (200 Hz), which is too high for a male voice. The voice sound is mildly breathy (R0 B1 H1).


For normal speech the modal phonation type is used, which means a neutral mode, economical in the sense that it requires least vocal effort. During modal phonation, subglottal pressure and muscular tension build up a dynamic equilibrium, each at moderate values, thus exciting the vocal folds to a sustained vibration. The vibration is a rotational motion in three dimensions, which can roughly be compared to that of a skipping rope with its maximum amplitude in the middle third. During a regular vibration pattern, the glottal gap is completely closed (except for a dorsal chink in women). The glottal vibration interrupts the expiratory airstream and leads to a predominantly volume-induced sound. Modal voice requires an anatomically normal larynx and a normal innervation. The reverse argument is not true: a larynx that is normally configured and has normal innervation is definitely able to produce nonmodal phonation.



Case Study 4.2


A 26-year-old man with a normal male voice wants to apply for singing studies. He has no complaints and strives to develop his bass baritone voice.



Case Study 4.3


A 26-year-old woman with a normal female voice wants to become a music teacher. The organic and functional findings do not give rise to any reservations.



Case Study 4.4


A 48-year-old owner of a record store complains about pain when speaking, a lump in the throat, hoarseness after slight vocal load, dry mouth, a salty taste and a feeling as if he had drunk a hot liquid. Speaking is strenuous. It causes pain in the chest and the bronchi, sometimes only the next day. Otherwise the patient has no complaints. During stroboscopic examination, the patient is obviously unable to phonate with a clear voice. The vocal folds are pressed against each other, which leads to a creaky phonation. Additionally, the video reveals a very slight atrophy of the right vocal fold and, for compensation, a hardly noticeable crossing of the vocal processes, the real so-called arytenoid crossing (between 9 and 15 s of the video). When recording the voice range profile, he refused to produce loud tones. His speaking voice is almost normal (R1 B0 H1).


Creaky phonation is also called vocal fry. Characterised by a very low frequency at high subglottal pressure, it is the result of low stiffness in the vocal muscle and high adduction forces that close the glottis completely. Creaky phonation shows irregular periods (i.e. has no fundamental frequency) and has a comparatively low intensity. The voice production is ‘uneconomic’. This voice sound in general is not used for communication but for performing art purposes (Titze 1994). Sometimes the falsetto (from Ital. falso = ‘false’) is considered as a phonation type. The larynx is antero-ventrally tilted and the vibrating mass reduced by stretching the vocal folds. The fundamental frequency is usually remarkably higher than in modal voice. As the falsetto mechanism makes use of the whole vocal tract instead of the laryngeal soft tissues, it is considered here as a vocal register (see Sect. 4.5) rather than phonation type.


4.4.4 Supraglottal Phonation


As sound can be generated at any constriction of the vocal tract, the same mechanisms described for the glottis apply to supraglottal structures, where the ventricular folds are the most important sound source. Although they contain muscular structures, their control is quite imprecise. The result is usually an irregular, very rough voice without audible fundamental frequency that sounds deep and strange. While some artists make use of this special sound (most famous: Louis Armstrong), this type of phonation is also of importance for patients after a laryngeal operation (cordectomy). In this situation, a scarred vocal pseudo-fold is formed, which, owing to its stiffness, cannot take part in sound production. Supraglottal phonation can then replace glottal phonation and enable the patient to articulate speech sounds without any external device (Eysholdt 2014).



Case Study 4.5


A 40-year-old woman had an idiopathic subglottal stenosis that started 8 years ago. She underwent altogether eight interventions with glottal and subglottal enlargement. The vocal folds lost their function in the course of this treatment. She developed a voice that is produced by her ventricular folds. The vocal range is markedly reduced. The voice is soft and severely hoarse with typical pauses of audible inspiration when breathing in between the phrases.



Case Study 4.6


A 46-year-old singer and professor of pop vocals demonstrates the distortion of vocal fold vibration by adducting the ventricular folds (video clip). If the ventricular folds make contact and start to vibrate (not demonstrated here), the characteristic supraglottal phonation results, which became the distinctive vocal sound of Louis Armstrong, as mentioned in the text (audio clip).


4.5 Vocal Registers



Matthias Echternach

4.5.1 Introduction


The frequency ranges of voices are not consistent entities. At different locations in the frequency spectrum, the voice may change its perceptible character rapidly. An obvious example can be observed in male voices which imitate the female voice, where the voice not only changes the fundamental frequency but also changes its perceptual character. In untrained voices in particular, a rise of fundamental frequency is often associated with sudden uncontrolled pitch jumps to higher frequency regions (Fig. 4.1).

../images/307062_1_En_4_Chapter/307062_1_En_4_Fig1_HTML.png

Fig. 4.1

Acoustic spectrum with a sudden pitch jump reflecting a register shift from modal to falsetto and back to modal register in a male subject. H1–H5 refer to partials of the radiated sound spectrum at the mouth


Regions with similar sound characteristics are, in analogy to organ registers, commonly denoted as vocal registers. The region where register transitions would usually occur is often described as the passaggio region.


The description of the mechanisms underlying vocal registers has been one of the more discussed issues of voice physiology. However, there is as yet no agreement on the underlying mechanisms or the number of registers. Furthermore, there is no agreement on the definition and the related terminology of the term register.


Manuel Garcia II in the 1840s provided one of the early definitions of registers:




Par le mot registre, nous entendons une série de sons consécutifs et homogènes allant du grave à l’aigu, produits par le développement du même principe mécanique, et dont la nature diffère essentiellement d’une autre série de sons également consécutifs et homogènes, produits par une autre principe mécanique.


Garcia (1847)


Translation as published by Henrich (2006) in accordance with Miller (2000):

By the word register we mean a series of consecutive and homogeneous tones going from low to high, produced by the same mechanical principle, and whose nature differs essentially from another series of tones equally consecutive and homogeneous produced by another mechanical principle. All the tones belonging to the same register are consequently of the same nature, whatever may be the modifications of timbre or of the force to which one subjects them.


This definition is based on the concept of a uniform mechanical (laryngeal) principle of vocal registers. Indeed, for some registers, i.e. modal and falsetto, it has been shown that there are strong differences in oscillation patterns, tensions and laryngeal muscle activities.


However, many scientists expect that the perceptual differences between registers are not related to the voice source alone but suppose that there could be interactions of neighbouring systems, such as the subglottal and vocal tract cavities, with the airflow or the vocal fold oscillations. Therefore an isolated definition of laryngeal mechanisms which neglects the rest of the voice production system seems inappropriate to many scientists, who prefer a more perceptual definition of registers.




The term register has been used to describe perceptually distinct regions of vocal quality that can be maintained over some ranges of pitch and loudness.


Titze (1994)


In addition to these definitions, many more complex definitions have been published, which include breathing patterns, acoustic properties, etc.


As a result of the number of different definitions and varying expectations about the underlying mechanisms, there is still no uniform terminology for the different registers. In particular there is no agreement on the question of how many registers actually exist. Terms associated with registers are related to different outcome criteria. For example, the register in which most humans’ speaking voice is located could be called ‘M1’ if the reader follows a laryngeal oscillatory description, where M stands for the mechanical principle. Also in the same way, the term ‘lower thick’ could be used, which is related to the morphology of the vocal folds. The same register could be called ‘chest’, ‘Brust’ or ‘poitrine’, which would relate the term to the awareness that the vibrations felt during the sound production are often experienced in the chest region. Lastly, the register could be denoted as modal register which refers to physical properties of the oscillatory system.


More confusion is caused by the fact that different professions prefer different terms. Different languages also contribute to the lack of a uniform terminology. Furthermore, sometimes the same term is used to describe different registers. In this respect the term head register is sometimes used to describe a stage voice above passaggio for male singers, while the same term is sometimes used synonymously with the male falsetto register. For female voices the term is often associated with pitch, being the first register occurring above the modal register. However, some people also use ‘head’ for the second register to occur above modal register (above a middle register), a register denoted by others as the upper register.


As a consequence of these ambiguities in terminology, Mörner et al. (1963) found more than 100 terms for registers as early as 1963. Attempts to unify terminology have so far failed. The author emphasises that the terminology used in this section is not definitive and is used only to allow the reader to compare data.


The following description of the characterisation of registers is organised in order of the frequency range of the registers (low to high). Since many studies have focused on differences between different registers, in some cases characterisations of upper registers are referred to during the discussion of a lower register. However, a detailed description of such registers is offered later in the section.


4.5.2 Pulse Register


The term pulse register (related terms: vocal fry, creak, M0, #1, Strohbassregister, Kehlbassregister) describes the register in the very lowest part of the frequency spectrum of the human voice. This register is used only rarely, and then mostly in speech rather than singing. Nearly all descriptions and characterisations in the literature relate to male voices. In this register it is thought that many different mechanisms such as double or triple impulses can occur. The voice signal can be periodic or aperiodic. It has also been shown that there can be an increase of perturbation measures, such as jitter and shimmer.


However, apart from the physical processes involved in voicing at these low fundamental frequencies, the perceptual pulse-like nature of this register might have another cause. In this respect Titze (1994) noted that for complex tones below a fundamental frequency of 70–80 Hz, single sound waves can be perceived by the human auditory system as a series of pulses. Therefore, from the perceptual aspect of register concepts, the underlying reason for the pulse register is not related to the sound producer but more to the listener.
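
The perturbation measures mentioned above can be illustrated with a short calculation. The following Python sketch uses one common definition of local jitter and local shimmer (the mean absolute difference between consecutive cycles, divided by the mean value); the source does not specify which definition the cited studies used, and the cycle values are hypothetical numbers chosen only for illustration.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Applied to cycle periods this gives local jitter; applied to cycle peak
    amplitudes it gives local shimmer (one common definition, assumed here).
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are needed")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical cycle-to-cycle measurements (not taken from the text).
periods_ms = [10.0, 10.2, 9.9, 10.1, 9.8]    # period of each glottal cycle
amplitudes = [1.00, 0.95, 1.02, 0.97, 1.01]  # peak amplitude of each cycle

jitter_percent = 100 * local_perturbation(periods_ms)
shimmer_percent = 100 * local_perturbation(amplitudes)

print(f"jitter:  {jitter_percent:.2f} %")
print(f"shimmer: {shimmer_percent:.2f} %")
```

A perfectly periodic signal with constant amplitude gives zero for both measures; irregular cycle lengths or amplitudes, as described for the pulse register, raise them.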


4.5.3 Modal Register


The modal register (related terms: chest, heavy, #2, M1, Brustregister, poitrine) is the register where the speaking voice is usually located. It has been shown that the vocal folds oscillate along the whole length of the membranous portion. There are strong oscillatory amplitudes and mucosal waves. The closed phase is much longer than that within the next higher register. This is also reflected in a greater electro-glottographic contact quotient for the modal register. From electromyographic studies, it is thought that vocalis muscle activity dominates, whereas for the next higher register, the cricothyroid muscle dominates. As a consequence there is a greater mass in action for the modal register. The medial surface of the vocal folds is much greater than that in the higher registers.


The voice source spectrum is characterised by the strong intensities of overtones. In this respect the difference in level between the first partial (ƒo or H1, with H standing for the harmonic) and the second partial (2ƒo or H2) is smaller than that for the higher register. Subglottal pressure is also higher in the modal register than in the pulse register.
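
The level difference between the first two partials can be made concrete with a small calculation. The sketch below assumes that the amplitudes of H1 and H2 have already been measured from a spectrum and expresses their ratio in decibels; the function name and the amplitude values are hypothetical and serve only as an illustration.

```python
import math


def level_difference_db(amp_h1, amp_h2):
    """Level difference H1 - H2 in dB for two linear amplitudes."""
    if amp_h1 <= 0 or amp_h2 <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(amp_h1 / amp_h2)


# Hypothetical linear amplitudes of the first two partials.
strong_overtones = level_difference_db(1.0, 0.8)   # H2 almost as strong as H1
weak_overtones = level_difference_db(1.0, 0.25)    # H2 much weaker than H1

print(f"strong overtones: H1-H2 = {strong_overtones:.1f} dB")
print(f"weak overtones:   H1-H2 = {weak_overtones:.1f} dB")
```

A small H1-H2 value corresponds to the strong overtones described for the modal register, a large value to a spectrum dominated by the fundamental.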


The transition from the modal register to the next higher register—falsetto for male and middle for female voices—is frequently called the first passaggio.


4.5.4 Male Falsetto


The male falsetto (derived from falsus, ‘false’, referring to the auditory impression of a male voice sounding like a female voice) is the register above the modal register. It has been shown that the oscillatory amplitudes and mucosal waves are smaller, and the open quotient greater, than in the modal register. Cadaver and electromyographic experiments suggest that, in contrast to the modal register, the cricothyroid muscle dominates the vocalis muscle in this register. The resulting sound and subglottal pressures are lower, and the intensities of overtones weaker, for falsetto than those of the modal register.


As a consequence of these acoustic characteristics, falsetto is only rarely used for singing on stage. However, in the case of a counter-tenor’s singing, it is thought that a greater closed phase is achieved through the greater degree of vocal fold adduction for stage falsetto. Furthermore, vocal tract shapes differ between a naïve falsetto and stage falsetto (Wendler et al. 1985).


4.5.5 Side Note Concerning Higher Female Registers


In the literature the concepts of female vocal registers above the modal register are very contradictory. There are some scientists who suggest that there is a second register shift about one octave above the transition from modal to the middle register, whereas other scientists deny the existence of such a passaggio. Nevertheless a second passaggio of this kind is often confirmed by many female professional singers in clinical practice. If such a passaggio is assumed to exist, then the lower register could be denoted as middle and the upper as upper register. Miller (2000) postulated that a register shift around 500 Hz occurs because at that point, the fundamental frequency reaches the frequency of the first formant. Titze (1994) also found some evidence for a register transition in this frequency range. He hypothesised that subglottal resonances would amplify frequencies around 500 Hz, which would lead to vocal fold instabilities.


It should be noted that some authors refer to the upper register as being the register for the very highest fundamental frequencies; in this section this is referred to as the whistle register.


4.5.6 Middle Register


The middle register (related terms: head, Mittelregister, #2a) is the register above the modal register and below the upper register. Its lower boundary usually lies at around 300–500 Hz and its upper boundary at around 500–800 Hz. In contrast to the modal register, the longitudinal tension of the vocal folds is increased, and the oscillatory amplitudes and mucosal waves are decreased. There is also a greater open phase during the glottal cycle. Using laryngostroboscopy, Svec and co-workers (Svec et al. 2008) described less adduction of the arytenoid cartilages, resulting in a small persistent gap in the posterior part of the vocal folds.


In comparison with the modal register, the middle register is associated with weak overtone intensities, a consequence of which is that the fundamental frequency intensity is relatively strong.


4.5.7 First Passaggio


The first passaggio is the region in the frequency spectrum where the transition from modal register to falsetto or middle register usually occurs, at around 300–500 Hz. In this area the modal and middle registers overlap, and therefore any particular fundamental frequency in this region could be produced in both registers (amphoteric sounds). The frequency range of the female passaggio is only very slightly higher than that of the male passaggio. Many scientists believe that this transition is primarily a laryngeal event. Indeed, when vocal tract shapes were analysed by means of dynamic real-time MRI, a recent study failed to show major differences between the modal and falsetto or middle registers (Echternach et al. 2010). Equalisation of registers, therefore, should be related to a gradual change in oscillatory patterns (Fig. 4.2).


Fig. 4.2

Example of a register transition from modal to falsetto register in a professional singer without a clear sudden register transition event in the audio or electro-glottographic (EGG) signals. Here the voice is starting in modal register. At 24.9 s the subject starts to change registration to the falsetto register, as demonstrated by the change of the electro-glottographic contact quotient (CQ). The change takes about 300 ms and therefore about 100 glottal cycles. At the same time, no major changes of fundamental frequency (F0) or changes in the radiated spectrum at the mouth were present. Taken from Echternach et al. (2012). Copyright with kind permission from Elsevier
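
The relation between the duration of the transition and the number of glottal cycles quoted in the caption of Fig. 4.2 can be checked with simple arithmetic. The sketch below assumes a constant fundamental frequency during the transition, which the caption supports only approximately (‘no major changes’); the function names are illustrative.

```python
def glottal_cycles(f0_hz, duration_s):
    """Number of glottal cycles completed in duration_s at a constant F0."""
    return f0_hz * duration_s


def implied_f0(cycles, duration_s):
    """Fundamental frequency implied by a cycle count and a duration."""
    return cycles / duration_s


# Values from the caption: about 300 ms and about 100 glottal cycles.
f0 = implied_f0(100, 0.300)
print(f"implied F0: {f0:.0f} Hz")  # roughly 333 Hz
```

The implied fundamental frequency of roughly 333 Hz lies within the 300–500 Hz range given in the text for the first passaggio.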


4.5.8 Upper Register


Depending on which of the different classifications of registers is used, the frequency range for the upper register (related terms: light, #3) differs greatly. Nevertheless, some authors describe this register at a range of 700 to 1000–1100 Hz.


The reason for the existence of this register is not understood in detail. However, some authors have observed changes in vocal fold oscillation patterns, with oscillatory amplitudes and mucosal waves that are smaller than in the middle register. In videokymographic studies, Svec and co-workers (Svec et al. 2008) observed persistent gaps during glottal closure. Furthermore they showed shorter opening and closing phases.


From an acoustic standpoint, the fundamental frequency is relatively strong in the upper register compared with that of the middle register. It has also been shown that strong vocal tract shape modifications occur in this register. Furthermore, at least for professional voices, a formant tuning strategy (Sundberg 1975) is often observed in the upper register.


4.5.9 Second Passaggio


As indicated in the side note above, the passaggio between the middle and upper registers is still a matter of discussion. On the one hand, some authors do not believe in such a passaggio at all. For those who do consider it to exist, its cause remains unclear. It has been shown in some studies that, even when the vowel condition remains the same, the vocal tract shape is modified for fundamental frequencies higher than 750 Hz (Echternach et al. 2010) (Fig. 4.3).


Fig. 4.3

Mid-sagittal vocal tract shapes from real-time MRI data for the pitches C5# and G5#. As can be seen, the tongue position differs markedly. Furthermore, the larynx is elevated and the lips are opened for G5#


If the second passaggio is caused by the fundamental frequency reaching the first formant (F1), as suggested by Miller (2000), it could be argued that the register transition should occur in different frequency regions for different vowels, e.g. lower for vowels with a low F1, such as /u/ or /i/. To the best of the author’s knowledge, however, this register transition is thought to occur at a nearly stable frequency. There are also studies suggesting a laryngeal event as the cause of the second passaggio. Svec et al. (2008) observed two qualitatively different transitions above the first passaggio: one around 670 Hz associated with sudden pitch jumps and another at 1000 Hz where the sound pressure amplitudes and intensities of overtones decreased. Many singing pedagogues and some scientists think that the change of vowel quality might be a means to achieve register equalisation.
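
To relate the pitches named in Fig. 4.3 to the frequency values used in this section, note names can be converted to fundamental frequencies. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz, a tuning convention that the text itself does not state; the helper name `note_to_hz` is hypothetical.

```python
# Semitone offsets within an octave (sharps only, sufficient for this example).
SEMITONES = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_to_hz(name, octave, a4_hz=440.0):
    """Frequency of a note in equal temperament (assumed tuning: A4 = 440 Hz)."""
    midi_number = 12 * (octave + 1) + SEMITONES[name]
    return a4_hz * 2 ** ((midi_number - 69) / 12)


low = note_to_hz("C#", 5)   # written C5# in the caption of Fig. 4.3
high = note_to_hz("G#", 5)  # written G5# in the caption of Fig. 4.3

print(f"C#5: {low:.1f} Hz")
print(f"G#5: {high:.1f} Hz")
```

Under this assumption the lower pitch (about 554 Hz) lies below and the higher pitch (about 831 Hz) above the 750 Hz value beyond which vocal tract modifications were reported.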


4.5.10 Whistle Register


Many singers and scientists notice an additional register transition in the frequency range of 1000–1200 Hz. Voice production above these fundamental frequencies has been the subject of scientific debate for many decades. Competing hypotheses have been postulated, including sound production in analogy to whistles, turbulence formed from vocal tract/voice source interactions, a flageolet-like mechanism, and modification of the airflow by oscillating vocal folds. In a recent study with transnasal high-speed endoscopy at a frame rate of 20,000 fps, it was shown for a single professional singer that at fundamental frequencies up to 1568 Hz, the vocal folds oscillated and closed completely during the glottal cycle (Echternach et al. 2013) (Fig. 4.4).


Fig. 4.4

Representative images from laryngoscopic high-speed material representing one glottal cycle in relation to a glottal area waveform. The pictures refer to voice production at a fundamental frequency of G6 (1568 Hz). One glottal cycle therefore lasts about 0.64 ms. Taken from Echternach et al. (2013). Copyright with kind permission from the Acoustical Society of America


Endoscopic and MRI data for the register shift from upper to whistle register showed, in a single subject, no major changes in vocal tract shape and consequently no change in formants (Echternach et al. 2015). However, tuning strategies might differ between individuals at these high fundamental frequencies (Garnier et al. 2010). There is a need for further research, which may be provided by using the newest high-speed imaging techniques.
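
The cycle duration quoted in the caption of Fig. 4.4, and the reason why very high frame rates are needed at these pitches, follow from the reciprocal relation between fundamental frequency and period. The sketch below uses the two figures given in the text (1568 Hz and 20,000 fps); the function names are illustrative.

```python
def cycle_duration_ms(f0_hz):
    """Duration of one glottal cycle in milliseconds."""
    return 1000.0 / f0_hz


def frames_per_cycle(frame_rate_fps, f0_hz):
    """Average number of camera frames captured during one glottal cycle."""
    return frame_rate_fps / f0_hz


period = cycle_duration_ms(1568.0)
frames = frames_per_cycle(20_000, 1568.0)

print(f"{period:.2f} ms per cycle, {frames:.1f} frames per cycle")
# prints: 0.64 ms per cycle, 12.8 frames per cycle
```

Even at 20,000 fps, only about 13 images are available to describe each glottal cycle at G6.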


4.5.11 Belting as Register Function


In musical theatre a singing technique called belting is often used (Sundberg et al. 2012). It has been shown that belting exhibits differences in all aspects of voice production. It is often assumed that belting is associated with a loud voice and high subglottal pressure, but it has been found that the voice source fundamental is stronger (i.e. that the intensity for H1 is greater than that of H2) for the female classical voice than for a heavy belt. The closed quotient and the electro-glottographic contact quotient are also greater in belting.


Many scientists believe that this vocal technique is related to a register function. In this respect, belting is considered an extension of the modal register to higher fundamental frequencies across the first passaggio. However, it should be mentioned that belting involves not only registration but also aesthetic characteristics. Furthermore, it should be noted that there are many substyles of belting that might also contribute to different hypotheses and conclusions (Sundberg et al. 2012).


4.5.12 Yodelling


Yodelling is a special kind of phonation that is common in many cultures across the world. In addition to special vocalisation of syllables and ornaments, yodelling is characterised by sudden pitch changes, and it is often assumed that these pitch changes are accompanied by register changes. Pitch jumps occur very suddenly in yodelling, but in contrast to untrained voices, professional yodelling subjects show very precise placements of fundamental frequencies and well-defined changes in vocal tract shapes. Therefore, these pitch changes seem very coordinated. Additionally, perturbation measures such as jitter and shimmer are lower for professional yodelling subjects. Interestingly, for female voices yodelling only involves the change from modal to middle register and vice versa but not transitions to higher registers.


4.6 External Laryngeal Muscles and Their Role in Voice Production



Erkki Vilkman

4.6.1 External Laryngeal Frame in Voice Production


The hyoid and laryngeal structures are connected to each other and to other adjacent structures with spring-like attachments (ligaments, muscles) within which a state of equilibrium appears to exist. From a biomechanical point of view, the external mechanisms include not only a great number of muscles but also the often overlooked tracheal pull. Tracheal pull, i.e. the inferior force produced by the mass and tension of the trachea, is transmitted to the larynx through the cricoid cartilage and adjacent soft tissues. This complex interactive system is called the external frame of the larynx (Sonninen 1968).


The external muscles considered to be capable of contributing to voice production include the strap muscles, i.e. the sternothyroid (ST), the hyothyroid (HT) and the sternohyoid (SH) muscles, of which the ST and HT, together with the inferior pharyngeal muscles, i.e. the cricopharyngeal (CP) and the thyropharyngeal (TP) muscles, are directly connected to the larynx. The suprahyoid muscles, i.e. the digastric, the mylohyoid, the geniohyoid (GH), the hyoglossus and the genioglossus muscles, as well as the infrahyoid muscles (SH, omohyoid), have an indirect effect on the larynx. Other indirect muscular forces include those produced by the palatal, oesophageal and nuchal musculature, of which the palatopharyngeal muscle also has a partially direct connection to the larynx. As regards vocal fold biomechanics, it is important to note that with the exception of the CP muscle, the forces produced by the extrinsic laryngeal muscles act directly on the thyroid cartilage (see Vilkman et al. 1996 for a review). A detailed description of the anatomy of the laryngeal region as well as vocal tract acoustics is not within the scope of this section (but see the scheme in Fig. 4.5 and Sects. 1.2 and 1.11).


Fig. 4.5

Schematic representation of external laryngeal forces. (a) Forces lengthening the vocal folds and raising F0. (b) Forces shortening the vocal folds and lowering F0. CP cricopharyngeal muscle; ST sternothyroid muscle; TP thyropharyngeal muscle; AE aryepiglottic muscle; HM hyomandibular muscles; HT hyothyroid muscle; SH sternohyoid muscle. Taken from Vilkman et al. (1996). Copyright with kind permission from Elsevier


4.6.2 Folding and Unfolding of the Laryngeal Structures


The deformation of the laryngeal tract associated with, e.g. pitch lowering, swallowing and effort closure of the larynx, should be considered a result of the folding of the laryngeal walls due to vertical changes in the structures, especially the approximation of the thyroid cartilage and the hyoid bone, rather than a true sphincter action (Fink and Demarest 1978). In cadaver larynges the changes in the laryngeal structures caused by an increased hyoid-to-thyroid distance can be summarised as follows: the epiglottis rises, the vestibule of the larynx expands and the ventricular folds and the vocal folds abduct owing to a reduction in the folding of these structures with increasing vertical tension. This phenomenon is accompanied by a slight increase in F0 (Vilkman and Karma 1989). An example of ‘unfolding’ from everyday clinical practice is the examination technique for indirect laryngoscopy. In order to improve the view of the larynx, the relatively high-pitched vowels /i/ and /e/ and tongue pull are used. Vowel articulation modifies the laryngeal structures considerably. Non-phonatory laryngeal functions are, to a great extent, based on reflex interplay between relevant structures. It is obvious that in some voice disorders, the phylogenetically young phonatory mechanism is more or less replaced by the archaic mechanism. In hyperfunctional dysphonia, for instance, the position of the larynx is high and the space of the laryngeal vestibule is small (e.g. Wendler and Seidner 1987). This change can be explained as occurring because of an approximation of the thyroid cartilage to the hyoid bone and increased folding, as observed during swallowing and effort closure.


External forces together with the cricothyroid system may contribute to the cross-sectional shape of the vocal folds. This is important, for instance, from the point of view of controlling registers and modes of phonation. Vertical changes in the laryngeal column may, in addition to slight fundamental frequency (F0) changes and abductory-adductory influence, also affect the register control of phonation (Wendler and Seidner 1987).


4.6.3 Length Adjustment of the Vocal Folds


For untrained singers especially, the correlation between pitch and laryngeal height is very strong. This is probably why the external laryngeal muscles have traditionally been suspected of playing a major role in pitch control. During recent decades, however, research has revealed that in trained singers, the laryngeal height is not positively correlated with pitch, and the correlation can even be negative (Shipp and Izdebski 1975). The most important means of pitch control is connected with adjusting the length of the vocal folds, in which intrinsic muscles, notably the cricothyroid (CT) muscles, have an important role. As far as vocal fold elongation is concerned, the CT does not necessarily need assistance from the external laryngeal muscles. As to shortening of the vocal folds, it is plausible that tracheal pull (Zenker and Glaninger 1959) with controlled CT muscle relaxation is the main control mechanism (see Vilkman et al. 1996 for a review). A model based on an extensive literature review (Vilkman et al. 1996) of how external mechanisms might influence the pitch is presented in Fig. 4.5.


Pitch may be raised (Fig. 4.5a) by a forward movement of the thyroid cartilage that is caused by the activity of the strap muscles and the musculature of the bottom of the mouth; in this case, the CP muscle pulls the cricoid cartilage posteriorly. The cricothyroid joint generally permits gliding in an anteroposterior direction to some extent. Figure 4.5a also illustrates that the TP may contribute to the lengthening of the vocal folds by approximating the thyroid laminae (Zenker and Zenker 1960). Owing to changes concomitant with ageing, this mechanism is probably possible in young persons only.


The option for pitch lowering can be summarised as follows (Fig. 4.5b). The lowering of the thyroid cartilage caused by ST and SH activity creates circumstances in which the contraction of the CP muscle, by opposing the cranial displacement of the larynx, rotates the cricoid cartilage around the axis of the CT joint, and thus the vocal folds are shortened (Fig. 4.5). Simultaneous CT relaxation, or at least CT deactivation, is also necessary. In addition, tracheal pull, although diminished because of a lowered larynx position, may contribute to the opening of the CT visor. Assisted by contractions of the SH and HT muscles, these events pull the hyoid bone towards the thyroid cartilage, which leads to changes in the laryngeal column, to increased folding in particular, as often seen in laryngoscopic examinations. In addition to the effects brought about by the hyoid-to-thyroid approximation, the function of the aryepiglottic muscles may also contribute to F0 change by pulling the arytenoids forward (lowering the epiglottis) or by rotating the cricoid cartilage.


4.6.4 Laryngeal Height in Voice Pedagogy and Therapy


In both speech therapy and singing pedagogy, maintaining a low position of the larynx has traditionally been considered favourable for good quality voice production. Clinical and pedagogical experience suggests that the physiological basis of this concept is related not only to resonatory and muscular tension aspects in general, but probably also to an intentional avoidance of excessive vocal fold folding, such as observed in an effort type of glottal closure in connection with thyroid-to-hyoid approximation. Considering the external factors and changes in the laryngeal column, there seem to be some grounds for a tentative formulation of a basic principle probably underlying the laryngeal control of voice production as follows: in good quality voice production, the intrinsic laryngeal muscles have to be allowed maximum independence in the delicate control of the stiffness and cross-sectional shape of the vocal folds. In this framework, increased folding, due to either articular rotation in the cricothyroid joint or hyoid-to-thyroid approximation, would be a violation of an important laryngeal setting. The high larynx position in high-pitched singing probably makes delicate adjustments, such as the so-called covering of the voice (e.g. Sonninen et al. 1999) with rising pitch in classical singing, difficult or even impossible, because of the circumstances created in the vocal fold tissues.


4.6.5 Conclusion


To summarise, the ‘physiological larynx’ has to be considered a complex interactive system, which is not restricted to the anatomical structures of the larynx. Some of the significant factors in this interactive system include tracheal pull and its interaction with CT activity, the gross vertical movements of the larynx, the interplay between thyroid cartilage biomechanics and CP muscle function and finally the sparse muscular connections to the cricoid cartilage.


4.7 Epidemiology of Dysphonia



Ekaterina Osipenko, Krzysztof Izdebski and Amélie Elisabeth Tillmanns

4.7.1 Introduction


In modern society, the voice is an important tool for work and for social communication. Serious social and economic consequences occur when the voice is injured. Identification of possible causes and risk factors and the prevention of voice injury, clinically referred to as dysphonia, are of paramount importance to society at large (Duff et al. 2004; Bhattacharyya 2014). Treatment of dysphonia can be complicated, lengthy and costly and can interrupt social and professional life. Hence knowledge of the epidemiological factors is of significant value in understanding the risk factors and improving the prevention and treatment of dysphonia (Morabia 2004; Mayou and Farmer 2002; Hill 1965). However, epidemiological data about voice disorders are still rare and studies are just beginning to emerge. Additionally, nonstandardised definitions of the different forms of dysphonia make data collection and analysis even more challenging.


Traditionally, voice disorders are divided into organic/structural dysphonia and functional/malregulative dysphonia (see Sects. 4.1 and 4.2). However, the relationships and boundaries between organic and functional are not always clear. For example, an organic voice problem may be associated with secondary functional voice disorders or vice versa, and functional voice disorders may lead to organic lesions. The definition and sub-classification of functional voice disorders have been discussed for a long time. Correlating specific dysphonias to specific epidemiological factors is often complicated (Duff et al. 2004; Verdolini and Ramig 2001; Roy et al. 2004; Mayou and Farmer 2002). Another confusing factor is revision of the initial diagnosis, which occurs in up to 3% of all cases (Cohen et al. 2012).


In this section, we present literature about global findings on dysphonias with regard to age, sex, professional subgroups, risks and environmental and geographical/linguistic factors. Because providing epidemiological evidence on all types of dysphonia is beyond the scope of this section, discussion is limited to selected dysphonias. The many different forms of dysphonia are listed and explained elsewhere (see Sects. 5.1–5.16).


According to recent large-scale epidemiological studies, up to almost half of the general population experiences a voice disorder at least once in their lifetime (Roy et al. 2004, 2005, 2007; Behlau et al. 2012; Bhattacharyya 2014; Cohen et al. 2012). While the epidemiological characteristics of dysphonic cohorts appear to be universal, meaning that the epidemiological factors are non-specific for geography, race, language or social status, a higher risk is observed for certain professions (Behlau et al. 2012; Villanueva-Reyes 2011; Roy et al. 2004, 2005; Koufman and Blalock 1982; Izdebski et al. 2000; Duff et al. 2004; Verdolini and Ramig 2001; Carding et al. 2006). A sizable number of these voice disorders are acute, short-lasting voice problems, but between 20 and 60% of dysphonias are reported to last longer than 4 weeks (Roy et al. 2005, 2007). Various reports also list a higher prevalence of dysphonias in women than in men (Roy et al. 2005; Garcia Martins et al. 2015; Behlau et al. 2012; Cohen et al. 2012; Bhattacharyya 2014).


Voice disorders occur at all ages. In childhood, the incidence is estimated between 0.12 and 38% with a higher prevalence in boys (Carding et al. 2006; Verduyckt et al. 2011; Cohen et al. 2012). Newer studies list vocal cord nodules, cysts and acute laryngitis as the main reasons for acquired juvenile dysphonias (Garcia Martins et al. 2015). In adults, functional dysphonia, acid laryngitis and vocal polyps are reported most frequently (Garcia Martins et al. 2015). Around 20% of the geriatric population suffers from voice disorders, often co-occurring with hearing loss, and increasing age is considered to be a risk factor for dysphonia (Cohen and Turley 2009; Cohen et al. 2012; Roy et al. 2007). Besides presbyphonia, functional dysphonia and Reinke’s oedema occur often in the elderly (Garcia Martins et al. 2015).


Among professional voice users such as teachers, singers, radio and TV announcers, voice-over artists, military personnel, call-centre workers and voice or sports coaches, the lifetime prevalence of voice disorders is even higher. Different studies show that teachers have almost twice the lifetime risk of voice problems of non-teachers (Behlau et al. 2012; Roy et al. 2004; Garcia Martins et al. 2014b). Voice disorders in the speaking professions also adversely affect work performance, work attendance and work loyalty (Roy et al. 2004, 2005; Bhattacharyya 2014). One report has stated that 16% of teachers considered changing their occupation because of recurring voice problems (Behlau et al. 2012). The costs of lost productivity due to untreated voice disorders in occupations like teaching are estimated at almost $3 billion annually in the USA.


Other important risk factors for voice disorders include smoking history and alcohol consumption, while the association with gastro-oesophageal reflux is questionable (Roy et al. 2004).


4.7.2 Organic/Structural Dysphonia


In various adult populations, organic dysphonia is described as being less frequent than functional dysphonia (Garcia Martins et al. 2015; Preciado et al. 2005; Van Houtte et al. 2010). From childhood to adulthood, the most frequent organic causes include acute laryngitis, accounting for about 40% of cases (Cohen et al. 2012; Bhattacharyya 2014), along with benign vocal cord lesions such as nodules, polyps or cysts (Garcia Martins et al. 2015; ASHA 2015; Kılıç et al. 2004; Preciado et al. 2005). Chronic laryngitis was detected in 3.47 per 1000 people in a primary care cohort, with a mean age of 52.9 years (Stein and Nordzij 2013). With a reported incidence in dysphonic patients of about 17% in the age group between 7 and 12 years of age, vocal cord nodules occur predominantly in boys, while in the age group from 12 years of age to adulthood, nodules are more frequent in the female population (Garcia Martins et al. 2015). The literature is seemingly clear that voice overuse is a significant risk factor in vocal cord nodule formation in youngsters and adults and also that mucous retention cysts may have a functional cofactor, but congenital epidermoid cysts are important as well (Altman 2007; Duff et al. 2004; Behlau et al. 2012; Villanueva-Reyes 2011; Johns 2003; Garcia Martins et al. 2015). Voice overuse is probably the reason nodules are the most frequent organic cause of dysphonia in teachers (Preciado et al. 2005; Garcia Martins et al. 2014b).


In the population over 40 years old, acid laryngitis and smoking-related changes, such as Reinke’s oedema, become more frequent (Garcia Martins et al. 2015). The incidence of dysphonia further increases in the group of patients over 60 years old (Garcia Martins et al. 2015). On the one hand, various organic causes appear more frequently, partly due to the rising frequency of systemic diseases and decreased respiratory function, and on the other, presbyphonia occurs, as senescence profoundly affects the phonatory system through loss of elastic fibres, atrophy of the vocal folds, prominence of vocal apophysis and vocal tremor (Garcia Martins et al. 2014a, 2015; Marino and Johns 2014; Vaca et al. 2015) (see Sect. 5.15). The prevalence of voice disorders in seniors is shown to be around 19–29%, often co-occurring with hearing loss and severely affecting quality of life (Roy et al. 2007; Cohen and Turley 2009).


Another organic cause of dysphonia is carcinoma, with dysphonia often being the first presenting symptom (Schultz 2011). Cancer prevalence in patients seeking treatment because of dysphonia is reported to be 2.2% and greatest among males over 70 years of age (Cohen et al. 2012).


4.7.3 Functional/Malregulative Dysphonia


Functional or malregulative dysphonia (Hacki 2014) describes all types of dysphonia that are not due to structural or pathologically defined ‘visible’ causes (see Sects. 4.1 and 4.2) (Izdebski 2019; Koufman and Blalock 1982; Garcia Martins et al. 2015; Preciado et al. 2005). The incidence in a treatment-seeking population is 30%, predominating in young adults, adult women and elderly patients (Izdebski 2019; Cohen et al. 2012; Van Houtte et al. 2010).


Functional dysphonia can be due to constitutional differences, speaking habits, misuse or abuse of the voice, voice change during puberty or aspects of personality (see Sects. 4.1 and 4.2). Teachers and other professional voice users are at a much higher risk of developing functional dysphonia. In one study, hyperfunctional dysphonia was found in 17.4% of teachers compared with 7.2% of controls, and incomplete glottal closure was reported in 13.9% of teachers compared with 1.2% of controls (Sliwinska-Kowalska et al. 2006). Another study showed that 50% of patients with functional dysphonia were professional voice users (Deary and Miller 2011).


Functional dysphonia has been shown to be associated with psychosocial factors such as anxiety or depression in about 30% of cases (Deary and Miller 2011; Misono et al. 2014), but epidemiological data on the frequency of psychogenic voice disorders are rare, mainly because of difficulties in differential diagnosis, as these disorders are often multifactorial (see Sect. 5.9). Psychogenic aphonia is rare (0.4%) and occurs predominantly in women (Kollbrunner et al. 2010).


4.8 Aetiology and Pathogenesis of Dysphonia: An Overview with Focus on Malformations, Inflammatory/Systemic Diseases, Malignancies and Traumata Affecting the Larynx, Resonance Disorders and Presbyphonia


Apr 26, 2020 | Posted in OTOLARYNGOLOGY
