CHAPTER 113 Vocal and Speech Rehabilitation Following Laryngectomy
History of Alaryngeal Voice and Speech
Historically, the start of postlaryngectomy voice rehabilitation was surgical and prosthetic. The patient who underwent the first total laryngectomy (TL) for cancer, by Billroth in 1873, used an artificial larynx in order to speak. As eloquently described by Billroth’s colleague Gussenbauer in 1874, the patient could be understood from one side of the ward to the other.1 Considering that patient wards in those days were huge, holding up to 40 beds, this must have been an impressive achievement. This first artificial larynx, essentially a tracheotomy tube with a pharyngeal extension (Fig. 113-1), allowed pulmonary-driven speech. It contained a valve mechanism to prevent aspiration, a special membrane to act as a heat and moisture exchanger, and a reed to act as a vibrating tone generator. The only futile component of the initial artificial larynx was the tone generator in the form of a reed. Apparently, in preparing this device before the operation, it was not anticipated that diverting pulmonary air into the pharynx could produce mucosal vibrations and thus sound, which actually could interfere with the monotonous tone of the reed, potentially leading to diplophonia. The main reasons for abandoning artificial larynges in those early decades were the many wound-healing complications with these bulky devices in an era without proper anesthesia and antibiotics.2 The discovery that with air insufflation the pharynx can act as a tone generator undoubtedly contributed to clinicians staying away from surgical voice restoration options for a long period of time.3
In the late 1970s and early 1980s, surgical procedures (e.g., Asai’s, Staffieri’s, and Amatsu’s techniques) to create a functional tracheoesophageal fistula and thus allow tracheoesophageal voicing were initially popular but have since declined.4–6 After the first enthusiastic reports about the good voice that could be achieved, many head and neck specialists became disappointed about troublesome aspiration problems. In essence, one could say that if the patient had a good voice, he or she would aspirate, and when there was no aspiration, the patient had no or only a strained voice. Although in many countries these techniques were more or less completely abandoned, in others they led to a quest for solving the aspiration problem.7–12 The solution was found in the development of presently available voice prostheses, which have in common a surgically created tracheoesophageal fistula and the containment of a one-way valve mechanism.13 These prostheses keep the tracheoesophageal connection open, allow the passage of pulmonary air into the esophagus in order to produce pharyngeal vibrations and sound, and prevent aspiration by automatic closure of the valve when the pulmonary air pressure drops. These devices have proven to be both functional and reliable.14
Some historical reflections are important at this point. Although Singer and Blom have been credited with initiating modern voice prostheses development, the first paper on a useful prosthetic device (Fig. 113-2) actually originated in Poland, where Mozolewski (1917-2007) in 1972 published his results with 24 patients.15 This publication in Polish with abstracts in Russian and English probably went unnoticed. However, Mozolewski and colleagues16 remained active and in 1975 published a report in Laryngoscope about a postlaryngectomy surgical arytenoid vocal shunt procedure as an alternative to the popular Asai procedure. Subsequently, they presented both their surgical and prosthetic methods at a conference in Boston in May 1978. After these initial reports, however, Blom and Singer have put prosthetic voice rehabilitation on the world map and diligently and tenaciously instructed and educated numerous clinicians. Since the introduction of the first Blom-Singer prosthesis in 1980, many other brands have been introduced and further developed: Panje, Groningen, Herrmann, Traissac, Algaba, Provox, Nijdam, and Voicemaster, many of which are still available.7–12,17–19
Physiology of Alaryngeal Voice and Speech
For a good understanding of voice and speech rehabilitation after TL, the basic principles of normal laryngeal voice and speech production should be kept in mind. In order to be able to speak, it is necessary to have airflow, a sound source, and a cavity in which the sound will be transformed into intelligible speech. Normally, the airflow is provided by pulmonary exhalation, which through the well-known Bernoulli effect causes vibrations of the mucosa in the voice box and thus a sound source. The sound thus produced will be transformed into intelligible speech by articulation (i.e., by movements of the muscles present in that area) in the mouth, nose, and throat cavities, which together form the vocal tract (schematically shown in Fig. 113-3).
After TL the vocal tract is only slightly changed: the removal of the hyoid bone and the larynx alters the position of the tongue base, possibly influencing speech intelligibility. The most obvious change, however, is that the voice box as the natural sound source is replaced by the pharyngoesophageal (PE) segment, which has a significant impact on voice quality.20 Moreover, without any additional measures, the disconnection of the airway from the PE segment makes the lungs unavailable as an air source.
Esophageal Voice and Speech
A potential source of air can be air injected into the esophagus and/or stomach. By expelling this air, it is possible to set the mucosa in the PE segment into vibration, which acts as a sound source. With the sound thus produced, intelligible speech can be formed in the intact vocal tract. This so-called esophageal speech is schematically depicted in Figure 113-4. A drawback of this method is that relatively little air is available (≈80 mL), in contrast to the liters of air in the lungs available before the operation. Therefore the phonation time is short, about 1 to 2 seconds compared with more than 20 seconds in laryngeal voicing. Additionally, for many patients this technique is difficult to acquire and rehabilitation often takes months. Success rates in the literature vary substantially, partly because of the lack of a good definition of voice quality. No more than 40% to 60% of the patients acquire reasonable speech and only 10% develop a really good voice.21
Electrolarynx Voice and Speech
Another method is the use of a tone generator, which consists mostly of an electrically driven instrument called an electrolarynx. These devices generate vibrations that pass through the skin toward the throat (schematically depicted in Fig. 113-5). Sound generated in this way is transformed in the vocal tract into intelligible speech. The advantage is that most patients acquire this speech rapidly, but the drawbacks are a monotonous, robot-like sound and the necessity to always have one hand occupied while using it.
Tracheoesophageal Voice and Speech
The limitations of esophageal and electrolarynx speech outlined previously have initiated the development of the so-called voice prosthesis, essentially a one-way valve allowing air to pass from the trachea into the PE segment and preventing aspiration. Through a minor surgical procedure, either primarily at the time of the laryngectomy or secondarily at a later date, a tracheoesophageal fistula is created, allowing implantation of this device. After closure of the stoma, just like before the operation, air from the lungs creates vibration, in this case in the mucosa in the throat (schematically depicted in Fig. 113-6).22–24 Even after extensive resections and reconstructions of the throat, this method is applicable, whereas esophageal speech in these cases is hardly ever successful.25 The recovery of oral communication is so rapid (in most cases a useful voice develops within 2 weeks) and the success rate of the prosthetic method is so high (on the order of 90%) that this rehabilitation technique has developed into the “gold standard” and recovery of speech after TL has become quite predictable.26,27
Tracheoesophageal speech, like normal laryngeal speech, is pulmonary driven and thus is closest to normal. The availability of the pulmonary air allows maximum phonation times (MPT) that in many patients are approaching normal values; mean MPTs of 16 to 17 seconds are no longer the exception.28 Even in most COPD patients, pulmonary capacity is still sufficient to allow for good voicing.29 As already mentioned, the sound is produced in the PE segment by mucosal vibrations generated by the pulmonary air.22,24 An important issue to stress at this point is that there is no gender difference in postlaryngectomy anatomy and physiology of the new sound source.23 This means that male and female voices do not differ in fundamental frequency, which is approximately 100 Hz. Thus female voices often sound too low, whereas male voices are fine.24,30 This is one of the remaining issues to address in the future.31 Although it may sound like a superfluous remark, one has to keep in mind that the pharynx also constitutes the alimentary tract, so after laryngectomy the PE segment has a dual function: a sound generator and a passageway for food. This should never be forgotten when problems occur with voice prostheses and when solutions are searched for: What could be beneficial to one function might be detrimental to the other one.
Types of Voice Prostheses
Indwelling and Nonindwelling Prostheses
In general, two types of TE voice prostheses can be distinguished: nonindwelling and indwelling devices. The first voice prostheses that became available in the United States were of the nonindwelling type (Blom-Singer duckbill and Panje prostheses),11,12 whereas the first European prostheses were indwelling devices (Mozolewski and Groningen)10,15 (Fig. 113-7).
Figure 113-7. A, The original Blom-Singer duckbill prosthesis (1979/80) without a retention flange. B, The Panje prosthesis (1981), which had a retention flange, and a safety strap from the start. C, The later low-resistance version with a retention flange (1982). D, The indwelling Groningen prosthesis, which had a sturdy retention flange–valve combination, ensuring proper retention, even when inserted at the time of tracheoesophageal puncture (primarily or secondarily). For the Mozolewski prosthesis, see Figure 113-2.
The nonindwelling TE voice prostheses can be removed and replaced by the patient. The indwelling TE voice prostheses stay in place permanently and must be removed and replaced by a clinician experienced in the method. The need for prosthesis replacement occurs at the end of the device’s life span, which is usually determined by leakage of fluids through the prosthesis into the airway or an increased airflow resistance of the prosthesis. Because indwelling devices may have a more robust construction, their life spans are generally longer than those of their nonindwelling counterparts. Furthermore, indwelling devices have the distinct advantage in that the patient’s dexterity plays a lesser role in the daily maintenance of the device; this mainly consists of internal cleaning with a brush and/or a flushing device without the need of regularly replacing the prosthesis. Even with increasing age or decreasing general health status, or both, a useful (prosthetic) voice can be preserved.14,32 The obvious disadvantages of indwelling prostheses are that patients need a clinician for the replacement and hospital or clinic visits remain necessary. However, nonindwelling devices can also cause problems, forcing the patients to consult their clinicians. Thus the difference in the need for clinical aftercare is far from “all or nothing.” Furthermore, as with indwelling devices, a regular checkup of the fistula/stoma region is necessary because of the need for early detection of possible adverse side effects (e.g., hypertrophy, infection, widening of the TE fistula) and for oncologic follow-up, irrespective of the prosthesis used. Therefore from the beginning of the modern prosthetic rehabilitation era in most of Europe, there was a strong preference for indwelling devices.9,10,32–34 Today’s most widely used indwelling prostheses in Europe, the Provox® and Provox® 2 system, are shown in Figures 113-8 and 113-9, respectively.
The evolution of voice prosthetics in the United States has been somewhat different from that in most of Europe. The American development has been influenced by speech-language pathologists (SLPs), who from the beginning favored delaying the insertion of the prosthesis. Instead, the preference has been to create a tracheoesophageal fistula and stent it with a rubber (feeding) catheter either primarily or secondarily. The insertion of the voice prosthesis is done only after wound healing is completed. The disadvantage of this approach is the tendency for the fistula to have associated edema and irritation and not to have reached its final length when the prosthesis is fitted 10 to 12 days later. In contrast, when the insertion of the prosthesis is done at the time of laryngectomy, as was originally described by Annyas and colleagues,35 a second fitting can be avoided and the estimate of the length of the prosthesis is more accurate (almost always 8 mm). Nonetheless, others prefer inserting a feeding tube through the fistula at the time of the laryngectomy, which has the advantage of avoiding an indwelling feeding tube attached to the nose. Usually, the patient is ready to take oral feeds when the tube is removed 10 to 12 days later for placement of the prosthesis. For patients who were treated previously with radiation therapy, this length of time for tube feeding is preferred. Although insertion of the prosthesis during the postoperative phase may be difficult for some patients, the majority of them tolerate this well. However, the alternate approach of inserting the prosthesis immediately at the time of the laryngectomy as described earlier has been reported to be quite satisfactory with a low complication rate.9,32,35,38–41 Additionally, consideration should be given to the psychological advantage for patients to start voicing immediately after removal of the feeding tube. 
The recent trend in some centers to start oral intake soon after the laryngectomy in nonirradiated patients, within 24 to 48 hours, coordinates well with the early insertion philosophy.42 Therefore it is not surprising that the immediate insertion method is gaining popularity in the United States, with SLPs still being the main clinicians responsible for postlaryngectomy voice and speech rehabilitation.
Surgical Aspects of Alaryngeal Voicing
Problem solving and prevention start at the time of TL.
Standard Total Laryngectomy
For the indications and surgical aspects of TL, see Chapter 111. In the following section, specific aspects of surgical prosthetic voice rehabilitation are discussed.
Surgical Techniques and Refinements
Primary prosthetic voice restoration, meaning tracheoesophageal puncture (TEP) and immediate insertion of a voice prosthesis at the time of TL, is the method of choice for many experienced medical professionals (see Fig. 113-12, B; video clip available on website).13,36 This enables the easiest and most comfortable voice rehabilitation because the patient is still under general anesthesia when the first prosthesis is inserted. No stenting of the fistula with a nasogastric feeding tube is necessary because the device itself is used to stabilize the puncture site.37 The TEP can almost always be done as a primary procedure, even when the circumference of the neopharynx has to be reconstructed, provided the esophagus is still present at the level of the trachea.25 Only when the proximal esophagus is dissected off the trachea, as in the gastric pull-up procedure, is there a need to delay the TEP for 4 to 5 weeks.25 In those cases, secondary TEP is performed after the completion of wound healing and before possible postoperative radiotherapy (see also discussion of the role of radiotherapy in the Prosthetic Voice Rehabilitation and Radiotherapy section).
With the currently available devices, voice prostheses (if available and affordable)43–46 can be applied, irrespective of the method of closure of the pharyngeal mucosa and the extent of the pharyngeal constrictor muscle defect. However, with a few refinements in the surgery of TL, several postlaryngectomy problems such as hypertonicity of the pharyngoesophageal (PE) segment and a poor contour of the stoma can be avoided or diminished.
Refinements of standard TL techniques for optimizing prosthetic voice restoration results are:
Hypertonicity of the PE segment is the most frequent reason for failure to develop fluent prosthetic speech, as well as esophageal speech.47,48 The cause is excess tone in the pharyngeal constrictor muscles forming the wall of the PE segment. In patients with this problem, this tonicity is exacerbated after insufflation of air and thus blocks the flow of air through the PE segment, preventing mucosal vibrations and thus sound production. Although there are several surgical solutions described in the literature, we believe the best option to prevent this problem is to perform a short, anteriorly positioned myotomy of the circular proximal upper esophageal sphincter (cricopharyngeus) muscle in every patient (Fig. 113-10; video clip available on website), unless palpation during surgery reveals that this muscle is completely relaxed.49 After this myotomy and prosthesis insertion, the surgeon can still close the pharynx (mucosa and constrictor muscles) in the preferred manner.
Figure 113-10. Myotomy of the cricopharyngeal muscle to prevent hypertonicity of the pharyngoesophageal segment (video clip available on website).
Ideally, the patient has a stable stoma with the same diameter as the trachea or only slightly narrower in order to have easy access to the voice prosthesis and to avoid a cannula to stent the stoma for adequate respiration. Reasons for stoma stenosis can be a dehiscence of the trachea from the skin because of infection and traction, sectioning of the last tracheal cartilage “ring” and diminishing its internal stability, or contraction of the tracheocutaneous suture. A reliable technique is to suture the trachea into a separate fenestra in the lower skin flap (Fig. 113-11).50 This fenestra should have approximately the same size and diameter as the trachea, and the cranial trachea ring should remain intact. This latter aspect is the most important point of the creation of a stable stoma: the collagen fibers in the trachea cartilage are distributed in such a manner (dense and parallel on the outside and loose and intermingled on the inside) that the cartilage acts as a “spring” trying to widen the trachea.51 If the cartilage is cut, these spring forces are lost and the trachea will collapse, resulting in a smaller stoma. Meticulous suturing of the skin to the trachea mucosa and optimally covering the bare trachea cartilage is important to prevent local infection and fibrosis. The small strip of skin (8 to 10 mm) cranially is surprisingly vital and only rarely breaks down, even in irradiated patients. The advantage of this technique is that trifurcations in the wound are avoided, which is not the case if the stoma is created in the TL skin incision itself. Obviously, in some cases in which the trachea has to be sectioned lower down, it is not long enough to be sutured in a separate fenestra and the entire inferior skin flap is necessary to be able to form a stoma. But even in these cases, with an intact last tracheal ring it should be possible to decannulate the patient following the procedure. 
The absence of a cannula results in less irritation of the trachea and stoma (i.e., less coughing and strain, improved wound healing).
A deep stoma is problematic for the later application of additional rehabilitation devices that potentially rely on peristomal attachment such as a heat and moisture exchanger (HME) or an automatic speaking valve (ASV). Sometimes a deep stoma is unavoidable (e.g., when more than 3 to 4 trachea rings are removed). However, most “deep” stomas are caused by the protrusion of the sternal heads of the sternocleidomastoid muscles (SCMMs) (Fig. 113-12; video clip available on website). Therefore the solution is to cut the sternal attachment of the SCMMs during the operation and before skin closure. This generally results in a flat peristomal area, making attachment of external devices considerably easier. Functional side effects from resected SCMMs have not been observed.
Meticulous closure of the pharyngeal mucosa is important to avoid unnecessary wound healing problems. Importantly, any undue tension on the suture line should be avoided. This means that purely vertical or horizontal closure of the pharyngeal defect seldom is possible. To avoid undue tension, the closure should be T shaped, with a variable length of the horizontal and vertical bar of the T (Fig. 113-13). If the trifurcation is reinforced with an extra submucosal stitch, there is no increased fistula formation. Most experts agree that this closure also avoids pseudo-vallecula formation, which seems to occur more frequently when a pure vertical closure is used as a standard.
Postoperative Management
Alimentary Care
Postoperative management initially involves nutritional intake through a feeding tube and stoma care. Early removal of the feeding tube, even as soon as the second postoperative day, and the commencement of a liquid oral diet seem feasible without an increased fistula rate.42 After 8 days, a soft, solid, normal diet can usually be resumed.
Pulmonary Care
Stomal care consists of the immediate postoperative application of an HME attached by a peristomal hydrocolloid adhesive and housing (Fig. 113-14). The early use of an HME avoids noisy external humidifiers while the patient easily adjusts to the breathing resistance of the HME, which is somewhat lower than the normal upper respiratory tract resistance.
Secondary Prosthetic Voice Rehabilitation
In relation to prevention of problems with secondary tracheoesophageal puncture (TEP) and immediate prosthesis insertion (also in this case the method of choice; two video clips of standard and alternative methods are available on the website), there are several issues worth mentioning. If the patient is a failed esophageal speaker, there is a chance that the cause for this failure is hypertonicity of the PE segment.48 This means that the medical professional and the patient have to anticipate further treatment before optimal fluent voicing can be achieved (see later for secondary hypertonicity treatment). With respect to the endoscopic technique when using a standard rigid esophagoscope, it is important to ensure the puncture is made high in the trachea (approximately 5 to 10 mm from the mucocutaneous border). This position makes the daily maintenance of the device by the patient (cleaning with a brush, or flushing, or both) and replacement by the medical professional in the outpatient clinic easiest. Further, the puncture should be done with a sharp trocar rather than with a scalpel incision; otherwise too wide a fistula tract may result. As in primary TEP, the length of the prosthesis can still mostly be 8 mm, although slightly more often a longer prosthesis has to be used; the thickness of the party wall can be judged well by palpation through the stoma onto the esophagoscope. However, special measuring devices or prepuncture ultrasound can also be used. Because the prosthesis can be inserted immediately, patients can resume an oral diet almost immediately after the procedure and can begin voice and speech rehabilitation the same day. Finally, in order to prevent local infection, as in all “clean-contaminated” head and neck surgery, broad-spectrum antibiotic prophylaxis should be applied during TEP.
Prosthetic Voice Rehabilitation and Radiotherapy
Many surgeons are reluctant to perform a TEP in patients needing a TL for a recurrence after chemoradiotherapy or in patients with advanced disease needing postoperative radiotherapy. In cases of radiation failure, the potential for an increased risk of complications, especially fistula development, may bias one to avoid creating a fistula. Nevertheless, there is enough evidence in the literature that this controlled fistula formation does not increase the incidence of wound complications and that, as such, TEP with immediate insertion of a voice prosthesis is a safe and reliable method.14,32,39 The only warning is that after radiotherapy, one should wait 6 weeks to let the radiation side effects heal before performing a TEP (Box 113-1).
Prosthesis Replacement and Maintenance
Indications
Silicone rubber voice prostheses, irrespective of the brand applied, are semipermanent devices that have a limited life span. The life span of prostheses shows considerable variation; in the literature mean device life spans of 4 to 5 months are found in the Western world, but much longer device life spans (10 to 18 months) are described in the Mediterranean and United States.14,32,41,52–59 However, it is more important to look at the median device life span, which is a better reflection of the clinical practice, because the mean is often strongly influenced by a limited number of patients with long-lasting prostheses, sometimes up to 11.5 years.14
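The difference between mean and median device life span can be illustrated with a small calculation. The following Python sketch uses purely hypothetical life spans (the values are assumptions chosen for illustration, not data from any of the cited studies): seven devices lasting a few months and a single device lasting 138 months (11.5 years).

```python
from statistics import mean, median

# Hypothetical device life spans in months (illustrative values only,
# not taken from any study). One device lasts 138 months (11.5 years).
life_spans_months = [2, 3, 3, 4, 4, 5, 6, 138]

mean_life = mean(life_spans_months)      # pulled upward by the single long-lasting device
median_life = median(life_spans_months)  # middle of the sorted values, robust to the outlier

print(f"mean:   {mean_life:.1f} months")
print(f"median: {median_life:.1f} months")
```

In this made-up series the single long-lasting prosthesis raises the mean to about 20.6 months, whereas the median remains 4 months, which is closer to what most of these hypothetical patients would experience.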