We thank Fabian Bachl and Jakob Hochreiter for recording and editing the EMG videos. Parts of the presented work were supported by the German Federal Ministry of Education and Research (BMBF; IRESTRA grant 16SV7209), the Deutsche Forschungsgemeinschaft (DFG; grants DE 735/15-1 and GU-463/12-1), and the DEGUM (German Society for Medical Ultrasound).
Current Standards for the Classification of Motor Function Disorder and Synkinesis in Facial Nerve Palsy
The usual standard in routine clinical practice is a subjective, examiner-based evaluation of the severity of the motor disorder by the therapist, usually with a graded scoring scale. There is no national or international standard, no optimal observer-dependent procedure, and certainly no objective observer-independent procedure. At least 19 classification schemes are in use worldwide, most of which were designed for the assessment of acute idiopathic facial palsy (Bell’s palsy). If the current systems are analyzed against the criteria for an ideal classification system, only the Sunnybrook Facial Grading System and, in part, the Facial Nerve Grading System 2.0 meet these criteria. Experts worldwide continue to use the very unreliable House-Brackmann grading scale, which is still recommended in guidelines on acute facial nerve palsy. This is certainly due to the familiarity of the House-Brackmann scale. A particular advantage is that it allows fast classification in routine clinical practice without further aids. However, the House-Brackmann grading scale does not classify synkinesis, which would be important for patients in the post-paretic phase of the disease. Speed combined with standardization is thus also significant for any emerging method, as it will be an important factor in its acceptance in routine clinical practice.
Because of their observer dependence and the resulting lack of objectivity, all evaluation schemes inevitably have limited intraobserver and interobserver reliability. Most evaluation schemes are designed to be used face-to-face, but in reality, more or less well-standardized photographs or video sequences are often rated post hoc. gives an impression of a photo session for documenting facial deficits in the photo lab of the Facial-Nerve-Center Jena, Germany. However, because of the lack of alternatives, these schemes are also used as primary outcome criteria in large multicenter clinical trials. To minimize variability created by the photographer during video recordings, the following videos provide standardized instructions to guide the patient through the whole documentation of facial movements (vimeo.com/203921699) (Fig. 8.1). Because of the defects inherent in these subjective schemes, the choice of method can have a considerable influence on the study results. Recently, a new subjective method has been widely publicized, a clinician-graded Facial Function Scale (eFACE). Its attractiveness is due less to the items that are queried than to the possibility of electronic use with a smartphone or tablet. The answers are given on visual analogue scales and, when the scale is used electronically, the results are presented to the user in graphic form, thus generating a pseudo-objectivity. However, the correlation of the eFACE results with the Sunnybrook Facial Grading System is good.
Visual assessment does not examine the facial muscles directly, but rather the resulting movement of the facial skin. Electromyography (EMG) is a measurable, partly objective, but also user-dependent method for examining the facial musculature. In routine clinical practice, EMG, as either needle EMG or surface EMG, is generally used only to roughly estimate the severity of the injury and the prognosis. Theoretically, needle EMG can be used to assess the voluntary activity of the facial muscles as well as changes over the course of the disease, although this is not routinely established because of the effort and invasiveness involved. Synchronous recording from many facial muscles with multichannel surface or needle EMG is also possible, but has so far been too time-consuming for routine use. (See section on Electromyography for more details on how to perform and interpret a clinical needle EMG.)
The difference between the therapist’s perception and the patient’s also needs to be considered: the therapist’s assessment of the severity of the disease may differ significantly from the patient’s self-assessment. Therefore, in addition to the above-mentioned instruments for therapists, two instruments have been established for self-evaluation by patients with facial palsy: the Facial Clinimetric Evaluation (FaCE) scale and the Facial Disability Index (FDI). Both instruments are so-called patient-reported outcome measures (PROMs) and are used for an integrative description of the patients’ quality of life. These PROMs are also important for quantifying the symptoms of chronic facial palsy with aberrant reinnervation and synkinesis because they ask questions not only about motor function, but also about psychological, social, and to some extent, communicative restrictions. Nevertheless, for the evaluation of patients with chronic facial palsy and synkinesis, FaCE and FDI provide only a few synkinesis-specific items. These deficits are targeted by the Synkinesis Assessment Questionnaire (SAQ). All these instruments are discussed in more detail in the next section.
Classification of Synkinesis in Facial Palsy with PROMs
Self-assessment questionnaires completed by patients give an impression of the influence of the disease on their quality of life. Both disease-specific and non–disease-specific questionnaires are available for this purpose.
Non–disease-specific questionnaires such as the Short Form 36 (SF-36) questionnaire or the International Quality of Life Assessment (IQOLA) allow a general assessment of physical and mental health. With them, the quality of life associated with different diseases can be compared. However, they do not allow an assessment of disease-specific stress factors: they contain no questions that specifically address the symptoms of facial palsy, such as synkinesis.
In contrast, disease-specific questionnaires are tailored to the stress factors of the target group of patients with the specific disease. Whereas they do not allow a comparison of quality of life with other patient groups, they do allow for a more specific assessment of the underlying disease and disease-specific symptoms.
Perhaps the most popular facial palsy–specific PROM is the FaCE scale. The FaCE scale, developed by Kahn et al., covers functional aspects of facial palsy and psychosocial stress factors alike. It was first described in 2001 and has since been translated into and validated in several languages. It is regarded by experts as reliable and valid. The questionnaire contains 15 questions in six categories (facial movement, facial well-being, oral function, eye well-being, tear function, and social function), which are answered on a five-point Likert scale. The total FaCE score is determined as the sum of the individual results. Scores range from 0 to 100 points, with higher scores indicating better function. No questions directly address synkinetic facial movements. Only three questions indirectly target the presence of synkinesis in the face. These are questions 4 (“Parts of my face feel tense, exhausted and uncomfortable”), 6 (“When I try to move my face, I feel tension, pain and cramps”), and 13 (“My face feels tired or I feel tension, pain or cramps, when I try to move it”).
The FDI was first published five years before the FaCE scale. The FDI contains a total of ten questions, five on physical function and five on social function, which are answered on a Likert scale. For physical function, 25% (worst result) to 100% (best result) are possible, whereas for social function, results between 0% (worst result) and 100% (best result) can be achieved. There are no questions that specifically target synkinesis in chronic facial paresis. This lack of synkinesis-specific items is addressed by the SAQ (Table 8.1).
|Please answer the following questions regarding facial function, on a scale from 1 to 5 according to the following scale:|
|Total score = (sum of scores for questions 1 to 9) / 45 × 100|
The SAQ is a specific instrument for the self-assessment of synkinesis and was developed in Boston in 2007. It consists of nine questions that exclusively evaluate the synkinetic dysfunctions of facial palsy. A score of 0 indicates no facial synkinesis, whereas the maximum score of 100 indicates the most severe and persistent facial synkinesis. The SAQ thereby enables synkinesis to be evaluated over the course of facial paralysis or after therapeutic measures. It is available in three languages.
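The scoring rule printed in Table 8.1 can be written out as a short worked example. This is a minimal sketch, assuming nine integer answers on the 1-to-5 scale; the function name `saq_score` is hypothetical and not part of the questionnaire.

```python
def saq_score(answers):
    """Return the SAQ total score for nine answers, each rated from 1 to 5."""
    if len(answers) != 9:
        raise ValueError("the SAQ consists of nine questions")
    if any(a not in (1, 2, 3, 4, 5) for a in answers):
        raise ValueError("each answer must be an integer from 1 to 5")
    # Table 8.1: sum of scores for questions 1 to 9, divided by 45, times 100.
    return 100 * sum(answers) / 45


if __name__ == "__main__":
    # Hypothetical answers of one patient, for illustration only.
    print(saq_score([3, 2, 4, 1, 5, 3, 2, 2, 4]))
```

With the 1-to-5 answer scale, the formula as printed yields values between 20 (all answers 1) and 100 (all answers 5).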
PROMs are by definition very dependent on the self-perception of the patients. Therefore, they are not objective in the sense of an automatic measurement without human influence. But in contrast to expert-based rating systems like the Sunnybrook or the eFACE, they depend only on the patient. In addition, they can be easily integrated into a clinical setting and allow a detailed follow-up of patients with facial palsy with minimal cost and time. However, there are often large discrepancies between PROMs and expert-based ratings, which are still not fully understood and are a topic of ongoing research. In summary, before using PROMs, it must be decided which aspects of the disease are of interest, as the SAQ only records functional/motor targets, whereas the FaCE and the FDI also map emotional stress factors. Another criterion of note is the availability of the PROM in the patient’s mother tongue.
Automatic Image Analysis of Facial Expression in Facial Nerve Palsy
Currently, there are no automated procedures for clinical routine or standards for clinical trials. The main reasons for this are the individually chosen hardware and software solutions, the high acquisition costs, the huge amount of time spent creating the videos, the need to attach markers to the face before recording, the complex evaluation after the medical check-up allowing only semi-quantitative evaluation (e.g. the examiner has to place measuring points in the images), or simply the inadequate conversion of a laboratory workstation into a setting suitable for clinical routine examination. No automated system has been validated in a clinical setting on a large representative group of patients with facial nerve palsy.
The large number of approaches initiated by clinicians only illustrates how unsatisfactorily the problem has been solved so far. The first approach to use modern computer vision methods was introduced in 2010 to automate the subjective investigator-based House-Brackmann grading scale. Anatomical landmarks defined around the eyes, mouth, and ears were automatically localized without markers in two-dimensional (2D) color images, which were subsequently used for data enrichment and synthesis of “virtual” faces with facial nerve palsy. Based on these synthetic data and heuristically determined distance thresholds between corresponding landmarks of the two halves of the face, the House-Brackmann Index was estimated.
A 2016 publication presented a hybrid approach that is rule-based and uses learned classifiers as a control instance. It first distinguishes between volunteers and patients, then recognizes the type of facial paresis, and finally indicates the House-Brackmann grade. 2D grayscale images of five different facial poses are used; the facial landmarks are localized without markers, and the distances between these landmarks are used as facial features. Both approaches use only single images of very few patients.
The use of heuristically determined distance thresholds is not representative enough for generalization, as 2D landmarks cannot be assumed to be normalized against affine transformations.
Even in a recently introduced smartphone-based diagnostic system, facial features are determined exclusively by distances and angles between 2D landmarks. However, image series in the form of videos of three movement exercises are used. The facial features are extracted from the single images and the House-Brackmann grading scale is learned by means of a support vector machine (SVM). In the work of Peterson et al., action units (AUs) are used as facial features to evaluate patients with blepharospasm in videos. The computer expression recognition toolbox (CERT) was used to calculate the AUs. In both studies, time series of facial movements in the form of image sequences are used for the automatic evaluation of facial movement dysfunctions.
In order to get closer to the three-dimensional (3D) movement of the face, the first automatic 3D analysis systems for patients with facial nerve palsy were introduced. Facegram works with cameras that combine conventional red-green-blue (RGB) 2D color images with depth information (e.g., the Microsoft Kinect system). With Facegram, however, special markers must be applied to the face, which limited its adoption. Endpoints for trajectories were drawn on the face and the change in trajectories during movement was analyzed. The Kinect v2 has been used as a prototype to automatically calculate, without landmarks, an asymmetry index for the healthy side in patients with facial paresis, and it has been integrated into a feedback system that tells patients undergoing exercise therapy whether they have completed a task successfully.
The works mentioned above all used anatomical 3D landmarks of the face, which were projected from 2D into the 3D point cloud, and they used individual 3D point clouds. Often, 68 defined landmarks around the eyes, eyebrows, nose, mouth, and cheeks are used to describe facial expression (Fig. 8.9). However, landmarks can alternatively be defined based on facial curvatures, which should be more representative from a biological and clinical perspective, especially for 3D images.
Corresponding curves were extracted from two faces in a comparative manner in order to analyze an improvement in facial nerve palsy after treatment with botulinum toxin. Sequences of 3D images were combined into four-dimensional (4D) images. The 4D measurement enabled both a static and a dynamic evaluation. Sixteen patients were asked to perform eight facial expressions before and after treatment. The recorded facial point clouds were mirrored and registered frame by frame, and the resulting point cloud pairs were used to create dense scalar fields, which reflect and visualize the level of asymmetry.
Although this work uses 4D measurements and curvature-based features for asymmetry analysis, the only result determined automatically is the asymmetry itself, visualized from two registered point clouds, as an objective measure of dysfunction in patients with facial palsy.
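The idea of mirroring a facial point cloud and turning the point cloud pair into a per-point asymmetry value can be sketched as follows. This is only an illustration under strong assumptions, not the published method: the cloud is assumed to be already aligned so that the midsagittal plane is x = 0, the registration step is omitted, and the scalar field is taken as the distance from each point to its nearest neighbor in the mirrored cloud. The function name `asymmetry_field` is hypothetical.

```python
import numpy as np


def asymmetry_field(points):
    """Per-point distance between a 3D point cloud and its mirror image.

    points : array of shape (N, 3); the midsagittal plane is assumed to be x = 0.
    Returns an array of shape (N,); 0 means the point has an exact mirror partner.
    """
    pts = np.asarray(points, dtype=float)
    # Mirror the cloud across the plane x = 0.
    mirrored = pts * np.array([-1.0, 1.0, 1.0])
    # Brute-force nearest-neighbor search (O(N^2) memory, fine for small clouds).
    diff = pts[:, None, :] - mirrored[None, :, :]
    distances = np.linalg.norm(diff, axis=2)
    return distances.min(axis=1)


if __name__ == "__main__":
    cloud = np.array([[1.0, 0.0, 0.0], [-1.0, 0.5, 0.0]])
    print(asymmetry_field(cloud))
```

For dense recordings, a spatial index such as a k-d tree would replace the brute-force search, and a rigid registration of the mirrored cloud would precede the distance computation.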
For facial action coding system (FACS) analyses for emotion recognition, valid automatic video-based AU detection algorithms are now also used. In addition to the use in classic psychological experiments, automatic FACS analyses are now also used for neurological or psychiatric diseases. Using automatic FACS analyses, patients with chronic facial paresis were also examined whereby the AU analyses themselves were not used at all, but eye closure and smiles were analyzed on the basis of pixel shifts. Subsequently, we were able to show that it is possible to automatically detect AUs in standard FACS datasets with an active appearance model (AAM) approach with very high quality results. And finally, using this method we were able to classify the AUs on both the paralyzed and opposite side in photographs of 299 patients with acute facial paresis. In the following section, more technical details for automatic quantification of facial palsy are presented.
Automatic Facial Landmark Localization
Facial landmarks are descriptive points in the face which characterize the shape of a face and serve as an accurate identification of specific facial features. They are located at the eyes, eyebrows, nose, mouth, chin, and cheek. However, the number and position of landmarks is not defined. Therefore, in practice, there are numerous facial datasets with a different number of annotated landmarks available. A list of available datasets is shown in Table 8.2 .
|Menpo Benchmark|8979|Unknown|68 (profile: 39)|2017|
Fortunately, the annotated datasets listed in Table 8.2 can be used to train machine-learning algorithms that automatically localize landmarks in nonannotated images. This reduces the effort that experts would otherwise spend on manual annotation. In addition, the trained models can be applied to face images of patients with facial palsy even when only the landmarks of one facial hemisphere are used for model training: by mirroring the images about the vertical midline, all landmarks can still be used.
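Mirroring an image together with its landmark annotations can be sketched as follows. This is a minimal sketch assuming landmarks are stored as (x, y) pixel coordinates; the function name `mirror_face` and the `swap_pairs` argument (index pairs of left/right counterpart landmarks, which depend on the annotation scheme) are hypothetical.

```python
import numpy as np


def mirror_face(image, landmarks, swap_pairs=()):
    """Mirror an image and its landmarks left to right.

    image      : array of shape (H, W) or (H, W, C)
    landmarks  : array of shape (N, 2) holding (x, y) pixel coordinates
    swap_pairs : hypothetical (i, j) index pairs of left/right counterparts
    """
    image = np.asarray(image)
    points = np.asarray(landmarks, dtype=float).copy()
    width = image.shape[1]
    # Reverse the column order of the image.
    mirrored_image = image[:, ::-1].copy()
    # Column x moves to column (width - 1 - x); y is unchanged.
    points[:, 0] = (width - 1) - points[:, 0]
    # After mirroring, a former "left" landmark lies on the right, so swap labels.
    for i, j in swap_pairs:
        points[[i, j]] = points[[j, i]]
    return mirrored_image, points


if __name__ == "__main__":
    img = np.arange(12).reshape(3, 4)
    pts = [[0, 1], [3, 2]]
    print(mirror_face(img, pts, swap_pairs=[(0, 1)]))
```

Swapping the indices of counterpart landmarks keeps the annotation order consistent, so that a model trained on one hemisphere sees the other hemisphere in the same layout.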
A first approach for automatic landmark localization is an AAM introduced by Cootes et al. An AAM combines the shape information and texture information of a face in a holistic generative manner. It models the visual appearance of the face in a global manner by considering typical interrelationships of the face shape and the face texture. The generative model can be applied to unseen new images to automatically localize a face shape (landmarks). Here, as a result, a vector of real numbers is obtained. These vectors can be used as features, for example, to train a model to automatically grade facial palsy. Additionally, the shape and texture information of both face hemispheres can be used separately to train single AAM models.
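How a face shape ends up as a compact vector of real numbers can be illustrated with the statistical shape component alone. The sketch below is not a full AAM: it omits the texture model and the fitting of the model to new images, assumes the training shapes are already aligned, and uses principal component analysis via a singular value decomposition. All function names are hypothetical.

```python
import numpy as np


def fit_shape_model(shapes, n_components):
    """Learn a linear shape model from aligned landmark sets.

    shapes : array of shape (n_faces, n_landmarks, 2)
    Returns the mean shape vector and the leading principal directions.
    """
    data = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)
    mean = data.mean(axis=0)
    # Rows of vt are orthonormal directions of shape variation, strongest first.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def shape_parameters(shape, mean, components):
    """Project one shape onto the model; the result is the feature vector."""
    return components @ (np.asarray(shape, dtype=float).ravel() - mean)


def reconstruct_shape(params, mean, components):
    """Map a parameter vector back to (n_landmarks, 2) coordinates."""
    return (mean + params @ components).reshape(-1, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    training_shapes = rng.normal(size=(20, 5, 2))  # synthetic data, illustration only
    mean, components = fit_shape_model(training_shapes, n_components=3)
    print(shape_parameters(training_shapes[0], mean, components))
```

The parameter vector returned by `shape_parameters` is the kind of real-valued feature vector that could be passed to a subsequent grading model.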
Another method to localize facial landmarks automatically is based on regression. In facial landmark regression, the single landmarks are located independently of all other landmarks. Compared with AAMs, this is an advantage when not all landmarks of the face are visible in the image. Convolutional neural networks (CNNs), a special variant of artificial neural networks, can be used as regressors. CNNs have a multilayered architecture and the capability to learn specific features of the input data by themselves in order to make better use of them for the regression task. The first layers use convolutional operations to learn a number of filters based on the appearance of the input data. The fully connected layer at the end of the CNN is used to learn the locations of landmarks in the form of a multioutput regression task. Fig. 8.2 illustrates an example of a CNN architecture for facial landmark regression.
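The data flow described above, convolutional filters followed by a fully connected multioutput regression layer, can be sketched as a forward pass. This is a deliberately tiny illustration and not the architecture of Fig. 8.2: it has a single convolutional layer, uses random untrained weights, and omits training entirely. The layer sizes and all function names are assumptions.

```python
import numpy as np


def conv2d(image, filters):
    """Valid cross-correlation of an (H, W) image with (F, k, k) filters."""
    k = filters.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    # windows: (H-k+1, W-k+1, k, k) -> one feature map per filter.
    return np.einsum("ijab,fab->fij", windows, filters)


def max_pool(maps, size=2):
    """Non-overlapping max pooling over each feature map."""
    f, h, w = maps.shape
    h, w = h - h % size, w - w % size
    blocks = maps[:, :h, :w].reshape(f, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))


def init_params(image_size, n_filters, kernel, n_landmarks, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = np.random.default_rng(seed)
    pooled = (image_size - kernel + 1) // 2
    n_features = n_filters * pooled * pooled
    return {
        "filters": rng.normal(0.0, 0.1, (n_filters, kernel, kernel)),
        "conv_bias": np.zeros(n_filters),
        "fc_weights": rng.normal(0.0, 0.01, (2 * n_landmarks, n_features)),
        "fc_bias": np.zeros(2 * n_landmarks),
    }


def predict_landmarks(image, params):
    """Convolution -> ReLU -> pooling -> fully connected regression output."""
    maps = conv2d(image, params["filters"]) + params["conv_bias"][:, None, None]
    features = max_pool(np.maximum(maps, 0.0)).ravel()
    output = params["fc_weights"] @ features + params["fc_bias"]
    return output.reshape(-1, 2)  # one (x, y) pair per landmark


if __name__ == "__main__":
    params = init_params(image_size=32, n_filters=4, kernel=5, n_landmarks=68)
    face = np.random.default_rng(1).random((32, 32))  # stand-in for a face image
    print(predict_landmarks(face, params).shape)
```

The last layer has two outputs per landmark, which is what makes the task a multioutput regression.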
Complex Facial Features
For automatic grading of facial palsy, suitable and powerful facial features that describe the shape (and texture) of a face are necessary. In the following sections, we introduce facial features based on statistical models and features extracted by a deep learning model.
As described in the previous section, AAMs provide powerful facial features that characterize the shape and the texture of important parts of the face in the compact form of a vector of real numbers.
Facial AUs are defined in the FACS by Ekman and Friesen: the FACS attempts to deconstruct all facial expressions into single facial muscle group activations called AUs, which encode a facial expression. In some cases, the AU corresponds to a single muscle, but in other cases the action seen is the result of multiple muscles working together. Multiple activations of facial muscles result in the expression of emotions like happiness, sadness, surprise, fear, anger, disgust, or contempt. These basic emotions are coded as combinations of AUs, for example, AU6 + AU12 for happiness. Public annotated datasets (e.g., CK+) can be used to train a model for AU parameter prediction, and the AUs can later be used as facial features.
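How predicted AUs might serve as facial features, and how a combination such as AU6 + AU12 can be checked, is sketched below. This is an illustration only: the intensity values, the activation threshold of 0.5, and the function names are assumptions, and only the happiness combination named in the text is encoded.

```python
HAPPINESS_AUS = frozenset({6, 12})  # AU6 + AU12, the example given in the text


def au_feature_vector(intensities, au_ids):
    """Arrange AU intensities into a fixed-order feature vector.

    intensities : dict mapping AU number -> predicted intensity
    au_ids      : the AU numbers to include, in the desired order
    """
    return [float(intensities.get(au, 0.0)) for au in au_ids]


def shows_combination(intensities, combination, threshold=0.5):
    """True if every AU of the combination reaches the (assumed) threshold."""
    return all(intensities.get(au, 0.0) >= threshold for au in combination)


if __name__ == "__main__":
    # Hypothetical model output for one video frame.
    frame = {1: 0.1, 6: 0.8, 12: 0.9}
    print(au_feature_vector(frame, au_ids=[1, 6, 12]))
    print(shows_combination(frame, HAPPINESS_AUS))
```

In patients with facial palsy, such a feature vector could be computed separately for each half of the face, so that the two sides can be compared.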
In the previous section, CNNs were introduced for the automatic facial landmark regression task. Because CNNs autonomously learn powerful image features from the input data in their initial layers, they can also be used as feature extractors. After a CNN has been trained with facial input images for an auxiliary task such as facial landmark localization, the activations of certain layers can be extracted and used as facial features.
Automatic Facial Palsy Assessment
Facial landmarks describe the shape of a face, which can be exploited to analyze asymmetries in the face shape by comparing corresponding landmarks of the two face hemispheres. For a detailed asymmetry analysis, distances between two landmarks on one side of the face, compared with the corresponding distance on the other side, are well suited. Some of these distances are illustrated in Fig. 8.3.
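Comparing one landmark distance on each side of the face can be sketched as follows. This is a minimal sketch, assuming 2D landmark coordinates in pixels; the landmark names, the choice of the distance, and the ratio used as an asymmetry measure are illustrative assumptions and not the distances of Fig. 8.3.

```python
import math


def distance_asymmetry(landmarks, left_pair, right_pair):
    """Compare a landmark distance on the left side with its right counterpart.

    landmarks  : dict mapping landmark name -> (x, y)
    left_pair  : two landmark names on the left side
    right_pair : the two corresponding names on the right side
    Returns (left_distance, right_distance, ratio), where ratio is the smaller
    distance divided by the larger one; 1.0 means both sides are equal.
    """
    left = math.dist(landmarks[left_pair[0]], landmarks[left_pair[1]])
    right = math.dist(landmarks[right_pair[0]], landmarks[right_pair[1]])
    larger = max(left, right)
    ratio = 1.0 if larger == 0 else min(left, right) / larger
    return left, right, ratio


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    face = {
        "eye_corner_left": (0.0, 0.0),
        "mouth_corner_left": (0.0, 3.0),
        "eye_corner_right": (10.0, 0.0),
        "mouth_corner_right": (10.0, 4.0),
    }
    print(distance_asymmetry(
        face,
        ("eye_corner_left", "mouth_corner_left"),
        ("eye_corner_right", "mouth_corner_right"),
    ))
```

A ratio has the advantage over a raw pixel difference that it does not change when the whole image is scaled, although it remains sensitive to head rotation.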