Recent advances in the field of artificial intelligence, especially the development of deep learning algorithms, have spurred a surge of interest in their application for the detection of ophthalmic diseases such as glaucoma. Because glaucoma is the leading cause of irreversible blindness worldwide, early and reliable diagnosis is critical so that appropriate treatment can be initiated. Optical coherence tomography (OCT) is the most commonly acquired imaging test to assess for early structural changes in the optic nerve head and macula due to glaucoma, but application of artificial intelligence (AI) may be able to glean additional information beyond current automated segmentation software. Several groups have demonstrated that deep learning algorithms can be trained to accurately distinguish between eyes with glaucomatous changes and healthy controls on OCT. Attention class activation maps, or heat maps, may point to additional structural changes from glaucoma that are not captured by the standard summary parameters provided by current OCT software. Deep learning (DL) models have also been developed to predict quantitative values from OCT, such as the retinal nerve fiber layer thickness or Bruch’s membrane opening–minimum rim width, from a color fundus photograph. Future studies will need to assess the performance of these different algorithms in real-world settings and explore whether novel algorithms can be built to detect glaucomatous progression on OCT.
Key words: artificial intelligence – deep learning – machine learning classifiers – glaucoma – spectral domain optical coherence tomography
14 Future Directions: Artificial Intelligence Applications
Glaucoma is the leading cause of irreversible visual disability and blindness worldwide, with some studies estimating that nearly 112 million people will be impacted by the year 2040. 1
However, the majority of patients with glaucoma are not aware they have this disease, which may be due to poor public awareness about glaucoma 2 and the fact that glaucoma remains relatively asymptomatic until the advanced stages. The concern that patients with glaucoma may not present until late in the disease has spurred interest in the development of affordable public health interventions to screen for and diagnose glaucoma so that effective treatments can be prescribed before the onset of substantial vision loss. However, prior efforts to screen for glaucoma using perimetry and tonometry have shown poor sensitivity and specificity at a population level. 3 Low-cost nonmydriatic fundus photographs are easy to acquire but require laborious subjective review by human graders. Moreover, studies have shown that human grades of color fundus photos have poor reproducibility 4 and poor sensitivity 5 for glaucoma in screening settings. The accuracy of these qualitative grades can be further limited in the cases of large physiologic cups, small optic nerve heads, or myopic tilted discs which can be challenging to evaluate.
Spectral domain optical coherence tomography (SD-OCT) has become the de facto standard for objective quantification of structural changes in the retinal nerve fiber layer (RNFL) and macula due to glaucoma because of its excellent reproducibility and accuracy. 6 SD-OCT can detect early structural glaucomatous changes before the onset of detectable visual field loss on standard automated perimetry. Although SD-OCT has been widely adopted in clinical practice, its high cost and the reliance on skilled operators for image acquisition have limited its utility in population-based screening.
Nevertheless, with recent advances in AI, especially the development of DL algorithms, there has been rising interest in building algorithms capable of diagnosing glaucoma on SD-OCT imaging. DL algorithms may be able not only to discriminate between eyes with and without glaucoma, but also to learn novel features on SD-OCT imaging that improve its discriminatory ability. By assessing the entire SD-OCT B-scan, DL algorithms may also obviate reliance on parameters derived from automated segmentation, which can be prone to segmentation errors and artifacts. One group has also demonstrated a novel approach in which SD-OCT data were used to develop DL algorithms that detected glaucomatous damage on color fundus photographs, thus improving the prospect of screening for glaucoma with low-cost photos. The purpose of this chapter is to provide a brief overview of AI with an emphasis on DL, and then to review select pivotal papers that have recently demonstrated how SD-OCT data can be used to develop DL algorithms capable of diagnosing glaucoma.
14.2 Artificial Intelligence
Artificial intelligence is a broad term that refers to the development of computer programs that can automate tasks in a way that mimics intelligent human behavior despite receiving minimal human input. 7 Broadly speaking, AI encompasses both machine learning classifiers (MLCs) and DL algorithms (Fig. 14‑1). MLCs are trained, rather than explicitly programmed, to find statistical patterns in datasets by being presented multiple relevant examples, and by this process they learn to automate the task. For traditional MLCs, the features need to have been already identified by humans using their domain knowledge. A distinct advantage of MLCs over traditional statistical programming is that MLCs can handle complex, large datasets that would not be practical to analyze using traditional statistical approaches. There are many types of MLCs including random forest (RF), logistic regression (LR), support vector machine (SVM), independent component analysis (ICA), and Gaussian mixture model (GMM). Although research has been conducted using MLCs to classify glaucoma data, these previous studies were limited in the types of data that could be processed and often relied on the input of parameters from automated segmentation. This is because MLCs could only process data in one or two dimensions, and were not well suited for complex image processing.
The development of sophisticated convolutional neural networks (CNNs) has shifted the focus of AI in glaucoma to the application of DL methods. DL refers to a relatively novel advance in machine learning in which a CNN autonomously learns features and tasks from a training dataset. Data are input, processed, and weighted through successive layers of these neural networks, which consist of series of interconnected nodes, until the algorithm develops a system of classifications capable of making a prediction (Fig. 14‑2). CNNs are substantially more complex and refined than MLCs. DL algorithms do not require that the specific features of interest in an image be identified a priori by the human programmer. Also, unlike MLCs, which can only process data in a small number of dimensions, DL algorithms can process very complex data in multiple dimensions, and are thus better suited to the processing of ophthalmic imaging.
14.2.1 Development of a Deep Learning Algorithm
Development of a CNN goes through three basic stages: training, validation, and testing. There are also three approaches to training a CNN: supervised, unsupervised, and semi-supervised learning. In supervised learning, the DL algorithm is trained using a dataset where every image has a label, such as glaucoma or healthy. Unsupervised learning, on the other hand, exposes the algorithm to unlabeled data and allows it to extract novel patterns from the data. In semi-supervised learning, the algorithm first trains on a large unlabeled dataset and then on a much smaller labeled dataset in order to improve its performance. Next, the DL algorithm must be validated on data held out from training to determine how well the fitted model generalizes and to guide tuning. Finally, the performance is evaluated by applying the DL algorithm to a separate test dataset. It is important that images from the same patient or same eye do not exist in both the testing dataset and the training/validation datasets because this can lead to biased overestimates of the algorithm’s performance.
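The requirement that no patient contribute images to both the test set and the training/validation sets can be enforced by splitting at the patient level rather than the image level. The following is a minimal Python sketch of that idea, not a procedure taken from any of the cited studies; the function name split_by_patient, the 70/15/15 proportions, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_patient(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (patient_id, image_id) pairs into train/validation/test sets.

    Patients, not images, are shuffled and partitioned, so every image from
    a given patient lands in exactly one split.
    """
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    patients = sorted(by_patient)            # deterministic starting order
    random.Random(seed).shuffle(patients)    # reproducible shuffle

    n_train = int(len(patients) * fractions[0])
    n_val = int(len(patients) * fractions[1])
    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to the test set
    }
    return {
        name: [(p, img) for p in ids for img in by_patient[p]]
        for name, ids in groups.items()
    }


# Hypothetical toy data: 10 patients with 3 scans each.
records = [(f"patient_{p}", f"scan_{p}_{s}") for p in range(10) for s in range(3)]
splits = split_by_patient(records)
```

When both eyes of a patient are imaged, grouping by patient rather than by eye is the more conservative choice, since the two eyes of one person tend to resemble each other.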
Due to their complexity, DL algorithms require very large training datasets, on the order of millions of images, which is not practical to obtain for ophthalmic imaging. Thus, transfer learning techniques are increasingly applied to improve the performance of DL algorithms in ophthalmology. 8 Transfer learning makes use of a CNN, such as ResNet34 or InceptionV3, that has already been trained to be a general image classifier using a very large general image dataset such as ImageNet. Additional training is then conducted with a much smaller dataset of ophthalmic images to improve its performance for a specific task, such as discriminating between glaucomatous and healthy eyes. Application of transfer learning has enhanced the feasibility of developing high-performing DL algorithms for detecting glaucoma on SD-OCT and other ophthalmic imaging.
14.2.2 Deep Learning Algorithms to Diagnose Glaucoma on SD-OCT
Several groups have developed novel DL algorithms that can distinguish eyes with glaucoma from those without glaucoma when applied to SD-OCT images. CNNs can be developed to make predictions directly from the OCT imaging or can be combined in hybrid approaches that harness the CNN to extract the relevant features from OCT imaging so that machine learning techniques can be applied to classify the images. Muhammad and colleagues, 9 for example, generated a hybrid deep learning model (HDLM) for diagnosing glaucoma from OCT. They obtained 9 × 12 mm swept-source OCT images in 102 eyes (57 with glaucoma and 45 without glaucoma according to expert diagnosis using clinical information and imaging), and from these OCT scans they generated six images to be input to the CNN: (1) three-color channel thickness map for the retinal ganglion cells (RGCs), (2) three-color channel thickness map for the RNFL, (3) RNFL probability map, (4) RGC probability map, (5) en face projection image, and (6) a combination image where the red channel was replaced with the RNFL probability values, the green channel with the RGC probability values, and the blue channel with normalized RNFL thickness values. Using AlexNet, a two-dimensional CNN pretrained on ImageNet, they performed feature extraction with the Caffe DL framework using each of the six aforementioned images as input to the CNN. The extracted features were then used as input to an RF model for classification of the images as glaucomatous or healthy, and the predictions were compared against traditional visual field and OCT metrics. In this study, the HDLM based on the RNFL probability map had the highest accuracy and lowest variability (93.1% ± 0.57%), and was significantly better than conventional clinical metrics such as OCT quadrant analysis (accuracy 83.7%) or 24-2 Humphrey visual field mean deviation (80.4%) (all p < 0.001).
However, other features did not perform well; for example, the en face projection showed the poorest accuracy and greatest variability (65.7% ± 1.53%). One limitation of this study was the very small size of the training set. Transfer learning was performed using AlexNet, but it is possible the accuracy of the algorithm would be higher if trained only on OCT images, especially given the small size of the OCT training set. Moreover, one can argue that outperforming isolated clinical metrics like OCT or perimetry parameters is not that difficult.
Asaoka et al 10 conducted a much larger, multi-institutional study to determine if a DL model would perform better than traditional MLCs for distinguishing early glaucoma from normal eyes on OCT macular maps of the RNFL and ganglion cell complex. Pretraining was performed using a large OCT (RS 3000 OCT, Nidek Co., Gamagori, Japan) dataset from the Japanese Archives of Multicentral Images of Glaucomatous Optical Coherence Tomography database, which consisted of 4,073 OCT images from 1,371 eyes in 747 subjects with open angle glaucoma and 243 images from 193 healthy eyes in 113 subjects. Then additional training was conducted on a smaller separate dataset of images acquired with the Topcon OCT-1000 or OCT-2000 machine (Topcon Corporation, Tokyo, Japan) in 94 eyes/subjects with early open angle glaucoma (OAG) and mean deviation (MD) >−5 dB and 84 normal eyes/subjects. The 8 × 8 grid macular RNFL and ganglion cell complex thickness from SD-OCT was input to the DL model. Testing of the DL algorithm was performed on a separate dataset of 114 eyes/subjects with OAG and MD >−5 dB, and 82 normal eyes/subjects. The diagnostic accuracy of the DL model was significantly greater than that of two traditional MLCs, namely, RF and SVM (area under the receiver operating characteristic curve [AUC] = 93.7% vs. 82.0% or 67.4%, respectively, p < 0.001). However, a notable limitation of this model was the reliance on macular map data without any input of circumpapillary RNFL data, which is more commonly acquired in clinical practice and which may have been able to improve the model’s performance.
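The AUC used to compare models throughout this section can be read as the probability that a randomly chosen glaucomatous eye receives a higher model score than a randomly chosen healthy eye. The short Python sketch below computes it directly from that definition, counting ties as one half. It is an illustration only: the function name auc and the scores are hypothetical and are not data from any cited study.

```python
def auc(scores_glaucoma, scores_healthy):
    """Area under the ROC curve from its pairwise-ranking definition.

    Every (glaucoma, healthy) pair is compared; a pair counts 1 if the
    glaucomatous eye has the higher score and 0.5 if the scores tie.
    """
    wins = 0.0
    for g in scores_glaucoma:
        for h in scores_healthy:
            if g > h:
                wins += 1.0
            elif g == h:
                wins += 0.5
    return wins / (len(scores_glaucoma) * len(scores_healthy))


# Hypothetical model outputs (probability of glaucoma) for eight eyes.
glaucoma_scores = [0.9, 0.8, 0.6, 0.4]
healthy_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(glaucoma_scores, healthy_scores)  # 14 of 16 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of eyes, which is acceptable for a sketch but not for large datasets.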
Another drawback common to the two approaches just described is their reliance on OCT parameters and maps derived from the OCT machine’s automated segmentation software. Thus, the accuracy of these DL algorithms relied on the accurate segmentation of the acquired OCT images. However, OCT segmentation can be prone to artifact and segmentation errors, especially in older patients or cases of advanced glaucoma. Studies have suggested that segmentation errors may be present on 20 to 40% of OCT scans, which can undermine the accuracy of segmented parameters. Moreover, in the aforementioned studies, the selected input was restricted to those structural features on OCT that are already known to be affected by glaucoma, such as the RNFL and ganglion cell complex, thus excluding the possibility of learning from other structural features on OCT. Application of DL algorithms directly to raw, unsegmented B-scans from the OCT can allow the DL algorithm to learn from the entire B-scan image, rather than preselected features or parameters, and thus may prove more accurate.
In a recently published work, Maetschke et al 11 developed a “feature-agnostic” 3D CNN that could distinguish between healthy and glaucomatous eyes from raw, unsegmented OCT volumes of the optic nerve head. Their study consisted of 263 scans in 137 healthy patients and 847 scans in 432 primary open-angle glaucoma (POAG) patients. They compared this novel approach to the more traditional feature-based approach where various MLCs were trained using 22 segmented measurements from the Cirrus OCT (i.e., peripapillary RNFL thickness at 12 clock-hours, peripapillary RNFL thickness in four quadrants, average RNFL thickness, rim area, disc area, average cup-to-disc ratio, vertical cup-to-disc ratio, and cup volume). Logistic regression had the highest AUC among MLCs in the test dataset (0.89 ± 0.028), but this was significantly lower than the performance of the feature-agnostic DL algorithm, which achieved an AUC of 0.94 ± 0.036 (p < 0.05). Also of note was the fact that the class activation maps of the volume scans highlighted additional regions important in the algorithm’s classification, such as the lamina cribrosa, which may be a useful biomarker for glaucoma. Thus, this paper demonstrated that a DL model trained on a raw OCT volume scan outperformed MLCs that were trained using features derived from automated segmentation.
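To make the idea of a class activation map concrete, the sketch below shows one common formulation: for a network whose last convolutional layer feeds a global average pooling step and a linear classifier, the map for a class is the sum of the final feature maps, each weighted by that class's classifier weight. This is a simplified two-dimensional illustration in plain Python; the cited studies may have used a different variant, and the feature maps and weights here are invented toy values.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each a rows x cols grid) for one class."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(rows):
            for c in range(cols):
                cam[r][c] += weight * fmap[r][c]
    return cam


def normalize(cam):
    """Rescale a map to the range 0..1 so it can be displayed as a heat map."""
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Toy example: two 2 x 2 feature maps and hypothetical weights for the
# "glaucoma" output of the classifier.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
glaucoma_weights = [0.5, 1.0]
cam = class_activation_map(feature_maps, glaucoma_weights)
heat_map = normalize(cam)
```

In practice the low-resolution map is upsampled to the size of the input scan and overlaid on it, which is how regions such as the lamina cribrosa can be identified as influential.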
Another advantage of a DL model trained on unsegmented OCT B-scans is that it can produce a single probabilistic output for a diagnosis of glaucoma, which can be preferable to simultaneous interpretation of individual parameters. When faced with a high number of summary parameters, clinicians can find it difficult to determine which parameters are most important, especially if they do not correspond with each other. Moreover, the greater the number of parameters available, the greater the likelihood of committing a type I error, or finding a false-positive test. This point was recently highlighted in a paper by Thompson et al. 12 In this study, a DL algorithm was trained to discriminate between glaucomatous and healthy eyes using the raw unsegmented peripapillary SD-OCT B-scan. The dataset consisted of 20,806 RNFL circle B-scans in 1,154 eyes of 635 participants. The ResNet34 architecture, which had been previously trained on ImageNet, was used for additional training of the DL algorithm. When analyzing the segmentation-free circle B-scan, the DL algorithm had a significantly greater AUC than conventional SD-OCT parameters like global RNFL thickness (0.96 vs. 0.87) or each of the RNFL sector values for discriminating between glaucomatous and control eyes (all p < 0.001). The sensitivity at both 80 and 95% specificity was also substantially greater for the DL algorithm than for global RNFL thickness. Finally, when stratifying on glaucoma severity using the Hodapp-Parrish-Anderson criteria, the AUC was larger for the DL algorithm than for global RNFL thickness at each level of severity, especially among those with preperimetric or mild glaucoma (p < 0.001). The application of this segmentation-free DL algorithm in clinical practice may improve the accuracy and sensitivity of SD-OCT for glaucoma diagnosis relative to the use of conventional RNFL thickness from automated segmentation.
Also, by providing a single probabilistic output rather than multiple SD-OCT summary parameters, use of the DL algorithm could decrease the risk of “red disease” or erroneously detecting glaucoma due to a false-positive test. Another advantage is that use of this algorithm obviates the reliance on potentially error-prone segmentation. Finally, class activation maps in this study highlighted areas on the SD-OCT B-scan beyond the RNFL, suggesting that other parts of the retinal architecture may also be important in detecting glaucoma.
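The link between the number of parameters reviewed and the risk of a false-positive result can be illustrated with simple arithmetic. If each of k parameters is flagged as abnormal in a healthy eye with probability 1 − specificity, and the parameters are assumed to be independent, the chance that at least one is flagged is 1 − specificity^k. The Python sketch below evaluates this expression. The independence assumption is a deliberate simplification: OCT summary parameters are correlated with one another, so the true inflation is likely smaller than these figures suggest. The function name and the parameter counts are hypothetical.

```python
def prob_any_false_positive(n_parameters, specificity_each):
    """Chance that a healthy eye is flagged on at least one of n parameters.

    Assumes every parameter has the same specificity and that parameters
    are statistically independent (a simplification for illustration).
    """
    return 1.0 - specificity_each ** n_parameters


# Compare a single probabilistic output with several summary parameters,
# each interpreted at 95% specificity.
for k in (1, 7, 12):
    risk = prob_any_false_positive(k, 0.95)
    print(f"{k:2d} parameter(s): {risk:.1%} chance of at least one false positive")
```

Under these assumptions the risk rises from 5% for a single output to roughly 30% when seven parameters are read side by side, which is the intuition behind preferring one probabilistic output.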