Purpose: To develop an artificial neural network model incorporating both spatial and ordinal approaches to predict glaucomatous visual field (VF) progression.

Design: Cohort study.

Methods From a cohort of primary open-angle glaucoma patients, 9212 eyes of 6047 patients who underwent regular reliable VF examinations for >4 years were included. We constructed all possible spatial–ordinal tensors by stacking 3 consecutive VF tests (VF-blocks) with at least 3 years of follow-up. Trend-based, event-based, and combined criteria were defined to determine the progression. VF-blocks were considered “progressed” if progression occurred within 3 years; the progression was further confirmed after 3 years. We constructed 6 convolutional neural network (NN) models and 2 linear models: regression on global indices and pointwise linear regression (PLR). We compared area under the receiver operating characteristic curve (AUROC) of each model for the prediction of glaucomatous VF progression.

Results: Among 43,260 VF-blocks, 4406 (10.2%), 4376 (10.1%), and 2394 (5.5%) VF-blocks were classified as progression-based on trend-based and event-based and combined criteria. For all 3 criteria, the progression group was significantly older and had worse initial MD and VF index (VFI) than the nonprogression group ( *P* < .001 for all). The best-performing NN model had an AUROC of 0.864 with a sensitivity of 0.42 at a specificity of 0.95. In contrast, an AUROC of 0.611 was estimated from a sensitivity of 0.28 at a specificity of 0.84 for the PLR.

Conclusions: The NN models incorporating spatial–ordinal characteristics demonstrated significantly better performance than the linear models in the prediction of glaucomatous VF progression.

G laucoma, a significant visual disability , affected 64 million people aged 40 to 80 years in 2013, of which 6.9 million were estimated to have moderate to severe vision impairment or blindness. ^{,} Characteristic glaucomatous optic neuropathy resulting from the loss of retinal ganglion cells is progressive, wherein patients usually remain asymptomatic until the later stages of the disease. Visual disability resulting from glaucoma is thought to be irreversible. Therefore, early detection of glaucoma and the prevention of disease progression is paramount in the management of glaucoma. Glaucomatous functional progression is mostly determined based on the visual field (VF) test, which requires time and effort for both the patient and the examiner. Hence, the prediction of glaucoma progression using only a few serial VF tests, sometimes even before the actual progression happens, would be clearly helpful for the clinician to decide the treatment timing and modality.

Traditionally, linear models have been used to predict glaucomatous VF progression, of which the 2 most notable methods are based on either global indices, such as mean deviation (MD) and VF index (VFI), or a pointwise linear approach. However, the clinical effectiveness of these linear models remains limited. Compared to standard Humphrey perimetry printouts, interobserver agreement in determining glaucomatous VF progression did not improve with the use of guided progression analysis software (GPA), which includes linear regression on MD or VFI. In contrast, pointwise linear regression analysis (PLR) has been reported to improve interobserver agreement. ^{,} A study comparing various strategies of pointwise models, including ordinary least squares linear regression, quadratic regression, exponential regression, logistic regression, and M-estimator robust regression, concluded that although M-estimator robust regression is more accurate than other models, approximately 10 VF tests are needed to achieve an accurate prediction of glaucomatous VF progression.

One alternative approach to the traditional models is the application of machine learning methods. Recently, studies utilizing machine learning have reported promising results in the diagnosis of glaucoma using functional and structural parameters, including anterior and posterior segment optical coherence tomography (OCT), confocal scanning laser ophthalmoscopy, fundus photography, and scanning laser polarimetry, with the area under the receiver operating characteristic curve (AUROC) ranging from 0.76 to as high as 0.99. A convolutional neural network (CNN) has been successful in predicting VF from the OCT images. ^{,} Wen and associates developed models to forecast VF defects from 1 to 5.5 years into the future and compared them to 3 linear models, demonstrating the superiority of deep learning models.

Inspired by the impressive result from the application of the 2-dimensional (2D) CNNs for the prediction of VF defects, we applied an ordinal extension to a CNN to incorporate more chronological information that would be unique for each eye. To accomplish the goal, we built a spatial–ordinal tensor by stacking 3 consecutive VF tests and applied a 3D convolution to predict glaucomatous VF progression over 3 years.

## METHODS

## PARTICIPANTS

The participants were selected from an ongoing Asan Glaucoma Progression Study. All participants initially underwent a complete ophthalmic examination, including a review of medical history, measurement of best-corrected visual acuity, manifest refraction, slit lamp biomicroscopy, Goldmann applanation tonometry, gonioscopy, funduscopic examination, stereoscopic optic disc photography, retinal nerve fiber layer photography, and a VF test (Humphrey field analyzer; Swedish Interactive Threshold Algorithm 24-2; Carl Zeiss Meditec). All participants were diagnosed with primary open-angle glaucoma, which was defined as follows: normal anterior chamber with open angle on gonioscopic examination; glaucomatous optic nerve head with diffuse or focal neural rim thinning; a difference in the vertical cup-to-disk ratio >0.2 between both eyes; and retinal nerve fiber layer defects of disc hemorrhage accompanied by a glaucomatous VF defect in the corresponding area. The following glaucomatous VF defects were defined: (1) a cluster of 3 points with a probability <5%, of which at least 1 point is required to have a probability <1% on a pattern deviation map; (2) a glaucoma hemifield test result outside 99% of the age-specific normal limit; and (3) a pattern standard deviation (PSD) outside 95% of the normal limit confirmed by at least 2 reliable consecutive VF tests (false positive error <15%; a false negative error <15%; and a fixation loss <20%). We selected patients who had regularly undergone reliable standard 24-2 perimetry tests with the Swedish Interactive Threshold Algorithm using Humphrey Field Analyzer (Carl Zeiss Meditec) at least 7 times for at least 4 years between May 1998 and May 2020. After excluding patients aged 30 years or younger at the time of the first test, our pool of VFs included 96,542 tests from 9212 eyes of 6047 patients with an average follow-up of 9.5 years. All procedures conformed to the Declaration of Helsinki, and this study was approved by the institutional review board of the Asan Medical Center, University of Ulsan, Seoul, Korea.

## PREPARING DATA FOR DEEP LEARNING

Before constructing the CNN model, we prepared data in a stepwise manner:

- 1.

From a total of 9212 eyes, 20% of the eyes (1843 eyes from 1210 patients) were randomly assigned to the test group, whereas the remaining 80% (7369 eyes from 4837 patients) were assigned to the training group. We have used clustered sampling to ensure that both eyes of the patient are assigned to the same group. The training group was used exclusively to train the CNN model, whereas the test group was used exclusively to measure the performance of the model.

- 2.

For the input, we converted VFs to 8 × 9 tensors by padding with zeros to preserve spatial characteristics. In addition, to incorporating chronological features, we stacked 3 consecutive VFs in the shape of 8 × 9 × 3 while maintaining sequential information, which was then fed to the CNN. We later named this 3D tensor as a “VF-block.” VF-blocks were allowed to overlap, hence, for the eye having

*N*distinct consecutive VFs,*N*− 2 VF-blocks have been constructed.

- 3.

In supervised NNs, a goal of training, a “label,” should be defined by the researcher. In our case, the primary label is a binary value with 1 indicating glaucomatous VF progression and 0 indicating no VF progression. The labeling process is described in detail in the following section.

- 4.

From the last test of the VF-block, a subsequent 3-year period was defined as a “surveillance period,” while the period after the surveillance period up to 6 years was defined as a “confirmatory period.” Only VF-blocks with at least 2 VFs in the surveillance period and at least 1 VF in the confirmatory period were included in the analysis. For all VF-blocks, VFs beyond 6 years from the last test of the VF-block were dropped.

- 5.

A VF-block was labeled as 1 when the progression happened within the surveillance period and was sustained in the confirmatory period.

- 6.

Given that the average follow-up time is much longer than the surveillance period (9.5 vs 3 years), a large portion of VFs would be unused if only 1 VF-block per eye were used in the analysis. Hence, for each eye, we used all unique VF-blocks with at least 2 VFs in the surveillance period and 1 VF in the confirmatory period. As a result, 43,260 labeled VF-blocks were created: 34,467 in the training group and 8793 in the test group ( Figure 1 ).

## DETERMINATION OF GLAUCOMATOUS VISUAL FIELD PROGRESSION USING 3 DIFFERENT CRITERIA

- A.

The following trend-based criteria were used to determine glaucomatous VF progression:

- 1.

First, we constructed a “standard dataset” by randomly selecting 2000 eyes from 7369 eyes of the training group. After excluding eyes with a minimum value of MD of ≤ −16 dB throughout the entire follow-up, the standard group was found to have a baseline age of 54.3 years; an MD of −3.27 dB; a PSD of 3.81 dB; and a VFI of 92.34%.

- 2.

From the calculated rate of deterioration of the global indices (MD and VFI) in the standard dataset, we derived cutoff values (a statistically significant slope <0 at

*P*< .05 for a 2-tailed test) of −0.49 dB/year for MD and −1.55%/year for VFI.^{, }

- 3.

In NN models, a VF-block was labeled as “progressed” if deterioration exceeding the cut-off values was observed during the surveillance period (3 years) and if the deterioration was not reversed in any of the VFs during the confirmation period.

- 1.
- B.

Event-based criteria were used to determine VF progression as follows:

- 1.

From the standard dataset, we calculated the probability of a change in glaucoma pattern for all points excluding the blind spot. As intertest variation is known to be associated with the initial deviation from normal age-corrected sensitivity, we divided all points of all eyes into multiple levels based on the initial age-corrected sensitivity from 0 dB to −30 dB in 2-dB steps. The average variability was calculated for all 52 VF points and each initial sensitivity level. If the average variability of a specific point-level was unobtainable, imputation was performed with an exponentially weighted moving average (EWMA) with a window of 5 sensitivity levels (2 above and 2 below). The average variability of all 52 points was smoothed with EWMA along the sensitivity ax is with a window of 3.

- 2.

A probability map of the pattern deviation-based changes was calculated for all VFs in the surveillance and confirmation periods of a given VF-block. Progression was determined based on modified EMGT (Early Manifest Glaucoma Trial) criteria: at least 3 significantly progressing points (

*P*< .05) at the same location in at least 2 consecutive test results during the surveillance period and in at least 1 test result during the confirmation period. If the progression was reversed in any of VFs in the confirmation period, the VF-block was labeled as “nonprogressed.”

- 1.
- C.

The combined criterion was defined as one fulfilling both trend-based and event-based criteria for glaucomatous VF progression.

## “BASIC” NEURAL NETWORK MODEL

We constructed a basic NN model with 3 VF-blocks of a convolutional network composed of a 3D convolution layer, a batch normalization layer, and rectified linear unit activation layer followed by 2 fully connected layers with the final activation function being a sigmoid function. Binary crossentropy was used as a loss function, and stochastic gradient descent with momentum was used as an optimization algorithm. For the validation, 10-fold crossvalidation was used with the training dataset divided into 10 subsets at the eye level.

To improve the performance of the model, we examined the accuracy of the models by adding the age of the patient for each test, an interim from the first test and laterality to the VF-blocks. Variables were concatenated to the flattened tensor before the second fully connected layer. We examined all possible combinations of additional inputs; however, none of them improved the accuracy of the model ( Figure 2 , A).

## “ENHANCED” NEURAL NETWORK MODEL

We explored another strategy to improve the model by incorporating more interim information into the basic model. We created 3 separate NN models (“preliminary” models) aimed at predicting future VF defects from the VF-blocks ( Figure 2 , B).

For each eye in the training group, we built every possible combination of a VF-block and a single VF in the surveillance period divided into 3 periods of equivalent length: a VF-block paired with a VF examination performed in the first year of the surveillance period (44,008 pairs comprising the “first-year set”); the second year of the surveillance period (54,447 pairs comprising the “second-year set”); and the third year of the surveillance period (46,095 pairs comprising the “third-year set”). Single VFs paired to VF-blocks were used as labels for training preliminary NN models.

Preliminary NN models have the same structure as the basic model with additional transposing layers (deconvolution) and the linear final activation function to match the different shape and values of a different label, an 8 × 9 tensor and values in decibels. Furthermore, instead of binary crossentropy, the mean absolute error was used as a loss function. The output from 3 preliminary NN models are predictions of glaucomatous VF progression derived from a given VF-block. The model trained with the first-year set predicts glaucomatous VF progression after ~0.5 year; the model trained with the second-year set predicts glaucomatous VF progression after ~1.5 years; and the model trained with the third-year set predicts glaucomatous VF progression after ~2.5 years.

The enhanced NN model has fundamentally the same structure as the basic model but takes as an input a concatenated tensor of VF-block and outputs from 3 preliminary models, forming an 8 × 9 × 6 tensor. As in the basic model, 3 blocks of 3D convolution, batch normalization, and rectified linear unit activation layer were followed by 2 fully connected layers.

## LINEAR MODELS

We constructed 2 linear regression models using ordinary least squares with VF-blocks from the test group: 1 model with linear regression on the global indices and the other on PLR. From the global indices model, we predicted MD and VFI at 3 years, and progression was defined as deterioration in MD or VFI exceeding cutoff values (−0.49 dB/year for MD and −1.55%/year for VFI). Progression in the PLR model was defined as at least 1 point progressing faster than −1.0 dB/year with *P* ≤ .05. ^{,} For the convenience of comparison, AUROC values were calculated from DOR (diagnostic odds ratio), assuming homogeneous dependence of the test accuracy on the threshold.

## STATISTICAL ANALYSIS

Tensor flow version 2.3 ( https://www.tensorflow.org/ ) and Keras (version 2.4: https://keras.io/ ), running in the Python (version 3.7: https://www.python.org/ ) environment were used for NN training. During the training of NNs, a class-weight attribute of fit method in the Keras models application programming interface. All other statistical analyses were performed with SAS version 9.4 software (SAS Institute). For statistical comparison, the χ ^{2} test was used for categorical variables, and the independent Student *t* -test was used for continuous variables. Comparisons between AUROC values were performed using a method proposed by DeLong and associates.

## RESULTS

The mean age of the 6047 patients was 55.0 years, and there was a mean follow-up period of 9.5 years with an average of 10.5 tests. The average global indices of the first 3 VF tests of 9212 eyes were MD of −4.61 dB; 4.53 dB of PSD; and a VFI of 87.80%. The training group included 7375 eyes with a baseline MD and a VFI of −4.61 dB and 87.8%, respectively, whereas the test group included 1837 eyes with a baseline MD and VFI of −4.59 dB and 87.9%, respectively, with no significant differences between the groups ( *P* = .905 and *P* = .820 for MD and VFI, respectively) ( Table 1 ).

Training Group | Test Group | P | |
---|---|---|---|

Patients, n | 4837 | 1210 | |

Eyes, n | 7375 | 1837 | |

Age at baseline, y | 55.3 | 55.2 | .947 |

Laterality (OD) | 49.9 | 49.4 | .680 |

No. of HVF tests | 10.5 | 10.5 | .442 |

Follow-up time, y | 9.5 | 9.6 | .644 |

Baseline MD, dB | –4.61 | –4.59 | .905 |

Baseline PSD, dB | 4.55 | 4.59 | .428 |

Baseline VFI, % | 87.8 | 87.9 | .820 |

No. of progressed VF blocks, count (%) | |||

Trend-based criterion | 3536 (10.22) | 870 (10.06) | .674 |

Event-based criterion | 3514 (10.15) | 862 (9.97) | .616 |

Combined-based criterion | 1933 (5.58) | 461 (5.33) | .358 |