Purpose
To assess short- and long-term variability on standard automated perimetry (SAP) and spectral domain optical coherence tomography (SD-OCT) in glaucoma.
Design
Prospective cohort.
Methods
Ordinary least squares linear regressions of SAP mean deviation (MD) and SD-OCT global retinal nerve fiber layer (RNFL) thickness over time were fitted for sequential tests conducted within 5 weeks (short-term testing) and annually (long-term testing). Residuals were obtained by subtracting the predicted values from the observed values, and each patient’s standard deviation (SD) of the residuals was used as a measure of variability. A Wilcoxon signed-rank test was performed to test the hypothesis of equality between short- and long-term variability.
Results
A total of 43 eyes of 43 glaucoma subjects were included. Subjects had a mean of 4.5 ± 0.8 SAP and OCT tests for short-term variability assessment. For long-term variability, the same number of tests was performed, with results collected annually over an average of 4.0 ± 0.8 years. The average SD of the residuals was significantly higher in the long-term than in the short-term period for both tests: 1.05 ± 0.70 dB vs. 0.61 ± 0.34 dB, respectively (P < 0.001) for SAP MD and 1.95 ± 1.86 μm vs. 0.81 ± 0.56 μm, respectively (P < 0.001) for SD-OCT RNFL thickness.
Conclusions
Long-term variability was higher than short-term variability on SD-OCT and SAP. Because current event-based algorithms for detection of glaucoma progression on SAP and SD-OCT have relied on short-term variability data to establish their normative databases, these algorithms may underestimate long-term variability and thus may overestimate progression over time.
Glaucoma is characterized by a progressive optic neuropathy with corresponding patterns of visual field loss. Monitoring and detection of glaucoma progression over time is paramount in management and clinical decision making, such as when to initiate or escalate therapy. However, despite the availability of numerous functional and structural tests for monitoring glaucoma, such as standard automated perimetry (SAP) and optical coherence tomography (OCT), detection of progression remains a challenging aspect of clinical practice.
Effective detection of progression depends fundamentally on the ability to differentiate true change from test-retest variability. Because glaucoma is usually a slowly progressive disease, true changes are not expected to occur over relatively short time frames. This reasoning has been used as the basis for establishing normative databases of variability by conducting repeated testing over short periods of time in glaucomatous eyes, usually within a few weeks, and calculating confidence limits or tolerance intervals of variability. If a patient is subsequently found to have a change that exceeds those confidence limits, the patient is deemed to have progressed. Such an approach has been used by the so-called event-based algorithms for detecting progression, such as the Guided Progression Analysis (GPA software; Carl Zeiss Meditec, Dublin, California) for SAP. In the GPA, follow-up test results are compared to baseline test results, and if a number of points show a change that exceeds the expected variability, glaucoma is declared to be progressing. The GPA has been widely used in clinical practice and clinical trials and has also been recently extended for detecting structural progression on OCT.
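The event-based reasoning described above can be illustrated with a minimal sketch. It is not the GPA algorithm, which works point by point and uses a proprietary normative database. It assumes a single global index, empirical 5th and 95th percentile limits, and that worsening corresponds to a decrease in the index. The function names and the example numbers are hypothetical.

```python
from statistics import quantiles


def variability_limits(retest_differences):
    """Empirical 5th and 95th percentiles of short-term test-retest differences.

    Assumption: a 90% interval; the source only mentions confidence limits
    or tolerance intervals without giving the coverage.
    """
    cuts = quantiles(retest_differences, n=20, method="inclusive")
    return cuts[0], cuts[-1]  # first and last of the 19 cut points


def worsened_beyond_variability(baseline, follow_up, limits):
    """Flag a follow-up result whose decline exceeds the lower variability limit.

    Assumption: lower values mean worse status (true for MD and RNFL thickness).
    """
    lower, _upper = limits
    return (follow_up - baseline) < lower


# Hypothetical test-retest differences (retest minus test), in dB
differences = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.1]
limits = variability_limits(differences)
flagged = worsened_beyond_variability(baseline=-8.0, follow_up=-9.6, limits=limits)
```

If the limits are derived from short-term data but long-term variability is larger, a check of this kind would flag more stable eyes than intended, which is the concern examined in this study.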
Establishing normative levels of variability based on short-term test-retest, however, may be problematic. Glaucoma patients and those suspected of having the disease are monitored over the course of many years, and there are reasons to believe that long-term variability may differ from short-term variability. Short-term studies of variability tend to enroll experienced patients who, not uncommonly, have participated in other studies and are thus usually highly cooperative and motivated. Also, technicians tend to be skilled and remain that way throughout the study. In contrast, in “real-world” long-term monitoring, patients are likely to be much less motivated and may also have intercurrent conditions affecting test result quality. Long-term testing is also likely to be done by different technicians with varying degrees of training and expertise. If long-term variability differs significantly from short-term variability, then the algorithms for detection of progression that rely on confidence limits of variability from short-term test-retest results may provide spurious assessments of whether true change has occurred.
In this study, the test-retest estimates of short-term variability were compared with long-term variability of SAP and spectral domain OCT (SD-OCT) measurements in a cohort of glaucoma patients followed over time.
Subjects and Methods
Participants in this study were consecutively recruited from the clinic and were enrolled in a prospective longitudinal study designed to evaluate functional impairment in glaucoma. The Institutional Review Board approved all methods, and written informed consent was obtained from all participants. The methodology complied with the Declaration of Helsinki guidelines for human subject research, and this study adhered to the Health Insurance Portability and Accountability Act.
Patients underwent a comprehensive ophthalmologic examination, including medical history, visual acuity, slit-lamp biomicroscopy, intraocular pressure measurement using Goldmann applanation tonometry, gonioscopy, and dilated fundoscopy using a 78-diopter (D) lens every 6 months. In addition, all patients included in this study were required to have open angles, visual acuity of ≥20/40, and spherical equivalent of <3.0 D throughout the study. Subjects with coexisting retinal disease, uveitis, or any systemic disease that could affect the optic nerve head or the visual field were excluded. Subjects who underwent cataract surgery during the follow-up period were also excluded.
All patients underwent SAP tests, using the 24-2 Swedish interactive threshold algorithm standard of the Humphrey field analyzer II (Carl Zeiss Meditec). Only reliable visual fields with less than 15% false positives and less than 33% fixation losses were included, and the first 2 reliable examinations were excluded in order to avoid learning effects. SAP examinations with the presence of eyelid artifacts, rim artifacts, or other evidence of artifactual visual field defects not related to glaucoma were also excluded.
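The inclusion rules for SAP examinations can be expressed as a short filter. This is a sketch under stated assumptions: the `VisualField` record and its field names are hypothetical, and the order of operations (reliability first, then dropping the first 2 reliable examinations, then removing artifacts) is one reading of the text rather than a documented procedure.

```python
from dataclasses import dataclass


@dataclass
class VisualField:
    exam_date: str              # ISO date string, so lexical order is chronological
    false_positive_pct: float
    fixation_loss_pct: float
    has_artifact: bool = False  # eyelid, rim, or other non-glaucomatous artifact


def eligible_fields(fields):
    """Return the SAP examinations that would enter the analysis."""
    ordered = sorted(fields, key=lambda f: f.exam_date)
    # Reliability thresholds stated in the text: <15% false positives, <33% fixation losses
    reliable = [f for f in ordered
                if f.false_positive_pct < 15 and f.fixation_loss_pct < 33]
    # Drop the first 2 reliable examinations to limit learning effects
    after_learning = reliable[2:]
    # Assumption: artifact exclusion is applied after the learning-effect exclusion
    return [f for f in after_learning if not f.has_artifact]
```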
Patients were also tested using the Spectralis SD-OCT (software version 5.4.7.0; Heidelberg Engineering, Heidelberg, Germany) to measure the peripapillary retinal nerve fiber layer (RNFL) thickness. For SD-OCT, axial length and corneal curvature measurements were entered into the instrument’s software to ensure accurate scaling of all measurements, and the device’s eye-tracking capability was used during image acquisition to ensure that the same location of the retina was scanned over time. Images were excluded if the signal strength was <15 dB or if they were inverted or clipped. The global circumpapillary RNFL thickness was used as the study metric and corresponded to the 360° average measurement of the 1,535 A-scan points acquired from a circle of 3.45 mm centered on the optic disc, which was automatically calculated by the SD-OCT software. In this study, perimetry and SD-OCT were performed by a pool of 5 experienced and trained technicians. However, a given subject was not necessarily tested by the same technician over the course of the study.
Glaucoma diagnosis was defined as the presence of at least 2 consecutive reliable SAP test results with abnormalities at baseline (pattern standard deviation with a P value of <0.05 and/or glaucoma hemifield test results outside normal limits) with corresponding optic nerve damage (i.e., neuroretinal rim thinning, cupping, notching, or characteristic RNFL defects). Only patients with open angle glaucoma in at least 1 eye were included in the study. If both eyes of the same patient met the criteria, one eye was randomly chosen for the analysis.
Estimation of Long-Term and Short-term Variability
Figure 1 illustrates the timeline of the visits included in the determination of long- and short-term variability. Annual SAP and SD-OCT visits were used to estimate long-term variability. To estimate short-term variability, subjects were invited to complete a sequence of 5 additional weekly visits at some point during follow-up. The number of short-term visits was matched to the number of long-term visits for each subject. The same method was used to estimate variability for both long-term and short-term testing: an ordinary least squares (OLS) linear regression model of the parameter of interest was fitted over time, and the standard deviation (SD) of the residuals of the OLS model was used as an estimate of variability. This approach has been previously described and was applied in the current study for SAP MD as well as for SD-OCT global RNFL thickness. The SD of residuals was used to determine short- and long-term variability because it gives a measure of variability that is less affected by the possibility of progression over time, assuming that any progression within the observed period would be linear. For the long-term variability, only the annual visits were used for the OLS model. For the short-term variability, only the weekly visits were used.
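The variability estimate described above can be sketched in a few lines. The sketch fits a straight line by OLS and returns the SD of the residuals (observed minus predicted). It assumes the sample SD with an n − 1 denominator, because the text does not state which denominator was used; a residual standard error with n − 2 would be a reasonable alternative. The function name and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def residual_sd(times, values):
    """SD of the residuals from an OLS line fitted to values over time."""
    if len(times) != len(values) or len(times) < 3:
        raise ValueError("need at least 3 paired observations")
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all tests share the same time point")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    # Residual = observed value minus value predicted by the fitted line
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    # Assumption: sample SD (n - 1 denominator)
    return stdev(residuals)


# Hypothetical SAP MD series (dB) for one eye
short_term_sd = residual_sd([0, 1, 2, 3, 4], [-8.1, -8.6, -8.3, -8.9, -8.4])  # weeks
long_term_sd = residual_sd([0, 1, 2, 3, 4], [-7.9, -9.3, -8.2, -10.4, -9.1])  # years
```

Because the line absorbs any linear trend, a steadily progressing eye and a stable eye with the same scatter around their respective lines yield the same residual SD.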
Statistical Analysis
To test the hypothesis that long- and short-term variability are different, the differences in SD of the residuals over long- and short-term visits for both SAP MD and SD-OCT RNFL thickness were analyzed. To make the comparison, the Wilcoxon signed-rank test was used, because the data were paired and not normally distributed (confirmed by a Shapiro-Wilk test).
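For illustration, the paired comparison can be sketched with a Wilcoxon signed-rank test written in plain Python. The study itself used Stata; this sketch assumes the large-sample normal approximation without continuity correction, drops zero differences, and assigns average ranks to ties, so its P values may differ slightly from those of any particular statistical package. The function name is hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W_plus, z, p), where W_plus is the sum of ranks of positive
    differences x - y.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank within a tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    z = (w_plus - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # two-sided P value from the standard normal
    return w_plus, z, p
```

In this setting, `x` would hold each eye's long-term residual SD and `y` the matching short-term residual SD, so a large `W_plus` indicates that long-term variability tends to be higher.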
We investigated the relationship between the differences in SD of residuals for long- and short-term variability and disease severity for each test. Because the relationships were not linear, a quadratic curve was fitted. In addition, Spearman rank correlation was used to analyze the correlation between long- and short-term variability for all eyes and the correlation of the difference between long- and short-term variability and age. All statistical analyses were performed using Stata version 15.1 software (StataCorp, College Station, Texas). The α level (type I error) was set at 0.05.
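The Spearman rank correlation mentioned above is the Pearson correlation computed on ranks. A minimal sketch follows, assuming average ranks for ties; it returns only the coefficient and no P value. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least 2 paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sx * sy)
```

Here `x` and `y` could be the per-eye long- and short-term residual SDs, or the difference between them paired with age.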
Results
The study included 43 eyes of 43 subjects with a mean age of 71.2 ± 9.7 years and an average follow-up time of 4.0 ± 0.8 years. Subjects had 4.5 ± 0.8 short-term visits matched with the same number of long-term visits during the study period. Demographic and clinical characteristics of the enrolled subjects are displayed in Table 1.
Variable | 43 Subjects (43 Eyes) |
---|---|
Age (y) | 71.2 ± 9.7 |
Female, n (%) | 19 (44) |
Race, n (%) | |
White | 24 (56) |
African American | 14 (32) |
Asian | 4 (9) |
American Indian or Alaska Native | 1 (2) |
IOP (mm Hg) | 14.9 ± 4.9 |
SAP 24-2 baseline MD (dB) | −8.4 (−25.1, 0.3) a |
RNFL global thickness at baseline (μm) | 69.8 ± 20.8 |