Purpose
To report the test–retest variability of two health-related quality-of-life instruments in adults with strabismus: the new Adult Strabismus 20 (AS-20) and the National Eye Institute 25-item Visual Function Questionnaire (NEI VFQ-25).
Design
Prospective case series.
Methods
Fifty-five adult patients with stable strabismus seen in a clinical practice completed the AS-20 and the NEI VFQ-25 on 2 occasions, without intervening treatment. The second administration took place at a subsequent office visit, immediately before surgery, or by mail. Intraclass correlation coefficients were calculated, as were 95% limits of agreement and 95% confidence intervals around those limits.
Results
There was excellent agreement of overall questionnaire scores for the AS-20 (intraclass correlation coefficient, 0.92) and NEI VFQ-25 (intraclass correlation coefficient, 0.94). The 95% limits of agreement for overall scores were 14.3 points (95% confidence interval, 10.9 to 17.7) for the AS-20 and 11.1 points (95% confidence interval, 8.5 to 13.8) for the NEI VFQ-25. The lower test–retest variability of the VFQ-25 appeared to be partly attributable to ceiling effects, with many scores clustered at the normal end of the range.
Conclusions
The new AS-20 and the NEI VFQ-25 show excellent test–retest reliability in adults with strabismus. Change exceeding the 95% limits of agreement (14 points on the AS-20 and 11 points on the VFQ-25) is indicative of real change in an individual patient. The AS-20 may be more useful than the VFQ-25 in adults with strabismus because it is less prone to ceiling effects.
Formal assessment of health-related quality of life (HRQOL) has been recommended in the management of adult strabismus. Several vision-specific HRQOL instruments have been used to evaluate adults with strabismus, but reports on their test–retest reliability are sparse. Reliability data provide an estimate of the variance attributable to random or measurement error. Test–retest reliability data describe the extent to which repeating a test yields the same results when no underlying change in health has occurred. Limits of agreement calculated from test–retest data are particularly helpful for interpreting changes in an individual patient's scores over time and have been used for other measurements, such as prism and cover test measurements in strabismus. In previous reports, we described the development and initial validation of the Adult Strabismus 20 (AS-20), a strabismus-specific HRQOL questionnaire for adults. In the present study, we report the test–retest reliability of the AS-20 questionnaire in a cohort of adults with strabismus. For comparison, we also assessed the test–retest reliability of the National Eye Institute 25-item Visual Function Questionnaire (NEI VFQ-25) in the same cohort.
Methods
Fifty-five adult strabismus patients (median age, 44 years; range, 18 to 80 years) were recruited prospectively from outpatient clinics and completed both the AS-20 and the NEI VFQ-25 at 2 time points within 1 year. The first administration was completed in the office. The second administration was either (1) in the office at a subsequent examination (n = 29; 25 to 144 days later; median, 66 days), (2) immediately before surgery (n = 18; 1 day later in 17 patients and 6 days later in 1 patient), or (3) by mail within 2 days (n = 8). Patients completing the questionnaires for the second time immediately before surgery or by mail were instructed to complete them as if they had not done so before. Patients with inherently variable strabismic conditions (e.g., ocular myasthenia gravis) were excluded to limit the study cohort to patients who were stable between questionnaire administrations. We also excluded patients who had undergone strabismus surgery within 1 year before the first examination because their symptoms and perceptions might have changed during the postoperative period. For the office retest administration, patients were required to have stable strabismus (no change in angle of deviation of more than 10 PD in primary position) and no intervening treatment or change in treatment.

Thirty-eight patients (69%) were female, and 49 (89%) self-reported their race as white. Strabismus diagnoses were idiopathic in 32 patients (58%), neurologic in 19 (35%), and mechanical in 4 (7%). Thirty-five patients (64%) had diplopia, 9 (16%) had rare diplopia, and 11 (20%) did not have diplopia. Visual acuity ranged from 20/15 to 20/30 in the better eye (median, 20/20) and from 20/15 to 20/4000 in the worse eye (median, 20/20).
Responses for each item on the AS-20 were recorded using a 5-point Likert-type scale (never, rarely, sometimes, often, and always), and converted, for each patient, to a mean score ranging from 0 (worst HRQOL) to 100 (best HRQOL). The NEI VFQ-25 contains Likert-type scales and also yields a mean individual patient score from 0 to 100. An administrable version of the AS-20 is available online at http://public.pedig.jaeb.org (accessed July 31, 2009) and of the NEI VFQ-25 at http://www.nei.nih.gov/resources/visionfunction/vfq_ia.pdf (accessed July 31, 2009).
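The AS-20 conversion described above can be sketched as follows. This is a minimal illustration rather than the official scoring procedure: it assumes that every AS-20 item is worded so that "never" is the best response, that the five options map to equally spaced values (100, 75, 50, 25, 0), and that unanswered items are simply skipped. `RESPONSE_POINTS` and `mean_score` are hypothetical names. The sketch does not apply to the NEI VFQ-25, which has its own scoring rules.

```python
from statistics import fmean

# Assumption: equally spaced 0-100 values, with "never" as the best response.
# The text states only that responses are converted to a 0 (worst) to
# 100 (best) mean score.
RESPONSE_POINTS = {
    "never": 100.0,
    "rarely": 75.0,
    "sometimes": 50.0,
    "often": 25.0,
    "always": 0.0,
}


def mean_score(responses):
    """Convert Likert responses to 0-100 points and average them.

    Unanswered items (None) are skipped (an assumption); returns None if
    no item was answered.
    """
    points = [RESPONSE_POINTS[r.lower()] for r in responses if r is not None]
    return fmean(points) if points else None
```

For example, the responses "never", "sometimes", "often", and "always" average to (100 + 50 + 25 + 0) / 4 = 43.75 under these assumptions.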
Statistical Analysis
For both the AS-20 and the NEI VFQ-25, scores at the first and second administrations were compared using Wilcoxon signed-rank tests. Bland-Altman plots were used to analyze the variability of the test–retest differences. The half-width of the 95% limits of agreement was calculated as 1.96 times the standard deviation of the differences, defining the limits within which 95% of the differences should lie. The 95% confidence intervals (CIs) around the 95% limits of agreement also were calculated. Intraclass correlation coefficients between the first and second administrations were calculated. Analyses were repeated to compare variability in patients with and without diplopia.
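The limits-of-agreement and intraclass correlation calculations described above can be sketched as follows. The half-width follows the text (1.96 times the standard deviation of the differences). The text does not state how the CIs around the limits or the intraclass correlation coefficients were computed, so the sketch assumes the Bland-Altman approximation for the standard error of a limit, sqrt(3s²/n), combined with a t quantile, and the two-way random-effects, absolute-agreement, single-measure form ICC(2,1). All function names are hypothetical.

```python
import math
from statistics import fmean


def limits_of_agreement(sd_diff, n, z=1.96, t_crit=2.005):
    """Half-width of the 95% limits of agreement and an approximate 95% CI.

    Half-width = z * SD of the paired test-retest differences.
    Assumption: SE(limit) ~= sqrt(3 * SD**2 / n) (Bland-Altman approximation);
    t_crit is the 0.975 t quantile for n - 1 df (about 2.005 when n = 55).
    """
    half_width = z * sd_diff
    se = math.sqrt(3 * sd_diff ** 2 / n)
    return half_width, (half_width - t_crit * se, half_width + t_crit * se)


def icc_2_1(test, retest):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    Assumption: one common choice for test-retest designs; the ICC form
    used in the study is not stated.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired scores")
    n, k = len(test), 2
    rows = list(zip(test, retest))
    values = [x for row in rows for x in row]
    grand = fmean(values)
    # Sums of squares: patients (rows), administrations (columns), residual error.
    ss_rows = k * sum((fmean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((fmean(col) - grand) ** 2 for col in (test, retest))
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def exceeds_loa(score_before, score_after, half_width):
    """True if an individual's change lies outside the 95% limits of agreement."""
    return abs(score_after - score_before) > half_width
```

With the AS-20 overall values from the Table (SD of differences 7.3, n = 55), `limits_of_agreement(7.3, 55)` gives a half-width of 14.3 with a CI of 10.9 to 17.7 after rounding, matching that row; because the Table's SDs are themselves rounded, other rows may differ in the last digit, and smaller groups need a larger t quantile. The toy pairs `icc_2_1([1, 2, 3], [2, 3, 4])` return 2/3 rather than 1: an absolute-agreement ICC penalizes a systematic shift between administrations, which a consistency-type ICC would ignore.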
Results
Differences Between First and Second Administrations
As expected in a test–retest study of a reliable instrument, there were no significant differences between the first and second administrations in overall or subscale scores on the AS-20 (P ≥ .5 for all comparisons; Table). In contrast, for the NEI VFQ-25, scores were slightly higher (better HRQOL) on the second administration for the overall score (75.8 versus 77.5; P = .02) and for the near activities subscale (71.5 versus 75.6; P = .01). No other NEI VFQ-25 subscale scores differed significantly between the first and second questionnaire administrations.
Questionnaire and Subscale | No. | Test, Mean ± SD | Retest, Mean ± SD | Difference (Retest − Test), Mean ± SD | P Value a | 95% LOA Half-Width (95% CI) | ICC (95% CI) |
---|---|---|---|---|---|---|---|
AS-20 | |||||||
Overall | 55 | 58.9 ± 18.5 | 59.5 ± 17.8 | 0.6 ± 7.3 | .5 | 14.3 (10.9 to 17.7) | 0.92 (0.87 to 0.95) |
Functional scale | 55 | 52.2 ± 22.5 | 52.5 ± 22.2 | 0.3 ± 9.9 | .9 | 19.5 (14.9 to 24.1) | 0.90 (0.84 to 0.94) |
Psychosocial scale | 55 | 65.6 ± 24.9 | 66.4 ± 25.5 | 0.8 ± 9.0 | .7 | 17.7 (13.5 to 21.9) | 0.94 (0.89 to 0.96) |
VFQ-25 | |||||||
Overall | 55 | 75.8 ± 16.8 | 77.5 ± 16.0 | 1.7 ± 5.7 | .02 | 11.1 (8.5 to 13.8) | 0.94 (0.89 to 0.96) |
General health | 55 | 68.2 ± 26.1 | 69.5 ± 23.4 | 1.4 ± 11.2 | .5 | 22.0 (16.8 to 27.1) | 0.89 (0.83 to 0.94) |
General vision | 55 | 70.2 ± 15.8 | 69.1 ± 17.6 | −1.1 ± 16.1 | .7 | 31.5 (24.1 to 38.9) | 0.54 (0.33 to 0.70) |
Ocular pain | 55 | 74.5 ± 23.2 | 78.2 ± 21.5 | 3.6 ± 18.4 | .2 | 36.1 (27.6 to 44.6) | 0.66 (0.48 to 0.78) |
Near activities | 54 b | 71.5 ± 23.6 | 75.6 ± 21.4 | 4.2 ± 11.5 | .01 | 22.6 (17.2 to 28.0) | 0.85 (0.76 to 0.91) |
Distance activities | 55 | 76.1 ± 22.3 | 78.3 ± 21.3 | 2.3 ± 10.2 | .1 | 20.0 (15.3 to 24.7) | 0.89 (0.81 to 0.93) |
Vision-specific social functioning | 55 | 86.4 ± 16.9 | 89.3 ± 15.9 | 3.0 ± 11.0 | .05 | 21.6 (16.5 to 26.7) | 0.76 (0.63 to 0.85) |
Vision-specific mental health | 55 | 59.8 ± 29.8 | 64.0 ± 28.4 | 4.2 ± 13.6 | .08 | 26.7 (20.4 to 33.0) | 0.88 (0.81 to 0.93) |
Vision-specific role difficulties | 54 b | 65.7 ± 31.1 | 66.9 ± 28.1 | 1.2 ± 14.6 | .4 | 28.7 (21.8 to 35.5) | 0.88 (0.80 to 0.93) |
Vision-specific dependency | 52 c | 84.1 ± 23.2 | 84.5 ± 24.6 | 0.3 ± 9.0 | .6 | 17.7 (13.4 to 22.0) | 0.93 (0.88 to 0.96) |
Driving | 52 c | 77.1 ± 22.7 | 76.4 ± 20.5 | −0.6 ± 10.7 | .9 | 20.9 (15.8 to 26.0) | 0.89 (0.82 to 0.94) |
Color vision | 55 | 98.2 ± 6.6 | 98.6 ± 5.7 | 0.5 ± 5.9 | 1.0 | 11.5 (8.8 to 14.2) | 0.55 (0.34 to 0.71) |
Peripheral vision | 55 | 71.4 ± 26.1 | 73.6 ± 25.2 | 2.3 ± 16.9 | .3 | 33.0 (25.2 to 40.8) | 0.78 (0.66 to 0.87) |
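a P value based on a nonparametric paired comparison (Wilcoxon signed-rank test). CI = confidence interval; ICC = intraclass correlation coefficient; LOA = limits of agreement.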
Differences Between Methods of Administration
Analyzed separately by method of administration, the intraclass correlation coefficient for the AS-20 was slightly lower (indicating more variability between measures) for the office and presurgery administrations than for the mail administration (office, 0.91 [95% CI, 0.82 to 0.96]; presurgery, 0.90 [95% CI, 0.76 to 0.96]; mail, 0.93 [95% CI, 0.71 to 0.98]). For the NEI VFQ-25, the intraclass correlation coefficient was numerically, but not significantly, lower for the office administration than for the presurgery or mail administrations (office, 0.92 [95% CI, 0.84 to 0.96]; presurgery, 0.95 [95% CI, 0.87 to 0.98]; mail, 0.94 [95% CI, 0.75 to 0.99]).
The 95% limits of agreement showed a similar pattern for the AS-20: the estimates for office and presurgery retests were slightly higher (indicating more variability between measures) than for mail retests (office, 15.2 [95% CI, 10.2 to 20.3]; presurgery, 14.5 [95% CI, 8.2 to 20.8]; mail, 10.4 [95% CI, 2.8 to 18.0]). For the NEI VFQ-25, the 95% limits of agreement also were higher for the office and presurgery administrations than for the mail administration (office, 12.9 [95% CI, 8.6 to 17.1]; presurgery, 10.3 [95% CI, 5.8 to 14.8]; mail, 5.5 [95% CI, 1.5 to 9.5]). Because the estimates for the different methods of administration were broadly similar and their 95% CIs overlapped, we combined the data for subsequent analyses. For all comparisons except the NEI VFQ-25 limits of agreement, each 95% CI also included the point estimates for the other methods of administration.
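The criterion for pooling the methods of administration can be checked mechanically. The sketch below (hypothetical helper names) tests, for a set of estimates by method, whether every 95% CI contains the other methods' point estimates and whether every pair of CIs overlaps. The input values are the 95% limits of agreement reported in this section.

```python
# Point estimates and 95% CIs of the 95% limits of agreement, by method of
# administration, as reported in the text.
AS20_LOA = {
    "office": (15.2, (10.2, 20.3)),
    "presurgery": (14.5, (8.2, 20.8)),
    "mail": (10.4, (2.8, 18.0)),
}
VFQ25_LOA = {
    "office": (12.9, (8.6, 17.1)),
    "presurgery": (10.3, (5.8, 14.8)),
    "mail": (5.5, (1.5, 9.5)),
}


def ci_contains(ci, value):
    """True if value lies inside the closed interval ci = (low, high)."""
    low, high = ci
    return low <= value <= high


def cis_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def pooling_checks(estimates):
    """Return (every CI contains the other methods' point estimates,
    every pair of CIs overlaps)."""
    pairs = [(m, o) for m in estimates for o in estimates if m != o]
    contains = all(ci_contains(estimates[m][1], estimates[o][0]) for m, o in pairs)
    overlap = all(cis_overlap(estimates[m][1], estimates[o][1]) for m, o in pairs)
    return contains, overlap
```

For the AS-20 both checks pass. For the NEI VFQ-25 the CIs overlap pairwise, but the office (8.6 to 17.1) and presurgery (5.8 to 14.8) CIs do not contain the mail point estimate of 5.5, so only the overlap criterion holds for that instrument.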
Overall Intraclass Correlations
Using a published scale, agreement between administrations, as measured by the intraclass correlation coefficient, was almost perfect (> 0.80) for both the AS-20 (0.92; 95% CI, 0.87 to 0.95; Table) and the NEI VFQ-25 (0.94; 95% CI, 0.89 to 0.96). Agreement also was almost perfect for both AS-20 subscales. For the NEI VFQ-25 subscales, agreement was almost perfect for 7 of the 12 subscales, substantial (> 0.6 to 0.80) for 3, and moderate (> 0.4 to 0.6) for 2 (Table).
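The descriptive bands used above can be expressed as a small lookup. The bands above 0.4 follow the thresholds quoted in the text; the lower bands ("fair", "slight", "poor") are an assumption based on the commonly used Landis and Koch labels and do not occur in these data. The input values are the NEI VFQ-25 subscale intraclass correlation coefficients from the Table; function and variable names are hypothetical.

```python
from collections import Counter

# NEI VFQ-25 subscale ICCs from the Table.
VFQ25_SUBSCALE_ICC = {
    "General health": 0.89,
    "General vision": 0.54,
    "Ocular pain": 0.66,
    "Near activities": 0.85,
    "Distance activities": 0.89,
    "Vision-specific social functioning": 0.76,
    "Vision-specific mental health": 0.88,
    "Vision-specific role difficulties": 0.88,
    "Vision-specific dependency": 0.93,
    "Driving": 0.89,
    "Color vision": 0.55,
    "Peripheral vision": 0.78,
}


def agreement_category(icc):
    """Map an ICC to a descriptive agreement band (lower bands assumed)."""
    if icc > 0.80:
        return "almost perfect"
    if icc > 0.60:
        return "substantial"
    if icc > 0.40:
        return "moderate"
    if icc > 0.20:
        return "fair"
    if icc >= 0.0:
        return "slight"
    return "poor"


def category_counts(iccs):
    """Count how many ICCs fall into each agreement band."""
    return Counter(agreement_category(v) for v in iccs)
```

Counting the categories with `category_counts(VFQ25_SUBSCALE_ICC.values())` reproduces the 7, 3, and 2 subscales reported above.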
Overall Distribution of Test–Retest Differences
Test–retest differences are plotted against mean scores, as described by Bland and Altman, in the Figure. For both instruments, variability did not appear to depend on severity (Figure). Nevertheless, the NEI VFQ-25 scores were clustered toward the normal end of the range in these adults with strabismus, suggesting a possible ceiling effect. Neither the AS-20 nor the NEI VFQ-25 demonstrated significant regression to the mean between the first and second administrations (data not shown).
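The quantities behind a Bland-Altman plot, together with two simple descriptive checks, can be sketched as follows. This is an illustrative assumption, not the analysis reported here: dependence of variability on severity is summarized as the Pearson correlation between the pair means and the absolute differences, and a possible ceiling effect as the share of scores at or above a hypothetical threshold. The function names are hypothetical, and the toy inputs in the usage note are made up, not study data.

```python
import math
from statistics import fmean


def bland_altman_points(test, retest):
    """(mean, difference) for each patient; difference = retest - test."""
    if len(test) != len(retest):
        raise ValueError("test and retest must be paired")
    return [((a + b) / 2, b - a) for a, b in zip(test, retest)]


def pearson_r(x, y):
    """Plain Pearson correlation coefficient (standard library only)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def severity_dependence(points):
    """Correlation of pair means with absolute differences.

    Values near 0 suggest that test-retest variability does not change
    with score level (severity).
    """
    means = [m for m, _ in points]
    abs_diffs = [abs(d) for _, d in points]
    return pearson_r(means, abs_diffs)


def share_near_ceiling(scores, threshold):
    """Fraction of scores at or above a (hypothetical) ceiling threshold."""
    scores = list(scores)
    return sum(s >= threshold for s in scores) / len(scores)
```

For example, `bland_altman_points([10, 20, 30, 40], [11, 22, 33, 44])` yields differences that grow in step with the means, so `severity_dependence` returns 1.0; a value near 0 would be consistent with the pattern described above. The text does not define a ceiling threshold, so the choice passed to `share_near_ceiling` is a judgment call.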