Inter-expert and Intra-expert Agreement on the Diagnosis and Treatment of Retinopathy of Prematurity




Purpose


To evaluate inter-expert and intra-expert agreement on the diagnosis and treatment of retinopathy of prematurity (ROP).


Design


Prospective intra- and inter-rater reliability analysis.


Methods


In this multicenter study, 260 wide-field digital photographs of 52 patients were presented to 7 recognized ROP experts on 2 assessment days 8 weeks apart. Experts were asked to assess the patients for ROP stage, presence of plus disease, presence of aggressive posterior ROP, necessity for treatment, and suggested treatment. Inter-expert agreement was measured with Fleiss’ kappa and intra-expert agreement with Cohen’s kappa.


Results


Inter-expert agreement was fair for ROP stage (κ = 0.24), plus disease (κ = 0.32), and aggressive posterior ROP (κ = 0.35); moderate for the necessity for treatment (κ = 0.41); and fair for the kind of treatment (κ = 0.38). Perfect inter-expert agreement was found in 9.6% of all patients for ROP stage (0–5), 45.1% for ≥stage 2 ROP, 17.3% for plus disease, 57.7% for aggressive posterior ROP, and 25% for the necessity for treatment. Intra-expert agreement was higher than inter-expert agreement: it was moderate for ROP stage (κ = 0.56), plus disease (κ = 0.51), aggressive posterior ROP (κ = 0.60), and the necessity for treatment (κ = 0.47), and substantial for the kind of treatment (κ = 0.63).


Conclusions


ROP diagnoses and treatment decisions differ between experts, and a single expert’s decisions differ between assessment days, indicating that the grading process is subjective and subject to observer bias. These results could influence current practice in ROP assessment and training and prompt further refinement of international ROP guidelines.


Retinopathy of prematurity (ROP) is a multifactorial disease of preterm infants characterized by abnormal vascular development of the immature retina. The International Classification of Retinopathy of Prematurity is widely accepted as the standard guideline for ROP diagnosis. It describes ROP using several criteria, such as disease severity (stage 0–5), location (zone I, II, or III), circumferential extent, and the presence of plus disease. Plus disease is defined as a clinically significant level of vascular dilation and tortuosity at the posterior pole of the retina. Aggressive posterior ROP describes a distinct form of ROP with rapid disease progression starting from the posterior pole of the retina. According to the guidelines of the Early Treatment for Retinopathy of Prematurity trial, treatment is now considered for zone I disease with any stage ROP with plus disease, zone I disease with stage 3 ROP with or without plus disease, or zone II disease with stage 2 or 3 ROP with plus disease.


Following these guidelines can help the examiner make the correct diagnosis and treatment decision. However, the current guidelines describe morphologic signs rather than quantifiable measures, so ROP diagnosis depends on examiners’ subjective interpretation of these signs during clinical examination. Examiner-dependent variability in ROP diagnosis and treatment decisions can therefore be expected. This is supported by population-based studies that reported pronounced differences in the incidence of ROP and the frequency of treatment among premature infants in nations with similar healthcare systems and ROP screening programs, and among centers within the same neonatal network.
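The treatment criteria quoted above amount to a small decision rule over zone, stage, and plus disease. The sketch below encodes only that sentence, as an illustration of the rule's structure rather than a clinical tool; the function name and the reading of "any stage ROP" as stage 1 or higher (stage 0 meaning no ROP on the study's 0–5 scale) are assumptions.

```python
from enum import IntEnum


class Zone(IntEnum):
    """Retinal zones as used in the International Classification of ROP."""
    I = 1
    II = 2
    III = 3


def meets_treatment_criteria(zone: Zone, stage: int, plus: bool) -> bool:
    """Hypothetical helper: True if an eye matches one of the three
    treatment criteria summarized in the text (ETROP trial).

    Assumption: "any stage ROP" is read as stage >= 1, because stage 0
    denotes no ROP on the 0-5 scale used in this study.
    """
    if not 0 <= stage <= 5:
        raise ValueError("stage must be between 0 and 5")
    if zone == Zone.I and stage >= 1 and plus:
        return True  # zone I, any stage ROP, with plus disease
    if zone == Zone.I and stage == 3:
        return True  # zone I, stage 3, with or without plus disease
    if zone == Zone.II and stage in (2, 3) and plus:
        return True  # zone II, stage 2 or 3, with plus disease
    return False


# Usage: zone II stage 3 requires plus disease; zone I stage 3 does not.
print(meets_treatment_criteria(Zone.II, 3, plus=False))  # False
print(meets_treatment_criteria(Zone.I, 3, plus=False))   # True
```

Zone III never qualifies under the three criteria as stated, which the function reflects by falling through to False.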


Methods


We conducted an international multicenter study designed as a prospective intra- and inter-rater reliability analysis to assess inter-expert and intra-expert agreement on the diagnosis and treatment of ROP. The study protocol was approved prospectively by the Ethics Committee of the Medical University of Vienna, Austria. The research followed the tenets of the Declaration of Helsinki.


Experts


Seven experienced pediatric ophthalmology consultants specializing in ROP from 4 different centers for pediatric ophthalmology participated in this study. All centers were university hospitals in Austria, Germany, and Croatia equipped with a neonatal intensive care unit (NICU). Eligible experts had to meet the following criteria: (1) clinical experience of ≥5 years in screening, diagnosis, and treatment of ROP; (2) expertise in wide-field digital imaging systems such as the RetCam device; and (3) authorship or co-authorship of at least 5 pediatric ophthalmic or vitreoretinal papers published in peer-reviewed journals.


Study Population


All wide-field digital images of premature infants were obtained from the database of the routine in-patient ROP screening program at the Department of Ophthalmology of the Medical University of Vienna, Austria. The images were taken with a RetCam 3 fundus camera system with a 130-degree children’s lens (Clarity Medical Systems, Pleasanton, California, USA). All patients in the database were initially considered for the study and were excluded if they failed to meet any of the following inclusion criteria: (1) prematurity with a gestational age of <32 weeks and/or a birth weight of ≤1500 g, (2) no retinal disease other than ROP, and (3) availability of 5 high-quality wide-field digital images covering the posterior pole and all 4 quadrants of the retina. Image quality and coverage were checked by the project coordinator, a pediatric ophthalmologist specialized in ROP who was not involved in the grading/staging process. In total, 52 patients with 5 images each (260 high-quality wide-field digital images) were included.


Rating Procedure


The fundus images were presented to the experts via a standardized web-based database. For each patient, 5 images of 1 selected eye covering all 4 quadrants and the posterior pole of the retina were presented. All personal patient data were masked in the images, and the experts were masked to the patients’ clinical information. Patients were identified only by a randomized identification number. After examining the images, experts were asked to evaluate each patient according to the International Classification of Retinopathy of Prematurity for the categories ROP stage (0–5), plus disease (2-level categorization: plus or not plus), and aggressive posterior ROP (present or absent). The experts were also asked to evaluate each patient on the necessity for treatment (yes or no) according to the criteria set by the Early Treatment for Retinopathy of Prematurity randomized trial. If experts answered “yes,” they were asked to suggest a treatment (diode laser, anti–vascular endothelial growth factor therapy, or vitreoretinal surgery). Assessments took place on 2 assessment days 8 weeks apart: experts evaluated the images on assessment day 1 and reevaluated them on assessment day 2 (test–retest). Images were presented in random order on both days. Figure 1 provides an example of a patient presented to the experts. Supplemental Figures 1–3 show additional patient examples (Supplemental Material available at AJO.com).
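The rating scheme described above can be summarized as one record per expert, patient, and assessment day. The sketch below is a hypothetical schema (class and field names are illustrative, not those of the study database); it encodes the category levels listed in the text and the rule that a treatment is suggested only after a "yes" on necessity for treatment. Using None for a category the expert left unrated corresponds to the nondeterminable option reported under Results.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Treatment(Enum):
    DIODE_LASER = "diode laser"
    ANTI_VEGF = "anti-VEGF therapy"
    VITREORETINAL_SURGERY = "vitreoretinal surgery"


@dataclass(frozen=True)
class Rating:
    """One expert's rating of one patient on one assessment day.

    Hypothetical schema; None marks a category the expert left unrated
    (for example, because the images were judged nondeterminable).
    """
    expert_id: int
    patient_id: int                  # randomized identification number
    day: int                         # assessment day 1 or 2 (8 weeks apart)
    stage: Optional[int]             # ROP stage 0-5
    plus: Optional[bool]             # plus disease: plus / not plus
    aprop: Optional[bool]            # aggressive posterior ROP: yes / no
    needs_treatment: Optional[bool]  # necessity for treatment: yes / no
    treatment: Optional[Treatment] = None  # asked only after a "yes"

    def __post_init__(self) -> None:
        if self.day not in (1, 2):
            raise ValueError("day must be 1 or 2")
        if self.stage is not None and not 0 <= self.stage <= 5:
            raise ValueError("stage must be between 0 and 5")
        if self.treatment is not None and self.needs_treatment is not True:
            raise ValueError("a treatment is suggested only when treatment is judged necessary")


# Usage: a rating matching the majority vote for patient 36 in Figure 1.
r = Rating(expert_id=1, patient_id=36, day=1, stage=3, plus=True,
           aprop=True, needs_treatment=True, treatment=Treatment.ANTI_VEGF)
```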




Figure 1


Example of an image set of a patient (identification number 36) presented in this study. The 7 experts rated this patient as follows: retinopathy of prematurity stage 0 (0), 1 (0), 2 (0), 3 (6), 4 (0), 5 (0); plus disease yes (6), no (1); aggressive posterior retinopathy of prematurity yes (6), no (1); necessity for treatment yes (7), no (0); kind of treatment diode laser (1), anti–vascular endothelial growth factor therapy (6), vitreoretinal surgery (0).


Statistical Analysis


The data were analyzed using SPSS Version 18 (SPSS Inc, Chicago, Illinois, USA), Microsoft Excel 2007 (Microsoft Corp, Redmond, Washington, USA), and the Cohen and Fleiss Kappa Program (StatsToDo, Brisbane, Australia). Demographic data are expressed as mean ± standard deviation and range. Inter-expert agreement was measured with multi-rater Fleiss’ kappa. Intra-expert agreement was measured with Cohen’s weighted kappa for multicategory variables and unweighted kappa for dichotomous variables. Kappa values range from −1.00 (complete disagreement) to 1.00 (complete agreement), with 0.00 indicating no more agreement than expected by chance. Values of 0.00–0.20 are considered slight agreement, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement. In addition, percentages of absolute inter-expert and intra-expert agreement are reported.
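The agreement statistics named above can be sketched in plain Python to make the formulas explicit. This is a minimal illustration, not the SPSS or StatsToDo implementation used in the study. Assumptions: Fleiss’ kappa is computed only for subjects that every expert rated (the standard formula needs a fixed number of ratings per subject, so nondeterminable ratings would need separate handling), and because the text does not state the weighting scheme for Cohen’s weighted kappa, both linear and quadratic weights are offered. All function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    ratings[i] holds the n category labels given to subject i.
    """
    n = len(ratings[0]) if ratings else 0
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    N = len(ratings)
    totals: Counter = Counter()
    p_bar = 0.0
    for r in ratings:
        counts = Counter(r)
        totals.update(counts)
        # P_i: proportion of agreeing rater pairs for subject i
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= N
    # P_e: chance agreement from the overall category proportions p_j
    p_e = sum((t / (N * n)) ** 2 for t in totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable],
                categories: Sequence[Hashable], weights: str = "none") -> float:
    """Cohen's kappa for two rating series (e.g., day 1 vs day 2 of one expert).

    weights: "none" (unweighted), "linear", or "quadratic"; for the weighted
    variants, categories must be listed in their natural order.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating series of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least 2 categories")
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear', or 'quadratic'")
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]  # observed joint proportions
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i: int, j: int) -> float:  # disagreement weight for cell (i, j)
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1 - observed / expected


def agreement_band(kappa: float) -> str:
    """Map a kappa value to the verbal bands used in this study."""
    if kappa < 0:
        return "less than chance"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For real analyses, maintained implementations exist, such as `sklearn.metrics.cohen_kappa_score` (with `weights="linear"` or `weights="quadratic"`) and `statsmodels.stats.inter_rater.fleiss_kappa`; the plain version above only makes the arithmetic visible.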




Results


Demographic Patient Characteristics


The mean gestational age of the 52 patients included in this study was 25 weeks 4.2 days ± 12.6 days (range: 23 weeks 1 day to 29 weeks 4 days). Mean birth weight was 729 ± 221 g (range: 380–1360 g). Of the 52 patients, 36 (69.2%) were male.


Nondeterminable Image Sets/Patients


Experts could also classify a patient’s image set as nondeterminable owing to insufficient image quality and leave the patient unrated in that category. This occurred in 2.7% of rating decisions for ROP stage, 1.6% for plus disease, 1.9% for aggressive posterior ROP, and 3.0% for the necessity for treatment.


Inter-expert Agreement


Inter-expert agreement among all experts on ROP diagnosis and treatment was analyzed using the ratings from assessment day 1. Overall, inter-expert agreement (Fleiss’ kappa) was fair to moderate (Figure 2). Agreement was fair on ROP staging (0–5; κ = 0.24) and on the presence of ROP stage 2 or higher (κ = 0.32). Agreement was also fair on the presence of plus disease (κ = 0.32) and of aggressive posterior ROP (κ = 0.35). Inter-expert agreement on the necessity for treatment was moderate (κ = 0.41). When experts recommended treatment, agreement on the kind of treatment suggested was fair (κ = 0.38). To test the stability and reproducibility of inter-expert agreement, we calculated and compared the mean deviation of kappa values between assessment days 1 and 2. Inter-expert agreement levels were stable in all categories, with a mean deviation of κ = 0.06. The assessments of 2 experts diverged somewhat from those of the other 5: 1 tended to assign above-average and the other below-average ROP stages. Excluding these 2 experts from the analysis increased inter-expert agreement on ROP diagnosis to κ = 0.26 for ROP staging (0–5), κ = 0.55 for plus disease, and κ = 0.43 for aggressive posterior ROP.


Table 1 shows absolute inter-expert agreement levels for each patient and each category. Perfect agreement among all experts was found in 5 patients (9.6%) for ROP stage, 23 (45.1%) for ≥stage 2 ROP, 9 (17.3%) for plus disease, 30 (57.7%) for aggressive posterior ROP, and 13 (25%) for the necessity for treatment. Figure 3 shows absolute inter-expert agreement for each category according to the proportion of experts who agreed on the same diagnosis or treatment decision. For example, for ≥stage 2 ROP, all experts (100%) agreed in 23 patients, >80% of experts in 33 patients, >70% in 39 patients, and >50% in 51 patients.
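The absolute-agreement counts in Table 1 and Figure 3 appear to be, for each patient, the number of experts who gave the most frequent rating, expressed as a percentage of the experts who rated that category. A minimal sketch under that assumption follows. Excluding nondeterminable ratings (None) from the denominator is also an assumption, although it reproduces the Table 1 entries for patient 36 from the rating distribution in the Figure 1 caption. The helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def modal_agreement(ratings: Sequence[Optional[Hashable]]) -> Tuple[int, float]:
    """Return (number of experts giving the most frequent rating,
    that number as a percentage of the experts who gave a rating).

    None marks a nondeterminable rating and is left out of the denominator
    (an assumption about how such ratings were counted).
    """
    valid = [r for r in ratings if r is not None]
    if not valid:
        raise ValueError("no determinable ratings")
    top = Counter(valid).most_common(1)[0][1]
    return top, 100 * top / len(valid)


def patients_above(per_patient: List[Sequence[Optional[Hashable]]],
                   threshold_pct: float, strict: bool = True) -> int:
    """Count patients whose modal agreement exceeds threshold_pct
    (or reaches it, if strict is False)."""
    pcts = [modal_agreement(r)[1] for r in per_patient]
    return sum((p > threshold_pct) if strict else (p >= threshold_pct) for p in pcts)


# Usage with patient 36 from Figure 1: 6 experts chose stage 3 and one stage
# rating is missing (None here); 6 experts said plus, 1 said no plus.
stage_36 = [3, 3, 3, 3, 3, 3, None]
plus_36 = [True] * 6 + [False]
print(modal_agreement(stage_36))  # (6, 100.0)
count, pct = modal_agreement(plus_36)
print(count, round(pct))          # 6 86
```

Calling `patients_above(all_patients, 80)` on one category's ratings would give the kind of ">80% of experts" count plotted in Figure 3.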




Figure 2


Inter-expert agreement and intra-expert agreement for retinopathy of prematurity diagnosis and treatment. Kappa values of 0.00–0.20 are regarded as slight agreement, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1.00 as almost perfect agreement. ROP = retinopathy of prematurity.


Table 1

Absolute Agreement Among 7 Experts Reviewing 52 Patients on Retinopathy of Prematurity Diagnosis and Indication for Treatment

Patient | ROP Stage (0–5) | ≥Stage 2 ROP | Plus Disease | Aggressive Posterior ROP | Treatment Necessity
1 | 3 (43) | 4 (57) | 5 (71) | 7 (100) | 7 (100)
2 | 5 (71) | 7 (100) | 5 (71) | 6 (86) | 4 (67) a
3 | 6 (86) | 7 (100) | 4 (57) | 6 (86) | 6 (86)
4 | 3 (50) a | 4 (67) a | 6 (86) | 4 (57) | 3 (50) a
5 | 4 (57) | 7 (100) | 5 (71) | 7 (100) | 5 (71)
6 | 4 (57) | 7 (100) | 4 (67) a | 6 (100) a | 4 (57)
7 | 6 (86) | 7 (100) | 6 (100) a | 3 (50) a | 7 (100)
8 | 4 (57) | 7 (100) | 5 (71) | 6 (100) a | 4 (67) a
9 | 6 (86) | 7 (100) | 6 (86) | 5 (71) | 4 (57)
10 | 4 (57) | 7 (100) | 4 (57) | 6 (100) a | 4 (67) a
11 | 7 (100) | 7 (100) | 7 (100) | 4 (57) | 6 (86)
12 | 3 (50) a | 4 (67) a | 4 (67) a | 4 (67) a | 4 (57)
13 | 3 (43) | 4 (57) | 7 (100) | 7 (100) | 7 (100)
14 | 3 (43) | 5 (71) | 7 (100) | 7 (100) | 7 (100)
15 | 6 (100) a | 6 (100) a | 7 (100) | 5 (71) | 7 (100)
16 | 6 (86) | 7 (100) | 6 (86) | 7 (100) | 7 (100)
17 | 4 (57) | 6 (86) | 4 (57) | 6 (86) | 4 (67) a
18 | 5 (83) a | 6 (100) a | 6 (86) | 7 (100) | 6 (86)
19 | 5 (71) | 6 (86) | 6 (86) | 7 (100) | 6 (86)
20 | 3 (50) a | 4 (67) a | 4 (67) a | 5 (71) | 4 (57)
21 | 4 (57) | 5 (71) | 5 (71) | 7 (100) | 5 (71)
22 | 6 (86) | 6 (86) | 5 (71) | 4 (57) | 4 (57)
23 | 5 (71) | 7 (100) | 4 (57) | 7 (100) | 5 (71)
24 | 5 (71) | 7 (100) | 5 (71) | 7 (100) | 6 (86)
25 | 6 (86) | 5 (71) | 6 (86) | 7 (100) | 6 (86)
26 | 4 (57) | 6 (86) | 5 (71) | 7 (100) | 4 (67) a
27 | 4 (57) | 6 (86) | 5 (71) | 4 (57) | 4 (67) a
28 | 3 (43) | 5 (71) | 5 (71) | 7 (100) | 4 (67) a
29 | 4 (57) | 7 (100) | 5 (71) | 7 (100) | 4 (57)
30 | 5 (71) | 6 (86) | 6 (86) | 4 (57) | 6 (86)
31 | 5 (83) a | 5 (83) a | 7 (100) | 6 (86) | 7 (100)
32 | 5 (71) | 6 (86) | 6 (86) | 7 (100) | 5 (71)
33 | 7 (100) | 7 (100) | 6 (86) | 4 (57) | 7 (100)
34 | 6 (86) | 7 (100) | 5 (71) | 7 (100) | 5 (71)
35 | 5 (71) | 7 (100) | 7 (100) | 7 (100) | 6 (86)
36 | 6 (100) a | 6 (100) a | 6 (86) | 6 (86) | 7 (100)
37 | 6 (86) | 7 (100) | 6 (86) | 7 (100) | 5 (71)
38 | 4 (57) | 7 (100) | 5 (71) | 7 (100) | 5 (71)
39 | 4 (67) a | 5 (83) a | 7 (100) | 4 (57) | 5 (71)
40 | 6 (86) | 7 (100) | 6 (86) | 6 (86) | 6 (86)
41 | 4 (57) | 4 (57) | 5 (100) b | 3 (60) b | 5 (100) b
42 | 5 (71) | 6 (86) | 6 (86) | 5 (71) | 4 (57)
43 | 4 (57) | 4 (57) | 6 (86) | 7 (100) | 5 (71)
44 | 4 (57) | 5 (71) | 5 (71) | 5 (71) | 6 (86)
45 | 4 (57) | 5 (67) | 5 (71) | 7 (100) | 4 (67) a
46 | 6 (86) | 5 (67) | 4 (57) | 7 (100) | 6 (86)
47 | 6 (86) | 5 (67) | 5 (71) | 7 (100) | 6 (86)
48 | 4 (57) | 4 (57) | 6 (86) | 7 (100) | 6 (86)
49 | 4 (67) a | 5 (78) a | 6 (86) | 7 (100) | 6 (86)
50 | 6 (86) | 5 (67) | 6 (86) | 7 (100) | 7 (100)
51 | 4 (57) | 5 (67) | 6 (86) | 7 (100) | 7 (100)
52 | 6 (100) a | 6 (100) a | 6 (86) | 5 (71) | 7 (100)

Jan 6, 2017 | Posted by in OPHTHALMOLOGY | Comments Off on Inter-expert and Intra-expert Agreement on the Diagnosis and Treatment of Retinopathy of Prematurity
