To develop and validate a robust standardized reporting tool for describing retinal findings in children examined for suspected abusive head trauma.
A prospective interobserver and intraobserver agreement study.
An evidence-based assessment pro forma was developed, recording hemorrhages (location, layer, severity) and additional features. Eight consultant pediatric ophthalmologists and 7 ophthalmology residents assessed a series of 105 high-quality RetCam images of 21 eyes from abusive head trauma cases with varying degrees of retinal hemorrhage and associated findings. Six of the pediatric ophthalmologists repeated the assessment on the same images presented in randomized order. All assessors viewed the images simultaneously under standardized display settings. Interobserver and intraobserver agreement was assessed using free-marginal multirater kappa, intraclass correlation coefficients, and concordance coefficients.
Almost-perfect interobserver agreement was observed among residents and pediatric ophthalmologists in recording the presence and number of fundus hemorrhages (intraclass correlation coefficients 0.91 and 0.87, respectively) and the location of hemorrhages (concordance coefficients 0.86 and 0.85, respectively). Both groups showed substantial agreement on hemorrhage size (concordance coefficients 0.73 and 0.76) and moderate agreement on hemorrhage morphology (concordance coefficients 0.53 and 0.52) and other findings (concordance coefficients 0.48 and 0.59). Intraobserver agreement for pediatric ophthalmologists varied by question, ranging from substantial to perfect for the presence, number, location, size, and morphology of fundus hemorrhage.
We have developed and validated a standardized clinical reporting tool for ophthalmic findings in suspected abusive head trauma, which has excellent interobserver and intraobserver agreement among consultant specialists and residents. We suggest that its use will improve standardized clinical reporting of such cases.
Retinal hemorrhages in infancy require the ophthalmologist to consider a long list of differential diagnoses that may account for the clinical picture. In a child less than 3 years of age, retinal hemorrhages in the presence of intracranial injury have a 71% positive predictive value for abusive head trauma. However, retinal hemorrhages have also been recorded in nonabusive head trauma, particularly following high-velocity injuries. Previous literature has attributed important diagnostic relevance to certain patterns of retinal hemorrhage (extensive, multilayered, and extending to the periphery) and to the associated findings of retinoschisis and perimacular folds. Although retinal hemorrhages are rare in nonabusive head trauma, a number of case reports have described retinal findings similar to those of abusive head trauma. The wide range of descriptive terms, and their inconsistent and variable use, has hindered attempts to define precisely the sensitivity and specificity of these features for distinguishing abusive from nonabusive head trauma.
There is a lack of standardized reporting of retinal hemorrhages in terms of their distribution, severity, layers of involvement, and other associated findings commonly described in abusive head trauma. This was noted in a smaller review of the diagnostic accuracy of ocular signs in abusive head trauma, in which only a minority of abusive head trauma cases were determined to have traumatic retinoschisis and perimacular folds. Because studies do not routinely record their findings in a uniform manner, it is unclear whether such features are absent or simply not recorded.
It is vital that the first ophthalmologic examination be carried out and recorded to a high standard, as this will be relied on by the clinical team caring for the child and may be used in the provision of reports to child protection investigating agencies, as well as by any expert witnesses that the courts may instruct.
The increasing recognition of the need to standardize retinal hemorrhage documentation has prompted several recent "observer agreement" studies. Fleck and associates studied interobserver reliability in reporting the location of retinal hemorrhages using a zonal classification adapted from Wilkinson and associates, and Mulvihill and associates described interobserver and intraobserver agreement in classifying the retinal layers involved by hemorrhage. While these studies provide important information for developing a descriptive tool, they require ophthalmologists to adopt a new classification and to use a template to describe the location of retinal hemorrhages, which does not lend itself easily to bedside evaluation. A study that addresses the reporting of all aspects of retinal findings, including location, frequency, morphology, layer of involvement, and other characteristic features such as retinoschisis and perimacular folds, is much needed. In this study we set out to standardize the descriptive terminology currently used in clinical practice and to develop a validated clinical tool for recording positive and negative retinal findings in children examined for suspected abusive head trauma, which could also be used for research purposes.
Materials and Methods
A pro forma was designed to record data from multiple images of 1 eye, using descriptive terms commonly applied in the examination of abusive head trauma cases (Supplemental Figures 1 and 2, available at AJO.com). It recorded the presence of retinal hemorrhages, their number (few, many, confluent), location (posterior pole and periphery), morphology (dome-shaped, dot hemorrhage, white-centered, flame-shaped, blot hemorrhage), layer of involvement (preretinal, superficial retinal, deep retinal, subretinal/choroid, multilayer), and a list of other characteristic features. We used only 2 zones (zone 1 = posterior pole; zones 2 and 3 = periphery), based on the International Classification of Retinopathy of Prematurity (ROP). Assessors were also asked to note any additional comments in free text. The images were from 21 cases of infants with abusive head trauma, with retinal findings ranging in severity from no retinal hemorrhages to multiple confluent hemorrhages with macular retinoschisis and disc swelling; examples are shown in Supplemental Figures 3 and 4 (available at AJO.com). The use of anonymized, unidentifiable RetCam images was approved by the local research and development department of the University Hospital of Wales.
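For collating completed forms electronically, for example for research use, the pro forma fields can be held in a small record type. The sketch below is one hypothetical encoding: the class and field names (`EyeAssessment`, `Number`, `Zone`, `Morphology`, `Layer`) are illustrative assumptions, while the categories are those listed above. The 0 to 3 coding of number anticipates the "none to confluent" ordinal scale used for the intraclass correlation analysis.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Number(Enum):
    """Ordinal coding used when presence and number are combined (none to confluent)."""
    NONE = 0
    FEW = 1
    MANY = 2
    CONFLUENT = 3


class Zone(Enum):
    POSTERIOR_POLE = "zone 1 (posterior pole)"
    PERIPHERY = "zones 2 and 3 (periphery)"


class Morphology(Enum):
    DOME_SHAPED = "dome-shaped"
    DOT = "dot hemorrhage"
    WHITE_CENTERED = "white-centered"
    FLAME_SHAPED = "flame-shaped"
    BLOT = "blot hemorrhage"


class Layer(Enum):
    PRERETINAL = "preretinal"
    SUPERFICIAL_RETINAL = "superficial retinal"
    DEEP_RETINAL = "deep retinal"
    SUBRETINAL_CHOROID = "subretinal/choroid"
    MULTILAYER = "multilayer"


@dataclass
class EyeAssessment:
    """Hypothetical record: one assessor's form for one eye (all 5 images)."""
    hemorrhage_present: bool
    number: Number = Number.NONE
    zones: Set[Zone] = field(default_factory=set)
    morphologies: Set[Morphology] = field(default_factory=set)  # circle all that apply
    layers: Set[Layer] = field(default_factory=set)             # circle all that apply
    other_features: Set[str] = field(default_factory=set)       # e.g. "macular retinoschisis"
    comments: str = ""                                           # free-text field

    def __post_init__(self) -> None:
        # A form recording no hemorrhage should not also grade a number of hemorrhages.
        if not self.hemorrhage_present and self.number is not Number.NONE:
            raise ValueError("number recorded although no hemorrhage is present")

    def presence_and_number(self) -> int:
        """Combined ordinal score: 0 = none, 1 = few, 2 = many, 3 = confluent."""
        return self.number.value if self.hemorrhage_present else 0


example = EyeAssessment(
    hemorrhage_present=True,
    number=Number.MANY,
    zones={Zone.POSTERIOR_POLE, Zone.PERIPHERY},
    layers={Layer.PRERETINAL, Layer.MULTILAYER},
    other_features={"macular retinoschisis"},
)
print(example.presence_and_number())  # 2
```

Using enumerations rather than free strings keeps every assessor's answers within the fixed vocabulary of the form, which is what makes the agreement statistics computable.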
In the first session, 7 pediatric ophthalmologists from the United Kingdom assessed and described the findings on 105 anonymized, high-quality RetCam images from the 21 eyes. Assessors viewed the images simultaneously and recorded their findings independently in a standardized, supervised setting.
All RetCam fundal images were obtained with a RetCam II Wide Field Imaging System 2007 (Clarity Medical Systems, Inc, Pleasanton, California, USA) and a 130-degree infant lens by a single experienced user (P.W.). Each case showed 1 eye from 1 patient, with 5 images representing the primary position and the superior, inferior, nasal, and temporal fundus. The images were projected at a resolution of 1025 × 768 pixels and were also displayed on individual laptops with resolutions ranging from 1024 × 600 to 1366 × 768 pixels.
Each patient's 5 RetCam images were displayed continuously for 4 minutes before moving on to the next patient. Assessors were not allowed to confer during the assessments, which were independently supervised. A 15-minute rest interval was allowed halfway through the exercise. One form was used per patient.
A second assessment was carried out a month later, in which 6 assessors from session 1 were shown the same images again in an order determined by permuted block randomization. In addition, 7 resident ophthalmologists and 1 additional pediatric ophthalmologist were enrolled in session 2 and performed the same assessment simultaneously, with all assessors working independently as before. The location, display settings (including the individual laptops), and assessment forms used in session 2 were identical to those in session 1. At the beginning of each session, assessors were given instructions on how to complete the record form.
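The section does not give the block size or say exactly how the blocks were formed, so the following is only one plausible reading of permuted block randomization applied to presentation order: the image sets are split into consecutive blocks, the order of the blocks is permuted, and the sets inside each block are permuted as well. The function name `permuted_block_order` and the block size of 7 are hypothetical.

```python
import random
from typing import List, Optional


def permuted_block_order(n_items: int, block_size: int, seed: Optional[int] = None) -> List[int]:
    """Hypothetical presentation order for items 0..n_items-1.

    Items are split into consecutive blocks of `block_size`; the order of the
    blocks is permuted, and the items within each block are permuted too.
    """
    rng = random.Random(seed)                 # seeded so the order is reproducible
    items = list(range(n_items))
    blocks = [items[i:i + block_size] for i in range(0, n_items, block_size)]
    rng.shuffle(blocks)                       # permute the order of the blocks
    for block in blocks:
        rng.shuffle(block)                    # permute the image sets inside each block
    return [item for block in blocks for item in block]


# Assumed: 21 image sets in blocks of 7.
order = permuted_block_order(21, 7, seed=1)
print(order)
```

Every image set appears exactly once, so each assessor still sees all 21 cases; only their sequence differs from session 1.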
Data obtained from the assessment forms completed in both sessions were entered into a Microsoft Excel spreadsheet, and each question on the assessment form was analyzed in turn. There were 7 questions, 15 participants (8 pediatric ophthalmologists, 6 of whom repeated the exercise, and 7 residents), and 21 sets of patient images; with the 6 repeat assessments this gives 21 completed sittings and hence a potential 3087 responses in total (21 sittings × 21 image sets × 7 questions). For 2 patient image sets, 2 consultants in the first session and 1 different consultant in the second session recorded that no fundus hemorrhage was present. This left the remaining questions incomplete for those image sets, so they were excluded from subsequent analyses (leaving 19 image sets for comparison for questions 2 through 7). In addition, 1 repeat consultant left question 2 of the pro forma unanswered in session 2; the corresponding patient image set was excluded from the analysis of question 2 only, leaving 18 image sets for that question.
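The response count and the image sets remaining after exclusions follow from simple arithmetic on the numbers given above, treating each sitting (one assessor in one session) as answering every question for every image set. The variable names are illustrative only.

```python
# Sittings, as described in the text.
session1_pediatric = 7        # session 1: pediatric ophthalmologists
session2_repeat = 6           # session 1 assessors who repeated the exercise
session2_new_pediatric = 1    # additional pediatric ophthalmologist in session 2
session2_residents = 7        # resident ophthalmologists in session 2

questions = 7
image_sets = 21

participants = session1_pediatric + session2_new_pediatric + session2_residents
sittings = session1_pediatric + session2_repeat + session2_new_pediatric + session2_residents
potential_responses = sittings * questions * image_sets

# Image sets left after the exclusions described above.
sets_after_no_hemorrhage_exclusion = image_sets - 2       # used for questions 2 through 7
sets_for_question_2 = sets_after_no_hemorrhage_exclusion - 1  # 1 more set lost to missing data

print(participants, sittings, potential_responses)               # 15 21 3087
print(sets_after_no_hemorrhage_exclusion, sets_for_question_2)   # 19 18
```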
The double-masked study design and the controlled test conditions permitted the assumption that participants had no a priori knowledge of how many cases fell into each category. A free-marginal multirater kappa was therefore used to assess interobserver and intraobserver agreement on the presence of hemorrhages. The fixed-marginal multirater kappa was also computed, both for comparison with other studies and because participants may have anticipated from the assessment form more "yes" than "no" responses, with repeat participants having a better sense of what to expect the second time around. In general, free-marginal kappa may overstate the true level of agreement, whereas fixed-marginal kappa may understate it.
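The two kappas share the same observed agreement (the mean proportion of agreeing rater pairs per image set) and differ only in the chance term: the free-marginal (Randolph) form fixes chance agreement at 1/k for k categories, while the fixed-marginal (Fleiss) form estimates it from the observed category totals. The sketch below implements both from a table of counts; the function names and the 4-image, 7-rater data are hypothetical, and the section does not state which software the authors used.

```python
from typing import Sequence


def _mean_pairwise_agreement(counts: Sequence[Sequence[int]]) -> float:
    """Observed agreement P_o: mean over subjects of the share of agreeing rater pairs.

    counts[i][j] is the number of raters who put subject i in category j;
    every row must sum to the same number of raters n >= 2.
    """
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    per_subject = [sum(c * (c - 1) for c in row) / (n * (n - 1)) for row in counts]
    return sum(per_subject) / len(per_subject)


def free_marginal_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Free-marginal multirater kappa: chance agreement fixed at 1/k."""
    k = len(counts[0])
    p_o = _mean_pairwise_agreement(counts)
    p_e = 1.0 / k
    return (p_o - p_e) / (1.0 - p_e)


def fixed_marginal_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fixed-marginal (Fleiss) kappa: chance agreement from observed category totals."""
    n, k = sum(counts[0]), len(counts[0])
    total = n * len(counts)
    p_o = _mean_pairwise_agreement(counts)
    p_j = [sum(row[j] for row in counts) / total for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_o - p_e) / (1.0 - p_e)   # undefined if every rating is in one category


# Hypothetical: 7 raters, 4 image sets, categories ["yes", "no"].
counts = [[7, 0], [7, 0], [6, 1], [0, 7]]
print(round(free_marginal_kappa(counts), 3))   # 0.857
print(round(fixed_marginal_kappa(counts), 3))  # 0.825
```

In this toy example "yes" dominates the ratings, so the fixed-marginal chance term rises above 1/2 and the fixed-marginal kappa comes out lower than the free-marginal one, in line with the overstate/understate remark above.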
For the number of hemorrhages, and for presence and number combined (scaled responses from none to confluent), an intraclass correlation coefficient was calculated to evaluate agreement between assessors. There are 6 forms of the intraclass correlation coefficient, each suited to a specific scenario; for this analysis, model 2 was used (assuming that the assessors were a random sample from a larger population and that each assessor rated every patient image set), with individual ratings as the unit of analysis. All 3 models were also computed for individual ratings and gave very similar coefficients, supporting the robustness of our methodology and results. The coefficient represents concordance, where 1 is perfect agreement and 0 is no agreement at all.
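The single-rating intraclass correlation coefficients can be computed from a two-way analysis of variance on an image-sets × assessors table, where each cell would hold, for instance, the 0 to 3 presence-and-number score. The sketch below follows the Shrout and Fleiss formulas for ICC(1,1), ICC(2,1) (the "model 2" described above), and ICC(3,1); the function name is hypothetical, and the test data are the 6-subject, 4-judge worked example from Shrout and Fleiss (1979), not study data.

```python
from typing import Dict, Sequence


def icc_single_rating(x: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Shrout-Fleiss ICC(1,1), ICC(2,1), ICC(3,1) for an n-subjects x k-raters table."""
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(row[j] for row in x) / n for j in range(k)]

    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between image sets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    ms_within = (ss_cols + ss_err) / (n * (k - 1))           # one-way model

    return {
        "ICC(1,1)": (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within),
        # Model 2: raters are a random sample, every rater rates every subject.
        "ICC(2,1)": (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n),
        "ICC(3,1)": (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err),
    }


ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
for name, value in icc_single_rating(ratings).items():
    print(name, round(value, 2))
```

The three models diverge when raters differ systematically in their average scores (a large between-rater mean square); similar values across models, as reported above, indicate that such rater offsets were small.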
The remaining questions on the assessment form were of similar design, so the same agreement statistic could be used for each. Participants were asked to "Circle all boxes that apply; include multiple boxes if necessary," creating a multiple-attribute response variable. A chance-corrected concordance measure was used for pairs of raters, together with an overall pairwise agreement statistic for multiple raters. Confidence intervals (95%) were computed for each concordance coefficient, and differences between interrater concordance statistics were tested for significance at the 5% level.
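The section does not name the concordance statistic. The sketch below assumes a simple chance-corrected form in the spirit of the Kupper and Hafner approach to multiple-attribute responses: for each image set, agreement is the overlap of the two raters' circled boxes divided by the larger selection, chance agreement is its expectation if each rater had circled the same number of boxes at random out of k, and the coefficient is (P_o − P_e)/(1 − P_e). Averaging the pairwise coefficients over all rater pairs is a further simplifying assumption; the function names and data are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def pairwise_concordance(a_sets: Sequence[AbstractSet[str]],
                         b_sets: Sequence[AbstractSet[str]], k: int) -> float:
    """Chance-corrected concordance for two raters' multiple-attribute responses.

    a_sets[i], b_sets[i]: boxes each rater circled for image set i
    k: number of boxes available on the form
    """
    observed, expected = [], []
    for a, b in zip(a_sets, b_sets):
        m = max(len(a), len(b))
        if m == 0:
            raise ValueError("at least one rater must circle a box")
        observed.append(len(a & b) / m)             # overlap relative to the larger set
        expected.append(min(len(a), len(b)) / k)    # its expectation under random picks
    p_o = sum(observed) / len(observed)
    p_e = sum(expected) / len(expected)
    return (p_o - p_e) / (1 - p_e)


def multirater_concordance(ratings: Sequence[Sequence[AbstractSet[str]]], k: int) -> float:
    """Mean of pairwise_concordance over all rater pairs (an assumed summary)."""
    pairs = list(combinations(range(len(ratings)), 2))
    return sum(pairwise_concordance(ratings[i], ratings[j], k) for i, j in pairs) / len(pairs)


# Hypothetical layer responses (k = 5 boxes) for 3 image sets.
rater_a = [{"preretinal"}, {"preretinal", "deep retinal"}, {"multilayer"}]
rater_b = [{"preretinal"}, {"deep retinal"}, {"multilayer"}]
print(round(pairwise_concordance(rater_a, rater_b, k=5), 3))  # 0.792
```

Dividing by the larger selection penalizes a rater who circles extra boxes, so partial overlap (as in the second image set) counts as only half an agreement.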
To allow comparison of our results with other published work on this topic (Fleck and associates and Mulvihill and associates), the rules of thumb used to interpret the agreement statistics are given in Table 1.
| Agreement Score | Classification Guideline |
|---|---|
| 0.00 | Agreement expected by chance |
Table 2 shows interobserver agreement scores for sessions 1 and 2, and intraobserver agreement scores, for all categories assessed. All categories had positive agreement scores; the highest were for detecting the presence of hemorrhage, ranging from "perfect" to "almost-perfect" agreement for both pediatric ophthalmologists and residents. Residents, and pediatric ophthalmologists in both sessions, also showed very strong agreement ("almost-perfect" to "substantial") in describing the presence, number, location, and size of hemorrhages.
| Feature Addressed | Session 1: Pediatric Ophthalmologist Interobserver Agreement | Session 2: Pediatric Ophthalmologist Interobserver Agreement | Session 2: Resident Ophthalmologist Interobserver Agreement | Pediatric Ophthalmologist Intraobserver Agreement |
|---|---|---|---|---|
| Presence of retinal hemorrhage | 0.95 (almost perfect) | 0.97 (almost perfect) | 1.00 (perfect) | 0.90–1.00 |
| Number of hemorrhages | 0.76 | 0.76 | 0.84 | 0.64–0.95 |
| Presence and number of hemorrhages | 0.86 (almost perfect) | 0.89 (almost perfect) | 0.92 (almost perfect) | 0.77–0.97 |
| Location of hemorrhages | 0.85 (almost perfect) | 0.96 (almost perfect) | 0.87 (almost perfect) | 0.80–1.00 |
| Layers of involvement | 0.31 | 0.65 | 0.60 | 0.07–0.82 |
| Sizes of hemorrhage | 0.77 | 0.85 | 0.72 | 0.49–0.89 |
Agreement among all assessors on the retinal layers involved was "fair" to "substantial," and it improved when the "deep retinal" and "superficial retinal" layers were grouped as "intraretinal," reducing the number of response choices from 5 to 4 (Supplemental Table, available at AJO.com). When each of the 4 responses was analyzed individually to identify which layer yielded the strongest agreement between assessors, the "preretinal" and "intraretinal" responses scored highest for residents and session 2 pediatric ophthalmologists, with "almost-perfect" agreement (0.8 and 0.85, respectively, for "preretinal" and 0.8 and 0.9, respectively, for "intraretinal"). The "multilayer" response had "substantial" agreement (pediatric ophthalmologists session 2 = 0.7; residents = 0.8).
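The regrouping of "deep retinal" and "superficial retinal" into "intraretinal," and the analysis of each response on its own, can be sketched as follows. How each individual response was scored is not stated; here it is assumed to be treated as a yes/no item and scored with a free-marginal kappa (chance agreement 1/2). `LAYER_REGROUPING`, `regroup`, `per_response_free_kappa`, and the two-rater data are hypothetical.

```python
from typing import List, Set

# 5 pro forma layer responses collapsed to 4 by merging the two intraretinal layers.
LAYER_REGROUPING = {
    "preretinal": "preretinal",
    "superficial retinal": "intraretinal",
    "deep retinal": "intraretinal",
    "subretinal/choroid": "subretinal/choroid",
    "multilayer": "multilayer",
}


def regroup(selection: Set[str]) -> Set[str]:
    """Map one rater's circled layers onto the 4-response scheme."""
    return {LAYER_REGROUPING[layer] for layer in selection}


def per_response_free_kappa(ratings: List[List[Set[str]]], response: str) -> float:
    """Free-marginal kappa for a single response treated as a yes/no item.

    ratings[r][s] is the set of responses rater r circled for image set s.
    """
    n_raters, n_sets = len(ratings), len(ratings[0])
    total = 0.0
    for s in range(n_sets):
        yes = sum(response in ratings[r][s] for r in range(n_raters))
        no = n_raters - yes
        total += (yes * (yes - 1) + no * (no - 1)) / (n_raters * (n_raters - 1))
    p_o = total / n_sets
    return (p_o - 0.5) / 0.5          # chance agreement 1/2 for a yes/no item


# Hypothetical: 2 raters who disagree only on which intraretinal layer is involved.
raw = [
    [{"deep retinal"}, {"preretinal"}],
    [{"superficial retinal"}, {"preretinal"}],
]
regrouped = [[regroup(s) for s in rater] for rater in raw]
print(per_response_free_kappa(raw, "deep retinal"))        # 0.0
print(per_response_free_kappa(regrouped, "intraretinal"))  # 1.0
```

The toy data show the mechanism behind the improvement: disagreement over the depth of an intraretinal hemorrhage disappears once both depths map to a single response.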
Agreement among pediatric ophthalmologists improved over time, from session 1 to session 2 (Table 3). This was also observed when responses were analyzed individually ("preretinal" −0.375, 95% CI [−0.506, −0.245]; "intraretinal" −0.404, 95% CI [−0.533, −0.274]; "multilayer" −0.481, 95% CI [−0.644, −0.317]), except for the "subretinal" response (−0.063, 95% CI [−0.209, 0.083]).
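Because differences were tested at the 5% level, the reported intervals can be checked directly: a difference is significant when its 95% CI excludes 0. The sketch below also checks that each estimate lies at the midpoint of its interval and backs out the standard error that a symmetric normal-approximation interval would imply. That interval construction is an assumption, since the section does not describe how the CIs were computed. The subretinal lower limit is taken as −0.209, which places −0.063 exactly at the interval's midpoint. `REPORTED` and `check_interval` are illustrative names.

```python
from statistics import NormalDist

# Reported differences in agreement between sessions: (estimate, lower, upper).
REPORTED = {
    "preretinal": (-0.375, -0.506, -0.245),
    "intraretinal": (-0.404, -0.533, -0.274),
    "multilayer": (-0.481, -0.644, -0.317),
    "subretinal": (-0.063, -0.209, 0.083),
}

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def check_interval(estimate: float, lower: float, upper: float):
    """Return (gap between estimate and CI midpoint, implied SE, CI excludes 0)."""
    midpoint = (lower + upper) / 2
    implied_se = (upper - lower) / (2 * Z)   # assumes a symmetric normal-approximation CI
    excludes_zero = upper < 0 or lower > 0   # significant at the 5% level
    return abs(midpoint - estimate), implied_se, excludes_zero


for name, (est, lo, hi) in REPORTED.items():
    gap, se, sig = check_interval(est, lo, hi)
    print(f"{name:12s} gap={gap:.4f} se={se:.3f} significant={sig}")
```

The sign convention is not stated; the negative values are consistent with session 1 agreement minus session 2 agreement, given the improvement described above.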