What a difference time makes! A total of 45 years has passed since the American Journal of Ophthalmology published my first editorial. Then, I faulted most clinical reports in the ophthalmic literature for ignoring the importance of studying populations of suitable sample size and selection (suggesting, only partly in jest, “my last 100 cases”; “during my 34 years in practice”; and “since moving to Alaska”); for omitting controls; for failing to prevent bias (in all sorts of ways); and for drawing conclusions not even remotely justified by the data.
Things have changed mightily! The recent paper by Gordon and associates painstakingly measures the impact of just 1 of the many techniques now routinely employed in clinical trials: the use of an endpoint committee (EPC). Their detailed analyses demonstrate just how important careful design and procedures are for obtaining the “truth” of a trial’s outcome. This was no run-of-the-mill clinical trial. The Ocular Hypertension Treatment Study (OHTS) was the first definitive demonstration that lowering intraocular pressure (IOP) delayed glaucomatous optic nerve damage, and changed glaucoma management forever (or, at the very least, confirmed our pre-existing prejudices).
What is an EPC? One tool, among many, for increasing the accuracy of the diagnosis of interest, which is essential to any clinical trial.
In this particular study, the question asked was whether patients with elevated IOP treated with IOP-lowering medication were less likely to develop definitive evidence of glaucoma-associated optic nerve destruction than untreated controls. The criteria for recognizing “glaucoma-associated optic nerve destruction” were almost as fluid then as they are now. Patients underwent periodic examinations that included stereo photographs of their optic discs and threshold perimetry. Visual fields were read at one “reading center,” where the readers determined whether defined changes had appeared in the visual fields. The other reading center determined whether there was discernible erosion of the optic discs. Both reading centers were staffed by well-trained, standardized readers, masked as to whether the subjects were in the treatment or the control arm. Readers were neither trained nor asked to determine whether any of the defined changes they were instructed to report were likely due to glaucoma. Whenever a change in either parameter was noted on 2 successive visits, a report was sent to the EPC, where 3 clinically active glaucoma specialists, masked to the patients’ treatment status, determined whether those changes were “likely” to be compatible with primary open-angle glaucoma (POAG).
Between one-third and one-half of the changes (“endpoints”) reported by the reading centers were considered by the experts to be compatible with POAG; in the others, the changes noted by the reading centers were considered to be spurious or due to other causes. By filtering out instances in which demonstrable changes in the appearance of the disc or visual field were not likely due to POAG, the endpoint, glaucoma-associated optic nerve damage progression, was better delineated, and the test of IOP-lowering medication as a means of preventing POAG damage was better defined. Indeed, had the changes in the appearance of the optic disc or visual field not been further refined by the EPC “experts,” the recently published analysis indicates that the apparent size of the impact would have been considerably smaller, as IOP-lowering medication was unlikely to reduce the risk of other causes of apparent visual field deterioration (age-related macular degeneration, retinopathy, droopy lid) or optic disc erosion (optic neuritis, and so forth). The apparent impact of IOP lowering in the treatment arm would have been diluted by those who stood no chance of benefiting in the first place. The difference between the arms would have been smaller, still measurable, but no longer statistically significant.
By sharpening the diagnosis in patients in both arms of the trial, the EPC created a far more meaningful comparison. Although the number of subjects with an “end event” more likely to have been caused by true glaucomatous optic nerve damage was smaller, the difference in incidence between the 2 arms was greater and both statistically and clinically significant. Because of the EPC, the study outcome was more accurate, and glaucoma management was finally based upon hard, reliable data, not “clinical impression.”
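A minimal numerical sketch can make the dilution argument concrete. The counts below are invented for illustration and are not the OHTS data; the sketch simply compares 2 hypothetical trials of the same size, one counting only adjudicated glaucomatous endpoints and one counting every change the reading centers flagged.

```python
# Hypothetical illustration of endpoint "dilution": the counts are invented
# for this example and are not the OHTS results.
from scipy.stats import chi2_contingency

def compare_arms(label, treated_events, control_events, n_per_arm=700):
    # 2 x 2 table of events vs. non-events in the treated and control arms
    table = [
        [treated_events, n_per_arm - treated_events],
        [control_events, n_per_arm - control_events],
    ]
    chi2, p, _, _ = chi2_contingency(table)
    relative_risk = (treated_events / n_per_arm) / (control_events / n_per_arm)
    print(f"{label}: relative risk = {relative_risk:.2f}, P = {p:.3f}")

# Adjudicated endpoints: only changes judged compatible with POAG.
# Here treatment halves the event rate (15 vs 30 events per 700 subjects).
compare_arms("Adjudicated endpoints", treated_events=15, control_events=30)

# Unfiltered endpoints: the same glaucomatous events plus 40 spurious or
# non-glaucomatous changes per arm that IOP lowering cannot prevent.
compare_arms("Unfiltered endpoints", treated_events=55, control_events=70)
```

With these invented numbers, the adjudicated comparison is statistically significant while the diluted one is not, even though the underlying treatment effect is identical; the point is the direction of the effect, not the particular values.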
But even this important, carefully thought-through series of diagnostic refinements was a bit “sticky.” Unlike the reading center staff, the 3 “clinical experts” were not, as far as the paper reports, “standardized” or repeatedly tested for the reproducibility of their diagnoses. Lichter long ago demonstrated that 16 world-renowned “glaucoma experts” had difficulty consistently identifying pathologic optic discs from 20 eyes of 10 patients with manifest POAG. OHTS ultimately depended upon each clinician’s experience and the uniformity with which these 3 observers made the same diagnosis (for which they had access to the patient’s visual field tests and stereo photographs, even though in most instances their involvement was triggered by a persistent change detected by a reading center in just 1 of these 2 parameters). When all 3 experts reported a diagnosis of progressive glaucomatous damage, the “case was closed.” These were surely the most certain of all the glaucomatous cases. Study results calculated using only these most definitive of cases might well have shown an even greater risk reduction in the treatment arm than the study reports. The more accurate the diagnosis, the more valid the results, which in this instance would probably have led to a larger clinical (if not statistically significant) impact in the treatment arm.
When the 3 experts were not unanimous on their first independent review, the case was sent back to them for re-review. We are not told how many of these re-reviews were conducted. A number of these cases received a unanimous finding of POAG-related optic nerve degeneration on this second “independent” review, but were the readers aware that this was a second review? If so, did that influence their diagnosis?
Those that did not reach unanimity on the second review were subjected to a third review, in which the case was discussed by all 3 experts. Undoubtedly, these EPC reviews helped whittle down the number of additional patients of dubious POAG status, but each review after the first was probably less definitive: the diagnosis was less clear (or it would never have gone to a second or third review), and the verdict was potentially biased by the knowledge that a second or third review was needed (the potential bias might have gone in either direction).
In OHTS, as in other studies, an EPC (or its equivalent) is essential to obtaining optimal clinical accuracy and therefore the most valid and reliable results. In a decade or more, new algorithms, based upon more readily quantifiable data collected from visual field and optical coherence tomography testing, and powered by artificial intelligence and deep learning, will probably do a better job of producing standardized, reproducible diagnoses than we now do, particularly when it comes to endpoints of interest in clinical trials.
The author has completed and submitted the ICMJE form for Disclosure of Potential Conflicts of Interest and none were reported. Financial Disclosures: This editorial received no funding or financial support. Dr. Sommer has reported he has no financial disclosures or financial conflicts of interest.