To investigate a certain set of methodological limitations in published anti–vascular endothelial growth factor (VEGF) randomized controlled trials (RCTs).
Descriptive methodological study.
We searched PubMed with the terms “bevacizumab OR ranibizumab OR pegaptanib OR aflibercept” and the limitations “humans” and “randomized controlled trials” in the 15 highest-impact-factor general medicine journals and the 15 highest-impact-factor ophthalmology journals. We included only RCTs published as original articles in which an anti-VEGF agent was used to treat eye disease. Two independent observers (O.A., P.K.) read each article and classified it according to a predefined set of criteria.
The PubMed search yielded 209 articles, and 93 were classified as eligible. In most of the studies, the study drug was bevacizumab (52.6%, n = 49), followed by ranibizumab (44.1%, n = 41), pegaptanib (7.5%, n = 7), and aflibercept (5.4%, n = 5). Basic epidemiologic data, including sex distribution (2.2%, n = 2) and mean age (3.2%, n = 3), were missing in 3.2% of the published RCTs. The power calculation for efficacy was mentioned in 48% (n = 45) of the published work, and a power calculation for safety was considered in only 1 study (1.1%). Only 6 RCTs (6.5%) reported negative results.
Power calculations for efficacy, an important component of an RCT, were missing in 52% of the RCTs we surveyed, while a power calculation for safety was present in only 1.1%. Around 60% of the published RCTs were labeled as an “efficacy and safety trial,” and none of those studies had a power calculation for safety.
Vascular endothelial growth factor (VEGF) inhibitors have revolutionized the treatment of many retinal diseases, including age-related macular degeneration, diabetic macular edema, and retinal vein occlusion. These agents promised to fulfill a previously unmet need in ophthalmology and therefore found widespread use for these and many other ophthalmologic conditions. Many randomized controlled trials (RCTs) were initially performed to demonstrate the efficacy of these agents and reported positive results. However, safety concerns surfaced, and an effort to evaluate the safety of these agents subsequently emerged.
RCTs and meta-analyses are widely considered the highest level of evidence. However, these “more elite” studies are not immune to error. A recent study by Ebrahim and associates reported that 35% of published reanalyses of RCTs by entirely independent authors led to conclusions different from those in the original articles. Many factors may contribute to the misinterpretation of RCT results. Therefore, as in any clinical study, careful reading and statistical evaluation are required to interpret the results of such work.
Statistical analyses in clinical studies fundamentally weigh the probabilities of 2 kinds of error. The first is a type 1 error (α), or a false-positive result. The acceptable probability of a type 1 error is set by the significance level, usually 5%, and a result is accepted as statistically significant when the P value falls below this threshold (P < .05). The second is a type 2 error (β), or a false-negative result. The statistical power is calculated by subtracting the probability of a type 2 error from 1 (1 − β) and represents the probability of correctly rejecting the null hypothesis. A probability of a type 2 error below 20%, or a statistical power above 0.8, is conventionally accepted by many scientists as the lower limit of acceptable statistical power. When designing a clinical study, a sample size calculation is needed to ensure the recruitment of an adequately sized study population. When the sample size is too small, the probability of a type 2 error is high. On the other hand, when the sample size is too large, it may raise ethical concerns (regarding the unnecessary recruitment of study participants), reduce cost-effectiveness, and make clinically insignificant differences appear statistically significant.
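The relationship between α, power, and sample size can be made concrete with a short calculation. The sketch below uses the standard normal-approximation formula for comparing 2 proportions; the function name `n_per_group` and the event rates are hypothetical illustrations and are not taken from any of the trials surveyed here.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect p1 vs p2
    with a two-sided test (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type 1 error
    z_beta = NormalDist().inv_cdf(power)           # quantile for power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical efficacy endpoint: 30% vs 50% of eyes gaining vision.
n_efficacy = n_per_group(0.30, 0.50)

# Hypothetical safety endpoint: a rare adverse event at 1% vs 2%.
n_safety = n_per_group(0.01, 0.02)

print(f"Efficacy: {n_efficacy} per arm; safety: {n_safety} per arm")
```

Under these assumed rates, the safety comparison requires a sample more than an order of magnitude larger than the efficacy comparison, which is why a trial sized only for efficacy is unlikely to have adequate power for rare adverse events.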
We observed that many anti-VEGF RCTs assessing efficacy failed to report statistical power calculations. Most of these studies were labeled “efficacy and safety” trials, though they were originally designed only to evaluate the efficacy of a treatment. As a result, most of these trials, where the authors claim to assess safety simultaneously, did not have the necessary power to assess safety, and the probability of a type 2 error could not be excluded. In this study, we aimed to formally assess power analyses and associated basic methodological issues in published anti-VEGF RCTs.
A PubMed search was conducted on December 9, 2013, using the search terms “bevacizumab OR ranibizumab OR pegaptanib OR aflibercept” and the limitations “humans” and “randomized controlled trials” in 15 of the highest-impact-factor general medicine and ophthalmology journals. The 2012 ISI Web of Knowledge (Thomson Reuters, London, United Kingdom) impact factor rankings were used to identify journals for inclusion. General medicine journals included in the PubMed search were as follows: The New England Journal of Medicine, The Lancet, Journal of the American Medical Association (JAMA), The British Medical Journal, PLOS Medicine, Annals of Internal Medicine, Archives of Internal Medicine, BMC Medicine, The Canadian Medical Association Journal, The Journal of Internal Medicine, Mayo Clinic Proceedings, Cochrane Database Systematic Reviews, Annals of Medicine, The American Journal of Medicine, and Annals of Family Medicine. Ophthalmology journals included Progress in Retinal and Eye Research, Ophthalmology, Archives of Ophthalmology, The American Journal of Ophthalmology, Investigative Ophthalmology & Visual Science, Experimental Eye Research, Survey of Ophthalmology, Retina, British Journal of Ophthalmology, The Ocular Surface, Current Opinion in Ophthalmology, The Journal of Cataract and Refractive Surgery, The Journal of Vision, The Journal of Refractive Surgery, and Acta Ophthalmologica. Among identified articles, only RCTs published as original articles evaluating the use of anti-VEGF agents to treat eye disease were included. Meta-analyses, reviews, letters, brief reports, extension studies, and secondary analyses of previously published data were excluded. Included journals and the number of articles identified are listed in Table 1.
|Journals||Number of Eligible Articles a||Presence of Power Calculation for Efficacy, N (%)||Presence of Power Calculation for Safety, N (%)|
|General medicine journals|
|The New England Journal of Medicine||5||4/5 (80)||1/5 (20)|
|Ophthalmology||30||17/30 (56.7)||0/30 (0)|
|Retina—Journal of Retinal and Vitreous Diseases||18||7/18 (38.8)||0/18 (0)|
|The British Journal of Ophthalmology||15||6/15 (40.0)||0/15 (0)|
|The American Journal of Ophthalmology||11||6/11 (54.5)||0/11 (0)|
|Acta Ophthalmologica||7||2/7 (28.6)||0/7 (0)|
|Archives of Ophthalmology||6||3/6 (50.0)||0/6 (0)|
|The Journal of Cataract and Refractive Surgery||1||0/1 (0)||0/1 (0)|
a Fifteen highest-impact-factor general medicine and ophthalmology journals were searched with the terms “bevacizumab OR ranibizumab OR pegaptanib OR aflibercept” and the limitations “humans” and “randomized controlled trials.” Among those searched, only those listed here included articles that met the inclusion criteria (a randomized controlled trial published as an original study).
Two independent observers (O.A., P.K.) reviewed each article and classified each study according to the anti-VEGF agent(s) used, disease studied, route of administration, presence of sex data, presence of age data, power calculation for efficacy, power calculation for safety, and presence of negative results. The title, abstract, and introduction sections were specifically analyzed to determine whether the authors identified their work as an efficacy trial, safety trial, or efficacy and safety trial. Ophthalmology journals were further classified as higher impact factor (ranking 1–7) and lower impact factor (ranking 8–15). Power calculation reporting between higher-impact-factor and lower-impact-factor journals was analyzed using a Pearson χ² test.
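As a rough illustration of the Pearson χ² comparison, the sketch below builds a 2 × 2 table from the per-journal counts in Table 1. The split into higher-ranked and lower-ranked journals is an assumption: it treats the order in which the ophthalmology journals are listed above as their impact factor ranking, which the text does not state explicitly. The function name `pearson_chi2_2x2` is hypothetical, no continuity correction is applied, and this is not necessarily the exact computation the authors performed.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and its P value (df = 1)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED grouping (rank inferred from listing order), counts from Table 1.
# Higher impact factor: Ophthalmology 17/30, Archives of Ophthalmology 3/6,
# The American Journal of Ophthalmology 6/11.
higher_reported, higher_total = 17 + 3 + 6, 30 + 6 + 11
# Lower impact factor: Retina 7/18, British Journal of Ophthalmology 6/15,
# The Journal of Cataract and Refractive Surgery 0/1, Acta Ophthalmologica 2/7.
lower_reported, lower_total = 7 + 6 + 0 + 2, 18 + 15 + 1 + 7

chi2, p_value = pearson_chi2_2x2(
    higher_reported, higher_total - higher_reported,
    lower_reported, lower_total - lower_reported,
)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

Under this assumed grouping the P value is roughly .08, above the conventional .05 threshold, so the difference in reporting between the 2 journal groups would not be considered statistically significant.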
The PubMed search yielded 209 articles, with 93 articles meeting inclusion criteria (Table 1). Disagreements between observers regarding study classifications were resolved by discussion (4.6% of the cases). When an agreement could not be reached, a third author (F.E.) determined the final classification (0.8% of the cases). Individual data points of the observers before the resolution of disagreements are presented in Supplemental Table 1 (Supplemental Material available at AJO.com). The final classifications of the included articles are available in Supplemental Table 2 (Supplemental Material available at AJO.com). Publication dates for included studies ranged from 2004 to 2013. Among these studies, the most frequently targeted diseases for anti-VEGF treatment were age-related macular degeneration (40.8%, n = 38), diabetic retinopathy (30.1%, n = 28), retinal vein occlusion (13.9%, n = 13), and other diseases (13.9%, n = 13). In most of the studies, bevacizumab (52.6%, n = 49) was used, followed by ranibizumab (44.1%, n = 41), pegaptanib (7.5%, n = 7), and aflibercept (5.4%, n = 5). The most commonly studied route of administration was intravitreal (96.7%, n = 90), followed by subconjunctival (2.2%, n = 2) and systemic use (1.1%, n = 1).
Basic epidemiologic data, including sex distribution (2.2%, n = 2) and mean age (3.2%, n = 3), were missing in 3.2% of the published RCTs. The power calculation for efficacy was mentioned in 48% (n = 45) of all published work, and a power calculation for safety was considered in only 1 study (1.1%). Only 6 RCTs (6.5%) reported negative results. More than half the articles (60.2%) characterized their study as both a safety and efficacy trial, while 38.7% of articles described their study as only an efficacy trial. Only 1 study (1.1%) was described exclusively as a safety trial (Table 2). This particular study was the only one originally designed to assess the safety of a treatment. Interestingly, this study mentioned that the trial was underpowered to evaluate efficacy, but it did not report a power calculation for safety. None of the 56 studies identified as both efficacy and safety trials had a power calculation for safety, while 1 efficacy trial (BEAT-ROP) reported a power calculation for safety to highlight that the study was not designed for safety evaluation and was underpowered to evaluate safety. All of the trials in which noninferiority was studied had a power calculation for efficacy. However, only 43.3% of the superiority trials and one-third of the dose-response trials had a power calculation for efficacy. None of the articles mentioned a power calculation for safety, except for a single superiority trial (Table 3). Although there was a trend toward higher reporting of a power calculation for efficacy in the higher-impact-factor journal group, the difference was not statistically significant (P = .078) (Table 3). The frequencies for reporting the power calculation for efficacy and/or safety in the included journals are also available in Table 1.
|Study Classification||Number of Studies, N (%)||Power Calculation for Efficacy, N (%)||Power Calculation for Safety, N (%)|
|Efficacy trial||36 (38.7)||15 (41.7)||1 (2.8)|
|Efficacy and safety trial||56 (60.2)||30 (53.6)||0 (0)|
|Safety trial||1 (1.1)||0 (0)||0 (0)|
|Type of study b|
|Superiority trial||60 (64.5)||26 (43.3)||1 (1.7)|
|Noninferiority trial||1 (1.1)||1 (100)||0 (0)|
|Dose-response trial||15 (16.1)||5 (33.3)||0 (0)|
|Dose-response and superiority trial||15 (16.1)||11 (73.3)||0 (0)|
|Dose-response and noninferiority trial||2 (2.2)||2 (100)||0 (0)|
|Total||93 (100)||45 (48.3)||1 (1.1)|