Hoffer and associates’ editorial on protocols to analyze IOL formula accuracy highlights how the refractive outcomes of cataract surgery are reported using inappropriate statistical methodology. Their proposal of a standard is welcome. However, the statement that it is “best to simply compare median absolute errors (between formulae)” because absolute errors are not normally distributed implies that no statistical analysis is necessary. This is incorrect and contrary to widely accepted standards on publishing medical research.
Here, we propose alternative guidance on the appropriate statistical analysis of refractive outcomes. It is assumed that data from the same sample of eyes are used to calculate predicted refractions for each formula, which makes the samples “dependent.”
At the start of the analysis, as Hoffer and associates state, the mean prediction error of each formula must be close to zero to eliminate the systematic error from an incorrect formula constant. The authors suggest an iterative method to recalculate the constant and nullify the mean prediction error. Once the mean error is nullified, the spread of the distribution can be measured using the standard deviation or the mean or median absolute error. The disadvantage of the standard deviation is that it is greatly affected by outliers, because it is based on the square of the difference of each value from the mean. It is therefore best to use absolute values and the mean or median absolute error, which are affected by outliers to a lesser degree. Absolute errors follow, at best, a folded normal distribution, which itself is not amenable to parametric statistical analysis. An alternative approach is to convert the distribution to a normal distribution using a transformation, but this can distort the relative weight given to errors of different sizes and so skew the significance of the data. For example, a cubic root transformation gives the same weight to a change in prediction error from 0.00 diopter (D) to 0.10 D as from 0.10 D to 0.60 D.
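As an illustration of why the measure of spread matters, the following minimal Python sketch compares the standard deviation, mean absolute error, and median absolute error on a small set of prediction errors, with and without a single outlier. The data are invented for illustration only, the function names are ours, and the mean error is nullified here by simple subtraction of the mean as a stand-in for the iterative constant optimization described by Hoffer and associates.

```python
import statistics

def centre(errors):
    """Subtract the mean so that the mean prediction error is zero.
    Assumption: a simplified stand-in for re-optimizing the formula constant."""
    m = statistics.fmean(errors)
    return [e - m for e in errors]

def spread_summary(errors):
    """Return SD, mean absolute error and median absolute error (all in D)
    of the prediction errors after the mean error has been nullified."""
    centred = centre(errors)
    abs_err = [abs(e) for e in centred]
    return {
        "sd": statistics.stdev(centred),
        "mae": statistics.fmean(abs_err),
        "medae": statistics.median(abs_err),
    }

# Hypothetical prediction errors in diopters (not real data).
clean = [-0.50, -0.25, -0.10, 0.00, 0.10, 0.25, 0.50]
# Same eyes, but the last eye is a gross outlier.
with_outlier = [-0.50, -0.25, -0.10, 0.00, 0.10, 0.25, 3.00]

s_clean = spread_summary(clean)
s_out = spread_summary(with_outlier)

# Ratio by which each measure is inflated by the single outlier.
inflation = {k: s_out[k] / s_clean[k] for k in s_clean}
print(s_clean)
print(s_out)
print(inflation)
```

In this invented sample the standard deviation is inflated the most by the outlier and the median absolute error the least, which is the behavior described above.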
The preferred method to statistically analyze absolute errors is to use nonparametric tests. Samples should be dependent, with the same sample of eyes being used to test each formula. This eliminates the between-group variability that would arise if different samples of eyes were used to test each formula. When contrasting 2 formulae, the Wilcoxon signed rank test should be used. For testing 3 or more formulae, the Friedman test is the most appropriate test to compare absolute errors. Because it is a rank-based test, it tests for a statistically significant difference in absolute errors between the formulae rather than comparing their means directly. If the P value is statistically significant, post hoc pairwise analysis, with a correction for the multiple comparisons made, can be carried out to find out which formula or formulae are responsible for the null hypothesis being rejected. Software such as SPSS (IBM, Armonk, New York, USA) and R (R Project, R Foundation, Vienna, Austria) provides instructions on how to run this post hoc analysis.
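The workflow described above can be sketched in Python with SciPy, as an alternative to SPSS or R. This is a minimal illustration under stated assumptions rather than a definitive implementation: the data are simulated, the function name `compare_formulae` is ours, and the post hoc step uses pairwise Wilcoxon signed rank tests with a Bonferroni correction, which is one common choice and not necessarily the procedure implemented in any given statistics package.

```python
import itertools

import numpy as np
from scipy import stats

def compare_formulae(abs_errors, alpha=0.05):
    """Compare absolute errors from the same eyes across formulae.

    abs_errors: dict mapping formula name -> absolute errors (D), with the
    eyes in the same order for every formula (dependent samples).
    """
    names = list(abs_errors)
    arrays = [np.asarray(abs_errors[n], dtype=float) for n in names]

    if len(names) == 2:
        # Two formulae: Wilcoxon signed rank test on the paired errors.
        p = stats.wilcoxon(arrays[0], arrays[1]).pvalue
        return {"test": "wilcoxon", "p": p, "posthoc": {}}

    # Three or more formulae: Friedman test.
    _, p = stats.friedmanchisquare(*arrays)
    posthoc = {}
    if p < alpha:
        # Assumption: Bonferroni-corrected pairwise Wilcoxon tests as post hoc.
        pairs = list(itertools.combinations(range(len(names)), 2))
        for i, j in pairs:
            p_ij = stats.wilcoxon(arrays[i], arrays[j]).pvalue
            posthoc[(names[i], names[j])] = min(1.0, p_ij * len(pairs))
    return {"test": "friedman", "p": p, "posthoc": posthoc}

# Simulated data for 40 eyes (not real outcomes).
rng = np.random.default_rng(0)
n_eyes = 40
pe_a = rng.normal(0.0, 0.40, n_eyes)          # prediction errors, formula A
pe_b = pe_a + rng.normal(0.0, 0.05, n_eyes)   # formula B: close to A
abs_a = np.abs(pe_a)
abs_b = np.abs(pe_b)
# Formula C: constructed to be clearly worse than A in every eye.
abs_c = abs_a + 0.30 + rng.uniform(0.0, 0.05, n_eyes)

result = compare_formulae({"A": abs_a, "B": abs_b, "C": abs_c})
print(result["test"], result["p"])
for pair, p_adj in result["posthoc"].items():
    print(pair, p_adj)
```

Passing only two formulae to the same function runs the Wilcoxon signed rank test alone, in which case no correction for multiple comparisons is needed.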
Other measures, such as the 95% confidence interval around the mean and the minimum and maximum errors, depend on the size and composition of the sample and are not measures of spread.
If in doubt, authors should seek professional statistical advice on how to go about analyzing their data.