Many studies assessing the accuracy of intraocular lens (IOL) power calculation have been published. Since the formation of the IOL Power Club in 2005, errors have been noted in the protocols used in these accuracy studies in the peer-reviewed literature. These errors have appeared in articles in almost all of our most respected journals. Unfortunately, no methodology standards for authors of these studies have been published since 1981. Many discussions were held, along with statistical consultation, to agree on a set of protocols. In an attempt to aid authors, 10 recommendations are offered to make a study statistically valid and completely fair in evaluating the accuracy of tested formulas, methods, and instruments.

Firstly, the demographics of the study population (ie, sex, age, and ethnicity) should be clearly described at the beginning of the Methods section. These may well have a relevant influence on eye biometric parameters and, therefore, IOL power calculation performance. Optimization through IOL-specific lens constants may depend on these variables as well.

More importantly, before comparing the results of the formulas, the mean error (ME) of the study group for each formula should be made to equal zero by changing the lens factor (constant) individually for each formula. This eliminates the bias introduced by the choice of lens factor and is the only proper way to put all the formulas on an equal footing. This can easily be done using the Excel software’s Data/What If Analysis/Goal Seek function. There are other ways to do this if you have the dataset in a database and are able to do stepwise iterations through the lens constant values. This may be more difficult with the Haigis formula (see below). It is then appropriate to compare the median absolute error (MedAE) of each formula. A formula having the lowest ME merely means that the lens factor chosen for that particular formula was more appropriate than the others for that group of eyes. Comparing MEs says nothing about formula accuracy but only about the lens factors used by the authors; that is, if the ME is different from zero, a lens factor either too high or too low for that patient group was used.
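The zeroing step can be sketched in code as a one-dimensional root search, which is the same idea as Goal Seek. This is a minimal sketch under stated assumptions: `predicted_refraction` is a toy thin-lens vergence model written only for illustration (it is not any published formula), its single "lens constant" is an effective lens position in millimeters, the bracket of 3.0 to 7.0 mm is arbitrary, and the four eyes are made-up numbers. The prediction error follows the sign used later in this article (predicted minus actual refraction).

```python
N_AQUEOUS = 1.336  # refractive index commonly assumed for aqueous and vitreous


def predicted_refraction(elp_mm, eye):
    """Toy thin-lens vergence model (illustration only, not a published formula).

    elp_mm plays the role of the lens constant being optimized.
    """
    d = elp_mm / 1000.0              # effective lens position in meters
    al = eye["al_mm"] / 1000.0       # axial length in meters
    # Vergence that must arrive at the IOL so that light focuses on the retina.
    vergence_at_iol = N_AQUEOUS / (al - d) - eye["iol_power"]
    # Propagate back to the corneal plane and subtract the corneal power.
    return N_AQUEOUS / (d + N_AQUEOUS / vergence_at_iol) - eye["k"]


def mean_error(constant, eyes, predict):
    """Mean prediction error (predicted minus actual refraction) for one constant."""
    errors = [predict(constant, e) - e["actual_refraction"] for e in eyes]
    return sum(errors) / len(errors)


def zero_mean_error(eyes, predict, lo, hi, tol=1e-9, max_iter=200):
    """Bisection search for the constant in [lo, hi] that makes the ME zero."""
    f_lo = mean_error(lo, eyes, predict)
    f_hi = mean_error(hi, eyes, predict)
    if f_lo * f_hi > 0:
        raise ValueError("ME does not change sign on [lo, hi]; widen the bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = mean_error(mid, eyes, predict)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid     # root lies in the upper half
    return 0.5 * (lo + hi)


# Made-up eyes, for illustration only.
eyes = [
    {"al_mm": 23.5, "k": 43.50, "iol_power": 21.0, "actual_refraction": -0.25},
    {"al_mm": 22.8, "k": 44.25, "iol_power": 22.5, "actual_refraction": -0.50},
    {"al_mm": 24.6, "k": 42.75, "iol_power": 18.5, "actual_refraction": 0.25},
    {"al_mm": 23.9, "k": 43.00, "iol_power": 20.0, "actual_refraction": -0.75},
]

optimized = zero_mean_error(eyes, predicted_refraction, lo=3.0, hi=7.0)
residual_me = mean_error(optimized, eyes, predicted_refraction)
print(f"optimized constant: {optimized:.4f} mm, residual ME: {residual_me:.2e} D")
```

The same search would be repeated separately for each formula under study, with `predicted_refraction` replaced by a verified implementation of that formula.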

After the MEs have been zeroed out, it has been common practice to compare the formulas by converting all the negative errors to positive and reporting the mean absolute errors (MAE). The problem is that absolute errors do not follow a normal (Gaussian) distribution. Therefore, it is best to simply compare MedAEs. The other measures that are useful to report are the standard error, the minimum and maximum (range of errors), and the 95% confidence intervals around the mean, as well as the percentage of eyes with a prediction error within ±0.50 diopter (D), within ±1.00 D, and greater than ±2.00 D, as recommended in 1981.
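As an illustration, most of the measures listed above can be computed with a few lines of standard-library Python. This is a sketch only: the function name `summarize_errors` and the dictionary keys are hypothetical, and the eight prediction errors are made-up values chosen so that their ME is already zero.

```python
import math
import statistics


def summarize_errors(errors):
    """Summary statistics for a list of signed prediction errors in diopters."""
    n = len(errors)
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": statistics.fmean(errors),
        "median_absolute_error": statistics.median(abs_errors),
        "standard_error": statistics.stdev(errors) / math.sqrt(n),
        "min_error": min(errors),
        "max_error": max(errors),
        "pct_within_0_50": 100.0 * sum(a <= 0.50 for a in abs_errors) / n,
        "pct_within_1_00": 100.0 * sum(a <= 1.00 for a in abs_errors) / n,
        "pct_over_2_00": 100.0 * sum(a > 2.00 for a in abs_errors) / n,
    }


# Made-up prediction errors (D) whose mean is zero, for illustration only.
errors = [-0.25, 0.10, 0.60, -1.20, 2.30, 0.00, -0.45, -1.10]
summary = summarize_errors(errors)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```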

Because of the compounding (correlation) of data with bilateral eyes, it is best to include only 1 eye from each study subject. There are statistical methods called “generalized estimating equations” (GEE) that one could use with valid results. However, in general, the fewer statistical adjustments performed, the better. When analyzing datasets that include some subjects with 1 eye and others with 2 eyes, either a resampling technique such as the bootstrap or GEE may be used. The bootstrap maintains the correlational structure of the data, whereas GEE estimates the correlation and adjusts for it in the analysis. After using either of them, standard statistical tests such as t tests, regressions, and the like may be performed. It is simpler to use just 1 eye.
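Restricting a dataset to 1 eye per subject can be sketched as follows. The text does not say how the eye should be chosen; picking one at random with a fixed seed (so the selection is reproducible) is an assumption made here, and the field names `subject_id` and `eye` are hypothetical.

```python
import random


def one_eye_per_subject(records, seed=0):
    """Keep one randomly chosen eye per subject (selection rule is an assumption)."""
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    by_subject = {}
    for rec in records:
        by_subject.setdefault(rec["subject_id"], []).append(rec)
    return [rng.choice(eyes) for eyes in by_subject.values()]


# Made-up records: subjects 1 and 3 contribute both eyes, subject 2 only one.
records = [
    {"subject_id": 1, "eye": "OD", "prediction_error": 0.12},
    {"subject_id": 1, "eye": "OS", "prediction_error": -0.30},
    {"subject_id": 2, "eye": "OD", "prediction_error": 0.45},
    {"subject_id": 3, "eye": "OD", "prediction_error": -0.05},
    {"subject_id": 3, "eye": "OS", "prediction_error": 0.20},
]

selected = one_eye_per_subject(records)
print([(r["subject_id"], r["eye"]) for r in selected])
```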

Since they are error prone, it is recommended not to include outdated regression formulas such as the SRK I and SRK II in modern studies of formulas. This was reconfirmed by one of the SRK authors, John Retzlaff (public statement at our Annual Scientific Session, October 9, 2014). It should be stated how the formulas were programmed. If they were self-programmed in a spreadsheet, they should be checked against licensed commercial software (biometer or program) and this should be clearly stated. It is totally unfair to a formula to perform a study using a formula inadvertently programmed incorrectly. All formulas used in the study should be referenced, including the crucial errata of the Hoffer Q and the SRK/T formulas (noted below), and be listed alphabetically unless there is a stated reason to do otherwise.

The Haigis formula needs special attention in that it depends not on just 1 lens constant but on 3: a0, a1, and a2. If all 3 are not optimized, the results will not be as accurate as the formula is capable of producing. Optimized constants can be derived from a double linear regression analysis, as described in the literature. Some biometers offer triple optimization; however, not all of them are licensed. If not enough clinical datasets are available for individual optimization, apply the constants from the User Group for Laser Interference Biometry online table (available at www.augenklinik.uni-wuerzburg.de/ulib/c1.htm; accessed May 26, 2015).
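A regression with 2 predictors of this kind can be sketched as an ordinary least-squares fit. The sketch below assumes that the Haigis effective lens position is modeled as d = a0 + a1 × ACD + a2 × AL and that the effective lens position of each eye has already been back-calculated from its postoperative refraction; that back-calculation is not shown, and the published procedure should be consulted for the details. The function names and all numbers are hypothetical, and the synthetic data are generated from known constants only to show that the fit recovers them.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; ACD and AL may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_three_constants(acd, al, elp):
    """Least-squares fit of elp = a0 + a1*acd + a2*al via the normal equations."""
    rows = [(1.0, x, y) for x, y in zip(acd, al)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * d for r, d in zip(rows, elp)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic data (mm), generated from illustrative constants 1.0, 0.4, and 0.1.
acd = [2.8, 3.5, 3.1, 3.3, 2.9, 3.6]
al = [23.5, 22.8, 24.6, 23.9, 25.2, 23.1]
elp = [1.0 + 0.4 * x + 0.1 * y for x, y in zip(acd, al)]

a0, a1, a2 = fit_three_constants(acd, al, elp)
print(f"a0={a0:.4f}, a1={a1:.4f}, a2={a2:.4f}")
```

With real data the fit will not be exact, and a dataset with too little spread in ACD or AL will give unstable constants, which is one reason the text recommends falling back on published constants when few eyes are available.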

For biometry, the use of optical biometers is highly recommended because they provide the best measurement precision. In very dense cataracts it is still necessary to perform immersion ultrasound to obtain the axial length (AL). Contact applanation ultrasound is not optimal because of potential corneal compression, which results in shorter AL and anterior chamber depth (ACD) measurements. Whenever corneal power is used, the method the instrument uses should be clearly stated (eg, area of analysis, number of analyzed points, formula used, and keratometer index). If measurement of ACD is part of an IOL power study, it should be defined as the axial distance from the epithelium of the cornea to the lens. The distance from the endothelium to the lens (the original anatomic definition of ACD) should be referred to as the aqueous depth (AQD). Both measurements are currently reported as ACD without being defined, which leads to confusion. The software version for each instrument used should be noted so the study can be repeated.

Some authors refer to the “target refraction” in reference to prediction error (PE). The target refraction desired by the surgeon is of no consequence when evaluating the PE of IOL power calculation. If you target −5.00 D, the formula recommends a 28.0 D IOL predicting a postoperative (PO) refraction of −5.25 D, and the resulting refraction is −5.88 D, then the PE is +0.63 D [(−5.25) − (−5.88)], not +0.88 D (from target). The only comparison should be the difference between the refraction predicted by the formula (not the surgeon’s target) and the actual stable postoperative refraction of the patient.
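The arithmetic of this example can be written out as a short check. The function name `prediction_error` is hypothetical, and the sign convention (predicted minus actual refraction) simply follows the worked example above.

```python
def prediction_error(predicted_refraction, actual_refraction):
    """PE in diopters, using the sign of the worked example: predicted minus actual."""
    return predicted_refraction - actual_refraction


target = -5.00      # surgeon's target (D); irrelevant to the PE
predicted = -5.25   # refraction predicted by the formula for the chosen IOL (D)
actual = -5.88      # stable postoperative refraction (D)

pe = prediction_error(predicted, actual)
error_from_target = target - actual  # the incorrect comparison

print(f"PE = {pe:+.2f} D, deviation from target = {error_from_target:+.2f} D")
```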

Postoperative subjective manifest refraction should ideally be measured at 3 months, and no earlier than 1 month after surgery, when the refraction is considered stable. When reporting the accuracy of a method or an instrument to calculate the IOL power, it is preferable to use only 1 IOL model, because different IOL models require different optimized constants. Since refraction accuracy decreases with visual acuity, eyes with best-corrected vision worse than 20/40 should be excluded.

The references for some examples of studies compromised by (1) ME not zeroed out, (2) comparison of target refraction results, (3) comparing MAEs, (4) using bilateral eyes, and (5) using multiple IOLs are listed.

It is hoped that these steps will help make future IOL power calculation studies valid, as the authors intend.