Purpose
To evaluate the outcomes of a machine learning technique for predicting the small incision lenticule extraction (SMILE) nomogram.
Design
Prospective, comparative clinical study.
Methods
A comparative study was conducted on the outcomes of SMILE surgery between a surgeon group (nomogram set by the surgeon) and a machine learning group (nomogram predicted by a machine learning model). The machine learning model was trained on 865 ideal cases (spherical equivalent [SE] within ±0.5 diopter [D] 3 months postoperatively) from an experienced surgeon. The visual outcomes of both groups were compared for safety, efficacy, predictability, and SE correction.
Results
There was no statistically significant difference in baseline data between the 2 groups. The efficacy index in the machine learning group (1.48 ± 1.08) was significantly higher than in the surgeon group (1.3 ± 0.27) (t = -2.17, P < .05). Eighty-three percent of eyes in the surgeon group and 93% of eyes in the machine learning group were within ±0.50 D, while 98% of eyes in the surgeon group and 96% of eyes in the machine learning group were within ±1.00 D. The error of SE correction was -0.09 ± 0.024 D and -0.23 ± 0.021 D for the machine learning and surgeon groups, respectively.
Conclusions
The machine learning technique performed as well as the surgeon in safety and significantly better than the surgeon in efficacy. As for predictability, the machine learning technique was comparable to the surgeon, although it was less predictable for high myopia and astigmatism.
In recent years, small incision lenticule extraction (SMILE) has been established as a safe and effective procedure for correction of refractive errors. However, over- and under-correction have been reported in the postoperative period. Reinstein and associates and Yao and associates reported that only 84% and 67.6% of eyes, respectively, recovered to ±0.50 diopter (D) (mean spherical equivalent refraction [SE]) at 3 months postoperatively. Previous studies have described nomograms for adjustment of SE and astigmatism for LASIK. Liyanage and associates employed multiple regression modeling to enhance the nomogram accuracy of wavefront LASIK. Unlike LASIK, SMILE nomogram adjustment mainly depends on the surgeon's personal experience. Fortunately, given an adequate number of training samples (treatment cases), artificial intelligence can potentially be used to design "intelligent" models to create and improve surgical nomograms. Recently, machine learning techniques have been widely applied for prediction of age-related macular degeneration, classification of glaucoma, and detection of keratoconus. In this study, we adopted the Multilayer Perceptron algorithm (a machine learning technique) to train nomogram models for SMILE, and compared the outcomes with those of the surgeon's nomogram.
Methods
This was a prospective, comparative study. This study protocol was approved by the Institutional Review Board of Tianjin Eye Hospital, Tianjin, China (TJYYLL-2017-15) and registered at the Chinese Clinical Trial Register (ChiCTR1900024875). The study protocol adhered to the tenets of the Declaration of Helsinki and was approved by the Ethics Committee of Tianjin Eye Hospital. All patients signed an informed consent before participation. Patients were recruited from December 2017 to March 2018. The inclusion criteria were as follows: (1) age 18-45 years; (2) spherical myopia up to -9.00 D and myopic astigmatism up to -3.00 D; (3) corrected distance visual acuity (CDVA) of 20/40 or better; (4) postoperative residual stromal bed thickness >250 μm; (5) stable refraction for more than 2 years. The exclusion criteria were as follows: (1) corneal disease; (2) ocular trauma; (3) suspicion of keratoconus on corneal topography. Patients were required to stop wearing soft contact lenses for at least 2 weeks and rigid contact lenses for at least 4 weeks before examination.
The study enrolled 1146 eye samples from 573 patients from March 2017 to July 2017, from which 865 eye samples with ideal postoperative results (SE recovering to ±0.50 D at 3 months postoperatively) were selected as training sets. Another 600 eye samples from 300 patients were recruited from December 2017 to March 2018 and were evenly divided into 2 groups for validation of results. The nomogram was set up by an experienced surgeon for the first group (defined as the surgeon group) and predicted by the machine learning model for the second group (defined as the machine learning group).
Machine Learning Model
We selected 865 eye samples with ideal postoperative results (SE recovering to ±0.50 D at 3 months postoperatively) from 1146 SMILE refractive surgical records in Tianjin Eye Hospital. In those samples, the nominal features were transformed into binary ones, and the numeric features were normalized into the range [0, 1]. The critical features affecting nomogram values, including age, eye (left/right), uncorrected visual acuity, SE, sphere, cylinder, astigmatic axis, corneal radius, optical zone, K1, K2, average corneal curvature, central corneal thickness, and residual stromal thickness, were identified by information gain analysis. Subsequently, a double hidden layer artificial neural network (ANN) from Weka 3.8 (University of Waikato, Hamilton, New Zealand) (with parameters set as default) was adopted to build a prediction model. The ANN model comprised a 14-node input layer, 2 hidden layers (5 nodes in the first and 8 in the second), and a 1-node output layer delivering predictive nomogram values. A boosting strategy was adopted to further optimize the prediction model in such a way that the inaccurately predicted samples were assigned more weight in the next training round. As the minimum unit for a nomogram is usually set as 0.05 in SMILE surgery, the predicted continuous nomogram value (OriNomogram) was transformed into a final output nomogram by the following formula:
Nomogram = [OriNomogram/0.05] × 0.05 + [(OriNomogram − [OriNomogram/0.05] × 0.05)/0.025] × 0.05
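The effect of this transformation can be illustrated with a short sketch. It assumes that the square brackets denote the integer part (floor), which the text does not state explicitly; under that reading, the formula rounds OriNomogram to the nearest multiple of 0.05, with ties rounded upward. The function name `snap_nomogram` and the use of Python's `decimal` module (chosen to avoid binary floating-point artifacts) are illustrative assumptions and are not part of the Weka workflow described above.

```python
from decimal import Decimal, ROUND_FLOOR

STEP = Decimal("0.05")        # minimum nomogram unit stated in the text
HALF_STEP = Decimal("0.025")  # threshold used in the second term of the formula


def snap_nomogram(ori_nomogram):
    """Apply the two-term formula, reading [.] as floor (an assumption)."""
    x = Decimal(str(ori_nomogram))
    # First term: largest multiple of 0.05 that does not exceed x.
    base = (x / STEP).to_integral_value(rounding=ROUND_FLOOR) * STEP
    # Second term: add one more step when the remainder reaches 0.025.
    bump = ((x - base) / HALF_STEP).to_integral_value(rounding=ROUND_FLOOR) * STEP
    return base + bump


if __name__ == "__main__":
    # Hypothetical continuous predictions, for illustration only.
    for value in (0.12, 0.125, 0.13, 0.15):
        print(value, "->", snap_nomogram(value))
```

Under these assumptions, a continuous prediction of 0.12 maps to 0.10, whereas 0.13 maps to 0.15.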
To evaluate the performance of our model, we held out 5% of the 865 eye samples (43 samples) as the validation set. The remaining samples were used as the training set. The accuracy of our prediction model reached 97.67% within a deviation of 0.05 in the nomogram. To provide a less biased estimate, we also conducted 10-fold cross-validation, in which all 865 eye samples were evenly divided into 10 groups. In each of the 10 evaluation rounds, 9 groups were used as the training set and 1 group was used as the validation set. The final average accuracy of 10-fold cross-validation reached 95.14%.
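The evaluation scheme can be sketched as follows. This is a minimal illustration and not the Weka procedure used in the study: the function names, the random seed, and the reading of "accuracy" as the fraction of predictions lying within 0.05 of the reference nomogram are assumptions. Because 865 is not divisible by 10, the folds in this sketch contain 86 or 87 samples each.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


def tolerance_accuracy(predicted, reference, tol=0.05):
    """Fraction of predictions within `tol` of the reference value."""
    # A small epsilon guards against floating-point rounding at the boundary.
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tol + 1e-9)
    return hits / len(predicted)


if __name__ == "__main__":
    for train, validation in kfold_indices(865, k=10):
        print(len(train), len(validation))
```

As a point of arithmetic, 42 of 43 hold-out samples falling within tolerance would give 42/43 ≈ 97.67%, which is consistent with the reported hold-out accuracy, although the text does not state the count.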
Refractive Surgery
The prediction model was applied to set the nomogram for patients in the machine learning group, while an experienced surgeon (W.Y.) set the nomogram for patients in the surgeon group. The nomogram was applied to the spherical correction only. Patients in both groups then underwent SMILE procedures. All surgeries were performed using the VisuMax femtosecond laser system (Carl Zeiss Meditec AG, Jena, Germany) with a 500-kHz repetition rate. The surgery parameters were set as follows: (1) 7.3 to 8 mm cap diameter; (2) 110 or 120 μm cap thickness; (3) 90-degree side angle; (4) 6.0 to 7.8 mm lenticule diameter (optical zone); (5) 3.0 to 4.0 mm lenticule incision at 12 o'clock. The target postoperative refraction was emmetropia for all patients.
Outcome Measures
The outcome measures of SMILE surgery included safety, efficacy, predictability, and the error of SE correction, which were defined as follows:
Safety is the percentage of eyes losing 2 or more lines of CDVA, while safety index is the ratio of mean postoperative CDVA to mean preoperative CDVA. The cut-off value of the safety index was set to 1.0.
Efficacy is the percentage of eyes with uncorrected distance visual acuity (UDVA) of 20/20 and 20/40, while efficacy index is the ratio of mean postoperative UDVA to mean preoperative CDVA. The cut-off value of the efficacy index was set to 0.8.
Predictability is the percentage of eye samples within ±0.50 D and ±1.00 D of desired postoperative refractive error.
The error of SE correction is the difference between the target and achieved postoperative refraction.
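The definitions above can be expressed as a short computational sketch. It assumes that visual acuity values are in decimal notation (not logMAR) and that the error of SE correction is signed as achieved minus target; the text does not specify the sign convention. All function names and data values are hypothetical and are given for illustration only.

```python
from statistics import mean


def safety_index(post_cdva, pre_cdva):
    """Ratio of mean postoperative CDVA to mean preoperative CDVA."""
    return mean(post_cdva) / mean(pre_cdva)


def efficacy_index(post_udva, pre_cdva):
    """Ratio of mean postoperative UDVA to mean preoperative CDVA."""
    return mean(post_udva) / mean(pre_cdva)


def predictability(achieved_se, target_se, tolerance):
    """Percentage of eyes whose achieved SE is within `tolerance` D of target."""
    hits = sum(
        1 for a, t in zip(achieved_se, target_se) if abs(a - t) <= tolerance + 1e-9
    )
    return 100.0 * hits / len(achieved_se)


def se_correction_errors(achieved_se, target_se):
    """Per-eye error, signed as achieved minus target (assumed convention)."""
    return [a - t for a, t in zip(achieved_se, target_se)]


if __name__ == "__main__":
    # Hypothetical values for 4 eyes; the target is emmetropia (0.00 D).
    post_udva = [1.0, 1.2, 0.8, 1.0]
    pre_cdva = [1.0, 1.0, 1.0, 1.0]
    achieved = [0.0, -0.25, -0.75, 0.5]
    target = [0.0, 0.0, 0.0, 0.0]
    print(efficacy_index(post_udva, pre_cdva))
    print(predictability(achieved, target, 0.50))
    print(mean(se_correction_errors(achieved, target)))
```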
Statistical Analysis
To compare the surgeon and machine learning groups, we used SPSS 23.0 (IBM Corp, Armonk, New York, USA), applying the independent-samples t test to normally distributed data and the Mann-Whitney test to non-normally distributed data. P values less than .05 were considered statistically significant.
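For readers unfamiliar with the independent-samples t test, the statistic can be sketched as below. This is the textbook equal-variance (pooled) form and is offered only as an illustration; the analysis in this study was performed in SPSS, and the function name and sample values here are hypothetical. The P value, which requires the t distribution, is omitted.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic (mean of a minus mean of b), equal variances assumed."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical efficacy-index values, for illustration only.
    surgeon = [1.0, 2.0, 3.0]
    machine_learning = [2.0, 3.0, 4.0]
    print(pooled_t_statistic(surgeon, machine_learning))
```

With this ordering of arguments, a negative statistic indicates a higher mean in the second group.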
Results
Of the 600 eye samples from 300 patients, 44 eye samples were lost from the surgeon group and 40 eye samples were lost from the machine learning group. Two and 4 eye samples with monocular surgery were removed from the surgeon group and machine learning group, respectively. A total of 254 eye samples from the surgeon group and 256 eye samples from the machine learning group were used for evaluation.
Table 1 shows the statistical analysis of eye samples from each group. There was no statistically significant difference between the 2 groups for any of the baseline parameters ( P > .05).
Parameter | Surgeon Group (n=254): Mean ± SD | Surgeon Group (n=254): Range | Machine Learning Group (n=256): Mean ± SD | Machine Learning Group (n=256): Range | P Value |
---|---|---|---|---|---|
Age (y) | 25.61 ± 5.83 | 18 to 42 | 25.01 ± 5.75 | 18 to 42 | .773 |
LogMAR UDVA | 1.10 ± 0.36 | 0.15 to 2.00 | 1.13 ± 0.28 | 0.22 to 2.00 | .333 |
Sphere (D) | −4.96 ± 1.57 | −1.25 to −9.00 | −5.23 ± 1.68 | −1.00 to −9.00 | .113 |
Cylinder (D) | −0.70 ± 0.63 | 0.00 to −3.50 | −0.68 ± 0.58 | 0.00 to −3.25 | .376 |
SE (D) | −5.31 ± 1.60 | −1.75 to −9.88 | −5.57 ± 1.74 | −1.36 to −9.38 | .136 |
LogMAR CDVA | −0.00 ± 0.26 | −0.08 to 0.10 | −0.00 ± 0.21 | −0.08 to 0.10 | .903 |
Km (D) | 43.47 ± 1.32 | 40.6 to 46.9 | 43.35 ± 1.23 | 40.27 to 46.20 | .247 |
CCT (μm) | 549.20 ± 26.93 | 482 to 633 | 548.18 ± 26.99 | 489 to 614 | .678 |