Development of Classification Criteria for the Uveitides


To develop classification criteria for 25 of the most common uveitides.


Machine learning using 5,766 cases of 25 uveitides.


Cases were collected in an informatics-designed preliminary database. Using formal consensus techniques, a final database was constructed of 4,046 cases achieving supermajority agreement on the diagnosis. Cases were analyzed within uveitic class and were split into a training set and a validation set. Machine learning used multinomial logistic regression with lasso regularization on the training set to determine a parsimonious set of criteria for each disease and to minimize misclassification rates. The resulting criteria were evaluated in the validation set. Accuracy of the rules developed to express the machine learning criteria was evaluated by a masked observer in a 10% random sample of cases.


Overall accuracy estimates by uveitic class in the validation set were as follows: anterior uveitides 96.7% (95% confidence interval [CI] 92.4, 98.6); intermediate uveitides 99.3% (95% CI 96.1, 99.9); posterior uveitides 98.0% (95% CI 94.3, 99.3); panuveitides 94.0% (95% CI 89.0, 96.8); and infectious posterior uveitides / panuveitides 93.3% (95% CI 89.1, 96.3). Accuracies of the masked evaluation of the “rules” were as follows: anterior uveitides 96.5% (95% CI 91.4, 98.6); intermediate uveitides 98.4% (95% CI 91.5, 99.7); posterior uveitides 99.2% (95% CI 95.4, 99.9); panuveitides 98.9% (95% CI 94.3, 99.8); and infectious posterior uveitides / panuveitides 98.8% (95% CI 93.4, 99.9).


The classification criteria for these 25 uveitides had high overall accuracy (ie, low misclassification rates) and seemed to perform well enough for use in clinical and translational research.

The uveitides are a collection of over 30 diseases characterized by intraocular inflammation. They can be organized as a matrix of diseases classified by the uveitic anatomic class and by whether they are (1) infectious, (2) associated with a systemic autoinflammatory or autoimmune disease, or (3) eye-limited and presumed to be immune-mediated (Table 1). The uveitic classes are defined anatomically by the primary site in which inflammation is detected clinically and consist of anterior uveitis (primary site in the anterior chamber), intermediate uveitis (primary site in the vitreous humor), posterior uveitis (primary site in the retina or choroid), and panuveitis (anterior chamber, vitreous humor, and retina/choroid all typically involved without predominance in any one site). Posterior uveitides may primarily involve the retina, in which case they typically are infectious, or may primarily involve the choroid and/or retinal pigment epithelium, in which case they often are noninfectious but also may be infectious.

Table 1

Uveitic Diseases Addressed by the SUN Developing Classification Criteria for the Uveitides Project

| Anatomic Class | Infectious (a) | Systemic Disease Associated | Eye-Limited |
| --- | --- | --- | --- |
| Anterior | Cytomegalovirus anterior uveitis | Juvenile idiopathic arthritis–associated anterior uveitis | Fuchs uveitis syndrome |
| | Herpes simplex virus anterior uveitis | Spondyloarthritis/HLA-B27-associated anterior uveitis | |
| | Varicella zoster virus anterior uveitis | Tubulointerstitial nephritis with uveitis | |
| | Syphilitic anterior uveitis | Sarcoidosis-associated anterior uveitis | |
| Intermediate | Syphilitic intermediate uveitis | Multiple sclerosis–associated intermediate uveitis | Pars planitis |
| | | Sarcoidosis-associated intermediate uveitis | Intermediate uveitis, non–pars planitis type |
| Posterior | Acute retinal necrosis | Sarcoidosis-associated posterior uveitis | Acute posterior multifocal placoid pigment epitheliopathy |
| | Cytomegalovirus retinitis | | Birdshot chorioretinitis |
| | Syphilitic posterior uveitis | | Multiple evanescent white dot syndrome |
| | Toxoplasmic retinitis | | Multifocal choroiditis with panuveitis |
| | Tuberculous posterior uveitis | | Punctate inner choroiditis |
| | | | Serpiginous choroiditis |
| Panuveitis | Syphilitic panuveitis | Behçet disease uveitis | Sympathetic ophthalmia |
| | Tuberculous panuveitis | | |
| | | Sarcoidosis-associated panuveitis | |
| | | Vogt-Koyanagi-Harada disease (early-stage and late-stage) | |

(a) Infectious uveitides refer to those with evidence of active infection. They do not include autoinflammatory or autoimmune diseases triggered by a prior infection (eg, reactive arthritis-associated uveitis).

Classification criteria are employed to diagnose individual diseases for research purposes. Classification criteria differ from clinical diagnostic criteria in that although both seek to minimize misclassification, when a trade-off is needed, diagnostic criteria emphasize sensitivity, whereas classification criteria emphasize specificity. The goal of classification criteria is to define a homogeneous group of patients for inclusion in research studies and to optimize the likelihood that all participants in the study will be generally accepted to have the disease.

Classification criteria are needed for the field of uveitis. Although diagnostic criteria have been proposed for several diseases, currently there is no validated systematic approach to classifying the uveitides, and currently the agreement among uveitis experts on the diagnosis of a specific case is moderate at best (κ = 0.39). Furthermore, there are pairs of experts for whom the observed level of diagnostic agreement could have occurred by chance alone (κ ∼0.0). As such, there is a lack of uniformity of reporting in the literature and an uncertainty about the comparability of different case series and clinical studies of patients with uveitis. Adoption of generally accepted and widely used classification criteria for reporting the uveitides in the literature should help address the current uncertainty that exists.

The Standardization of Uveitis Nomenclature (SUN) Working Group is an international collaboration dedicated to improving research in the field of uveitis. The “SUN Developing Classification Criteria for the Uveitides” project’s goal was to develop classification criteria for 25 of the most common uveitides using a formal approach to development and classification.


The SUN Developing Classification Criteria for the Uveitides project proceeded in 4 phases: (1) informatics, (2) case collection, (3) case selection, and (4) machine learning.


As previously described, the informatics phase was conducted from 2009 to 2010 and developed a standardized vocabulary and set of dimensions for describing uveitic cases and diseases. It enabled the development of a standardized, menu-driven, hierarchical case report for case collection, which sought to maximize discrete data collection and minimize free text.

Case Collection

In case collection, information on 5,766 cases of 25 of the most common uveitides was collected retrospectively between 2010 and 2016 using the standardized forms developed during the informatics phase. Information was entered into the SUN preliminary database by the 76 contributing investigators. Case information was de-identified, and investigators entered cases retrospectively from existing case records. Investigators were instructed to enter data from the presentation visit or, in the unusual situation where there was disease evolution, the visit at which the diagnosis became known. The target for case collection was 150-250 cases of each of the 25 diseases. Once ∼250 cases were collected, case collection for a specific disease was closed. Because they enter into the differential diagnosis of several classes of the uveitides, more than 250 cases of sarcoidosis-associated uveitis (383 cases) and tuberculous uveitis (358 cases) were collected. Because of their very different features, cases of early-stage Vogt-Koyanagi-Harada disease and late-stage Vogt-Koyanagi-Harada disease were collected separately. Because the goal of the project was to develop criteria to distinguish among the uveitides, only cases with uveitis were entered into the preliminary database.

Investigators were instructed to submit images relevant to the diagnosis (eg, fundus photographs for infectious and noninfectious posterior uveitides and panuveitides, fluorescein angiograms, and optical coherence tomograms, as appropriate) into the database. These images were used by the case selection committees for case selection and graded independently by a Reading Center at the Department of Ophthalmology, University of Wisconsin, Madison School of Medicine and Public Health. Reading Center grades included information on lesion number, location, size, and character, as appropriate. Reading Center data were used preferentially in the machine learning for posterior uveitides and panuveitides (including the infectious subset) for features including lesion (or spot) number, distribution, and size. Image results relevant to criteria were reviewed and discrepancies adjudicated by a clinician (P.M.) dedicated to image management. Actual images were not subject to machine learning.

Case Selection

Because there is no “gold standard” for case definition, because there is modest agreement among uveitis experts, and because there is a need to define a homogeneous group of patients by classification criteria, it was decided to “select” those cases from the preliminary database that achieved a supermajority agreement on the diagnosis as the final database to be used in the machine learning phase. Case selection occurred during 2016 and 2017. Cases in the preliminary database were reviewed by committees of 9 investigators for inclusion into the final database (case “selection”). Committees were geographically and “school of thought” dispersed. Case selection proceeded in 2 steps: online voting followed by consensus conference calls. During online voting committee members reviewed the cases and individually voted on whether or not the data supported the diagnosis based on their clinical judgment, without reference to any specific criteria. A “forced choice” was required on whether the investigator thought that the case should be included or not included in the final database. Cases obtaining a supermajority (>75%) of “yes” votes were included; those with a supermajority of “no” votes were excluded. Those cases with no supermajority yes or no votes were tabled for consensus conference calls. The consensus conference calls were conducted using nominal group techniques, which are a formal consensus approach that minimizes “dominant personality” effects. A round of formal uninterrupted individual comments was followed by anonymous voting, with supermajority requirements for acceptance or rejection. If the case was neither accepted nor rejected after the first round, a second round was conducted. If the case was neither accepted nor rejected after the second round, it was permanently tabled and not included in the final database. Five committees based on uveitic class worked in parallel, with infectious posterior uveitis and panuveitis as a separate committee. The core committee membership was the same for all of the diseases within a class, but there was some variability between specific diseases in the committee membership based on investigator availability.
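The online voting rule described above can be sketched in a few lines. This is only an illustration: the function name `triage_case` is hypothetical, and it assumes that the strict ">75%" threshold applies symmetrically to "yes" and "no" votes and that every committee member casts a vote.

```python
from fractions import Fraction


def triage_case(yes_votes: int, committee_size: int = 9,
                threshold: Fraction = Fraction(3, 4)) -> str:
    """Return 'include', 'exclude', or 'table' for one forced-choice vote.

    Assumption: a supermajority means strictly more than `threshold`
    of the committee, for both "yes" and "no" votes.
    """
    if not 0 <= yes_votes <= committee_size:
        raise ValueError("yes_votes must be between 0 and committee_size")
    no_votes = committee_size - yes_votes  # forced choice: no abstentions
    if Fraction(yes_votes, committee_size) > threshold:
        return "include"
    if Fraction(no_votes, committee_size) > threshold:
        return "exclude"
    return "table"  # deferred to a consensus conference call


if __name__ == "__main__":
    for yes in range(10):
        print(yes, triage_case(yes))
```

Under these assumptions, a 9-member committee needs at least 7 matching votes to include or exclude a case, and a split between 3 and 6 "yes" votes sends the case to a conference call.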

Machine Learning

Machine learning was conducted during 2018 and 2019. The final database then was randomly separated into a training set (∼85% of the cases) and a validation set (∼15% of the cases) for each uveitic class. Data from “check all that apply” questions in the database were converted to a series of binary “yes/no” or “present/absent” items. Because of the retrospective nature of data collection and the selective Bayesian approach to testing in clinical care now advocated (in which tests are selected to rule in or rule out a diagnosis, rather than a standard set of tests used on all cases), not all laboratory data were available on each case. Therefore, an “evidence for” approach was adopted in which data supporting the diagnosis were needed to make the diagnosis and missing data were treated as negative data. This approach mimics clinical care in which it is presumed that tests not performed would be negative or irrelevant if they had been performed. However, relatively more complete data were available for the 2 typically exclusionary diseases that can present clinically in any of the uveitic classes, syphilis and sarcoidosis.
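A minimal sketch of the two data-preparation steps described above: expanding a "check all that apply" answer into binary items, and treating a missing test result as negative under the "evidence for" approach. The field names, option labels, and helper functions are hypothetical and are not taken from the SUN database.

```python
from typing import Dict, Iterable, Optional


def expand_checklist(selected: Iterable[str],
                     options: Iterable[str]) -> Dict[str, int]:
    """Turn one 'check all that apply' answer into present/absent (1/0) items."""
    chosen = set(selected)
    return {option: int(option in chosen) for option in options}


def evidence_for(result: Optional[bool]) -> int:
    """Only a positive result counts as evidence (1).

    A negative result (False) and a test that was never performed (None)
    are both coded 0, i.e. missing data are treated as negative data.
    """
    return 1 if result is True else 0


# Hypothetical case record; names and values are illustrative only.
case = {
    "keratic_precipitates": ["stellate"],
    "syphilis_serology": None,  # test not performed
    "hla_b27": True,
}
kp_options = ["fine", "stellate", "mutton_fat"]  # hypothetical option list

features = expand_checklist(case["keratic_precipitates"], kp_options)
features["syphilis_serology_positive"] = evidence_for(case["syphilis_serology"])
features["hla_b27_positive"] = evidence_for(case["hla_b27"])
```

The design consequence is that a feature can only ever support a diagnosis; the absence of a test never counts against it.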

Because the uveitic disease diagnosis is a patient diagnosis, eye-specific information was coalesced into patient-specific information, typically representing the “worse eye.” If the feature was present in either eye, it was treated as present for the individual, and if there were multiple options for a feature (eg, predominant lesion size), it was taken as the larger of the 2 ranks.
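The eye-to-patient coalescing rule can be illustrated as follows. The size categories and their ranks are hypothetical; the source only states that a feature present in either eye counts as present and that the larger of the 2 ranks is taken.

```python
from typing import Dict, Optional

# Hypothetical ordinal scale for a feature such as predominant lesion size.
LESION_SIZE_RANK: Dict[str, int] = {"small": 1, "medium": 2, "large": 3}


def present_in_patient(right_eye: bool, left_eye: bool) -> bool:
    """A binary feature is present for the patient if present in either eye."""
    return right_eye or left_eye


def worse_rank(right_eye: Optional[str], left_eye: Optional[str],
               ranks: Dict[str, int]) -> Optional[str]:
    """Return the higher-ranked of the two eye-level grades.

    An ungraded eye (None) is ignored; None is returned only when
    neither eye has a grade.
    """
    graded = [grade for grade in (right_eye, left_eye) if grade is not None]
    if not graded:
        return None
    return max(graded, key=ranks.__getitem__)
```

For example, a patient with a "small" grade in one eye and a "large" grade in the other would be recorded as "large".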

Machine learning was used on the training set to determine criteria that minimized misclassification. Because diagnostic confusion typically is within class and not between anatomic classes of the uveitis, machine learning was performed separately within class for 5 groups of diseases: anterior uveitides, intermediate uveitides, posterior uveitides, panuveitides, and infectious posterior uveitides or panuveitides. Cases from subsets of diseases that crossed class (eg, syphilitic uveitis, sarcoidosis-associated uveitis, and tuberculous uveitis) were included in the relevant class. Because of the low ratio of cases to diagnoses, it was elected to sequester ∼150 cases into each of the 5 validation sets, which would provide a point-wise confidence interval no greater than ±0.08 when expressing accuracy as the fraction correct in the validation set. Four classification methods were considered, listed with their tuning parameters and R package name (in parentheses): classification and regression trees (CART), with cost-complexity pruning and cp = 0.01 (rpart); random forests (RF) with default tuning parameters (randomForest); multinomial logistic regression with lasso regularization and the 1 standard error (se) value chosen for lambda (glmnet); and support vector machines (SVM) with radial kernel and tuning performed on a grid of cost and gamma values (e1071). The classification methods were compared with respect to accuracy and confusion matrices, Obuchowski’s index, Van Calster’s polytomous discrimination index, and discrimination plots. For the polytomous discrimination index, currently available packages could not handle the 9-level categorical variables required, and a new algorithm was developed (Oden N, personal communication; R code available upon request).
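The ±0.08 figure for a validation set of about 150 cases can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) 95% interval; the source does not state which interval method was used for this sizing argument, so this is an assumption.

```python
import math


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# The half-width is largest at p = 0.5, so that is the worst case for n = 150.
worst_case = wald_half_width(0.5, 150)

# At a high accuracy such as 0.95 the interval is considerably narrower.
at_95 = wald_half_width(0.95, 150)

if __name__ == "__main__":
    print(f"worst case (p = 0.50): +/- {worst_case:.3f}")
    print(f"p = 0.95:              +/- {at_95:.3f}")
```

Under this approximation the worst-case half-width for 150 cases is about 0.08, which is consistent with the bound quoted in the text.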

To strive for parsimony in the feature set for each disease and avoid overfitting, an approach based on the Boruta algorithm was used (R package Boruta). Boruta is an all relevant feature wrapper algorithm that uses random forests by default and compares importance of attributes with shadow attributes, created in each iteration by shuffling original ones. Attributes that have significantly worse importance than shadow ones are consecutively dropped, while attributes that are significantly better than shadows are admitted to be confirmed. Candidate features of a given uveitic class were arranged in order of descending Boruta importance. Then each candidate classification method was asked to construct classifications 1, 2, 3, etc, where classification 1 is based only on the most important feature, classification 2 is based on the 2 most important features, etc. Graphs were constructed showing accuracy and unweighted kappa for the methods vs increasing number of included features, based on 5-fold cross-validation, and used to choose a final set of features that would generate a classification that was both parsimonious and accurate.
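The incremental procedure described above (classification 1 from the top feature, classification 2 from the top 2 features, and so on, each assessed by 5-fold cross-validation) can be outlined as below. This sketch covers only the bookkeeping: it does not implement Boruta or any classifier, and the importance scores and function names are hypothetical.

```python
import random
from typing import Dict, List


def nested_feature_sets(importance: Dict[str, float]) -> List[List[str]]:
    """Return [top-1], [top-2], ... feature lists in descending importance."""
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def k_fold_indices(n_cases: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Randomly partition case indices 0..n_cases-1 into k disjoint folds."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


# Hypothetical importance scores for three candidate features.
example_importance = {"feature_a": 0.9, "feature_b": 0.1, "feature_c": 0.5}
example_sets = nested_feature_sets(example_importance)
example_folds = k_fold_indices(23, k=5)
```

In a full analysis, each feature list would be passed to each candidate classifier, with accuracy and kappa averaged over the held-out folds and plotted against the number of features.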

Multinomial logistic regression, RF, and SVM all provided similar results, but CART provided slightly worse performance (data not shown). Multinomial logistic regression with lasso regularization was chosen. This approach typically presents classification rules as linear combinations of features, which were restated as equivalent Boolean classification rules. This approach was possible because all SUN features were treated as categorical, and the few continuous features (eg, age, intraocular pressure) were stratified as categorical variables. Thus once a logistic model is constructed for a uveitic class, the model can be asked to predict its outcome for every unique combination of final features in the training set. The resulting metadata were submitted to the Quine-McCluskey algorithm as extended by Dusa and Thiem, and implemented as the eQMC function in the R QCApro package. This algorithm constructs a minimal set of Boolean expressions, 1 for each different type of predicted output (uveitic disease) in the class. The collection of Boolean expressions is a set of classification rules that exactly re-creates the decisions of the logistic regression in the training set.
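The idea that a linear rule over categorical features can be restated as an equivalent Boolean rule can be shown with a toy example. The weights, intercept, and feature names below are hypothetical, the model is binary rather than multinomial, and the Boolean form is written by hand rather than derived by the Quine-McCluskey algorithm. The sketch also enumerates every possible combination of features, whereas the text describes using the unique combinations found in the training set.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical fitted coefficients for three binary features.
WEIGHTS: Dict[str, float] = {"a": 2.0, "b": 1.0, "c": 1.0}
INTERCEPT = -2.5


def linear_rule(case: Dict[str, int]) -> bool:
    """Predict 'disease' when the linear score is positive."""
    score = sum(WEIGHTS[name] * case[name] for name in WEIGHTS) + INTERCEPT
    return score > 0


def boolean_rule(case: Dict[str, int]) -> bool:
    """Hand-written Boolean restatement: a AND (b OR c)."""
    return bool(case["a"] and (case["b"] or case["c"]))


def truth_table() -> List[Tuple[Dict[str, int], bool]]:
    """Ask the linear model for its prediction on every feature combination."""
    names = sorted(WEIGHTS)
    rows = []
    for values in product((0, 1), repeat=len(names)):
        case = dict(zip(names, values))
        rows.append((case, linear_rule(case)))
    return rows


# The two rules agree on every row, so the Boolean form re-creates the model.
rules_agree = all(boolean_rule(case) == predicted
                  for case, predicted in truth_table())
```

A minimization algorithm such as Quine-McCluskey takes a truth table of this kind as input and returns a minimal Boolean expression for each predicted outcome.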

To optimize performance of the criteria, an iterative approach was taken to feature engineering using the training set, in which clinically relevant “OR” variables were combined as a single “evidence of” variable. For example, chest radiographic results were combined with chest computed tomography results to produce a variable identifying bilateral hilar adenopathy on chest imaging (ie, chest radiography or chest computed tomography), and this variable then was combined with a tissue biopsy demonstrating noncaseating granulomata to produce an “evidence of sarcoidosis” variable. All such “OR” variable creation was performed only on the training set and without reference to the diagnosis, and the performance of those variables selected for the final model was evaluated in the validation set. When the Quine-McCluskey algorithm produced more than 1 equivalent set of criteria, the set that best fit with the other methods and with clinical care was chosen.
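The sarcoidosis example above can be written as a pair of logical ORs. The function and argument names are hypothetical, and the sketch assumes that the biopsy result is combined with the imaging variable by OR, as the term "OR" variable suggests. Results that are unavailable (`None`) are treated as not positive.

```python
from typing import Optional


def any_positive(*findings: Optional[bool]) -> bool:
    """True if at least one finding is positive; None (not done) counts as not positive."""
    return any(finding is True for finding in findings)


def evidence_of_sarcoidosis(chest_xray_bha: Optional[bool],
                            chest_ct_bha: Optional[bool],
                            biopsy_granulomata: Optional[bool]) -> bool:
    """Combine imaging and biopsy results into one 'evidence of' variable.

    Step 1: bilateral hilar adenopathy (BHA) on chest radiography OR chest CT.
    Step 2: BHA on chest imaging OR noncaseating granulomata on biopsy
            (assumed to be an OR combination).
    """
    bha_on_chest_imaging = any_positive(chest_xray_bha, chest_ct_bha)
    return any_positive(bha_on_chest_imaging, biopsy_granulomata)
```

For instance, a case with a normal chest radiograph, no CT, and a positive biopsy would still be coded as having evidence of sarcoidosis.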

After criteria for each disease were developed using the training set, they were evaluated on the validation set, and the misclassification rate was calculated for both the training and the validation sets. The misclassification rate was the proportion of cases classified incorrectly by the machine learning algorithm when compared to the consensus diagnosis. As a check on the accuracy measure, the balanced accuracy (which is unaffected by the relative numbers of cases of each diagnosis in the set) also was calculated on the validation set.
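The two performance measures named above can be computed as follows. Balanced accuracy is taken here in its usual sense, the unweighted mean of the per-diagnosis recall; the labels in the example are made up.

```python
from collections import Counter
from typing import Sequence


def misclassification_rate(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Proportion of cases whose predicted diagnosis differs from the consensus."""
    wrong = sum(t != p for t, p in zip(truth, predicted))
    return wrong / len(truth)


def balanced_accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of per-diagnosis recall, so each diagnosis counts equally."""
    totals = Counter(truth)
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)
    return sum(correct[d] / totals[d] for d in totals) / len(totals)


# Made-up example: 8 cases of diagnosis A and 2 cases of diagnosis B,
# with one of the B cases misclassified as A.
example_truth = ["A"] * 8 + ["B"] * 2
example_predicted = ["A"] * 8 + ["A", "B"]
example_error = misclassification_rate(example_truth, example_predicted)  # 0.1
example_balanced = balanced_accuracy(example_truth, example_predicted)    # 0.75
```

In this example the overall accuracy is 0.90, but the balanced accuracy is 0.75 because the rarer diagnosis is classified correctly only half the time, which is why the second measure is a useful check when diagnoses have unequal numbers of cases.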

The final classification rules, which were expressed formally in terms of Boolean expressions involving variables in the training and test sets, were restated in English as the criteria (“final rules”) for each individual disease. To test the accuracy of the criteria, an ∼10% sample of each disease in the final database was randomly selected, and the original case data (without engineered variables) were evaluated within uveitic class by a single observer masked as to diagnosis (J.E.T.), so as to estimate the class accuracy of the final rules. The masked observer’s results were compared to the machine learning results (to determine how well they reflected the conversions of the machine learning variables to “rules” expressed in English) and to the consensus diagnoses (to determine how well they performed).
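Drawing an approximately 10% random sample of each disease is a stratified sampling step, which might look like the sketch below. The rounding rule, the minimum of 1 case per disease, the seed, and the data layout are all assumptions made for illustration and are not described in the source.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_sample(cases: Sequence[Tuple[int, str]],
                      fraction: float = 0.10,
                      seed: int = 2021) -> Dict[str, List[int]]:
    """Randomly sample about `fraction` of the case IDs within each disease.

    `cases` is a sequence of (case_id, disease) pairs. At least 1 case per
    disease is drawn (an assumption, to avoid empty strata).
    """
    by_disease: Dict[str, List[int]] = defaultdict(list)
    for case_id, disease in cases:
        by_disease[disease].append(case_id)

    rng = random.Random(seed)
    sample: Dict[str, List[int]] = {}
    for disease in sorted(by_disease):  # fixed order keeps the draw reproducible
        ids = by_disease[disease]
        k = max(1, round(fraction * len(ids)))
        sample[disease] = sorted(rng.sample(ids, k))
    return sample


# Made-up final database: 200 cases of disease X and 50 cases of disease Y.
example_cases = [(i, "X") for i in range(200)] + [(i, "Y") for i in range(200, 250)]
example_sample = stratified_sample(example_cases)
```

Sampling within each disease, rather than from the pooled database, ensures that every disease is represented in the masked evaluation.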

The final classification rules and the disease-specific manuscripts presenting them were subject to multiple levels of review and approval, including the individual manuscripts’ writing committees, the Executive Committee, the Steering Committee, and the SUN Working Group. The SUN Working Group held a meeting in December 2019 to review the manuscripts and criteria, resulting in over 80 separate suggestions for the criteria and manuscripts. Additional analyses and sensitivity analyses suggested by that meeting were conducted in the first quarter of 2020, leading to additional revisions.


Of the 5,766 cases collected, 4,046 (70%) were selected in the case selection phase and used in the machine learning phase. Supermajority agreement that a case should be included or excluded was achieved on 99% of cases overall (ie, only 1% were tabled owing to failure to reach agreement). The numbers of cases selected and the region of origin by uveitic class are listed in Table 2. Cases of sarcoidosis-associated, syphilitic, and tubercular uveitis were analyzed within relevant uveitic class (eg, sarcoid anterior uveitis with the anterior uveitides, sarcoid intermediate uveitis with the intermediate uveitides, etc), and when cases were in the differential diagnosis both of noninfectious uveitides and panuveitides and of infectious posterior uveitides or panuveitides, they were used in both sets for machine learning. Because of this use of some cases in more than 1 class and the use of subsets of sarcoidosis, syphilis, and tuberculosis in different classes, the numbers of cases used in the machine learning phase were as follows: anterior uveitides 1,083; intermediate uveitides 589; posterior uveitides 1,068; panuveitides 1,012; and infectious posterior and panuveitides 803.

Posted Nov 5, 2021, in Ophthalmology
