The concept of quality of life (QoL) has been debated by philosophers since at least the fourth century BC, from Socrates and Aristotle to, more recently, Royce. Interest in QoL has grown exponentially over the last century, and a recent Medline search on the term “quality of life” generated over 11 500 related articles. QoL assessment has become de rigueur in ophthalmic outcomes research, complementing traditional objective outcomes such as the measurement of visual acuity and intraocular pressure (IOP). The patient’s perspective is essential to fully understand the impact of ocular conditions and treatment options. For example, from a clinician’s point of view the use of eye drops may be effective in managing IOP, yet when the patient is queried it may well emerge that side effects are affecting his or her QoL.
The need for patient-reported information has produced a plethora of questionnaires (commonly called instruments) that assess psychometric constructs. This has, however, resulted in a confusing choice for the would-be investigator. Very often a construct such as “vision functioning” is confused with “vision-specific QoL.” Vision functioning, commonly assessed using the Visual Functioning Index-14 (VF-14), measures visual disability or vision-related activity limitation associated with vision-dependent tasks such as reading, driving, and shopping. Vision-related QoL is a complex trait that encompasses vision functioning, symptoms, emotional well-being, social relationships, concerns, and convenience as they are affected by vision. Vision-related QoL instruments are therefore better able to provide a complete assessment of the impact of ocular conditions and the effectiveness of treatment on other critical components of QoL.
Currently, traditional psychometric methods remain the preferred techniques used in patient-based outcome measures in ophthalmology. Examinations of scale reliability, validity, and responsiveness are desirable, although few studies in ophthalmology have reported all of these properties. Reliability does not guarantee validity, and it is important for studies to assess both psychometric properties, because examinations of single properties are of limited value. However, traditional methods of psychometric testing are often suboptimal. For example, Cronbach’s alpha is almost universally used as a measure of “internal consistency,” that is, the extent to which the items of a scale measure the same latent variable. Relying on Cronbach’s alpha alone is problematic, however, because its value is not independent of the number of items and can be artificially inflated by including numerous items even if they are redundant. In addition, traditional methods conduct their analyses on “raw” item and total scores. Raw scores, however, are ordered counts, not interval measures, and raw-score differences are unequal and nonlinear. Traditional methods are also limited in their ability to assess: 1) how well the items measure a single trait; 2) how well each item “fits” the trait being assessed; 3) how well the items are matched to the respondents; 4) how consistently response categories are selected by respondents; and 5) whether an item is biased across subgroups (eg, gender, age group, location) in the sample.
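The inflation of Cronbach’s alpha by redundant items can be demonstrated numerically. The sketch below computes alpha from its standard formula on an entirely hypothetical 5-person × 3-item Likert data set, and shows that simply duplicating every item raises alpha without adding any new information:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a persons x items response matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert responses (5 persons x 3 items), for illustration only.
X = np.array([[1, 2, 1],
              [2, 3, 2],
              [3, 3, 4],
              [4, 5, 4],
              [5, 4, 5]], dtype=float)

print(round(cronbach_alpha(X), 3))                  # 3 items -> 0.936
print(round(cronbach_alpha(np.hstack([X, X])), 3))  # items duplicated -> 0.975
```

Duplicating the items adds no new information about the latent trait, yet alpha rises from 0.936 to 0.975, illustrating why a high alpha alone is weak evidence of scale quality.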
Modern psychometric methodology offers a unified framework to address these limitations. The most widely used modern psychometric technique is Rasch analysis. Rasch analysis transforms ordinal, scale-dependent scores into interval, scale-independent measures suitable for individual patient assessment. It is based on a logical assumption, namely that individuals with high levels of the trait being measured (eg, vision-specific QoL) should have a greater likelihood of obtaining a better score on any item (eg, watching television) than people with low levels. If this assumption holds systematically, person estimates can then be used as interval-level variables in statistical analyses. In addition, Rasch analysis provides greater insight into the psychometric properties of an instrument than traditional methods. Several techniques are available to determine how well items fit the latent trait being measured, how well the items discriminate between respondents, and how well item difficulty targets person ability. By employing Rasch analysis, for example, the Impact of Vision Impairment scale (IVI) has provided valuable information in several areas beyond visual acuity and functioning, such as vision-specific emotional well-being and psychosocial parameters.
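The ordering assumption can be illustrated with the dichotomous Rasch model, in which the probability of a positive response is a logistic function of the difference between person ability and item difficulty (both in logits). The item names and difficulty values below are hypothetical, chosen only to show that a higher-ability person has a higher expected score on every item:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Dichotomous Rasch model:
    P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item difficulties (logits): easy to hard.
items = {"watch TV": -1.5, "read newspaper": 0.0, "read small print": 2.0}

for theta in (-1.0, 1.0):  # a lower- and a higher-functioning person
    probs = {name: round(rasch_probability(theta, b), 2)
             for name, b in items.items()}
    print(theta, probs)  # the higher-ability person scores higher on every item
```

Because ability and difficulty enter only through their difference, the ordering of persons is the same on every item; it is this invariance that justifies treating the person estimates as interval-level measures.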
However, Rasch analysis will not transform poorly developed scales into valid ones. Rather, it complements and strengthens rigorously applied traditional psychometric methods. There is also a need to “demystify” the application of Rasch analysis in outcomes research in ophthalmology. Current Rasch analysis methods are software-driven and often perceived as complex, which can lead to confusion and misunderstanding. It is therefore imperative that the application and outcome of Rasch analysis are explained in a language understandable to clinicians and that the advantages of Rasch analysis over raw score analyses are empirically demonstrated. Finally, outside of Rasch analysis, other modern validation methods such as Item Response Theory (IRT) are also available. As opposed to Rasch analysis, where the data must fit the Rasch model to generate stable linear measures, IRT models were developed to fit the data. Therefore, the aim of an IRT analysis is to find the IRT model that best explains the observed data independent of whether the data support the construction of linear measures suitable for stable inferences. The debate between the use of Rasch analysis or IRT has been scant in the ophthalmic literature, although, in a seminal paper, Massof concluded that Rasch models are valid measurement models and IRT models are not.
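The contrast between the two approaches can be stated in model form. In the dichotomous case, the Rasch model depends only on the difference between person ability $\theta_n$ and item difficulty $b_i$, whereas, for example, the two-parameter logistic (2PL) IRT model adds an item-specific discrimination parameter $a_i$ estimated from the data:

```latex
P(X_{ni} = 1) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}
\quad \text{(Rasch)}
\qquad
P(X_{ni} = 1) = \frac{e^{a_i(\theta_n - b_i)}}{1 + e^{a_i(\theta_n - b_i)}}
\quad \text{(2PL)}
```

When $a_i$ is free to vary by item, the raw score is no longer a sufficient statistic for $\theta_n$ and item response curves can cross; this is what allows IRT models to fit the observed data more closely while forfeiting the invariant, interval-level measurement that the Rasch model provides.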
There is considerable potential for modern psychometric methods to improve health outcomes measurement in ophthalmology. Using linear measures instead of nonlinear raw scores would give a true reflection of disease impact, differences between individuals and groups, and treatment effects. Work from our group has already provided empirical evidence of the benefits of Rasch analysis. In a recent paper, we demonstrated vastly improved precision in the measurement of cataract surgery outcomes using a Rasch-scaled VF-14 compared to traditional scoring of the original version. Several new instruments have recently been developed and validated using Rasch analysis, such as the QOL Impact of Refractive Correction in adults and the Impact of Visual Impairment in school-age children. Rasch analysis has also been used to reassess the psychometric validity of instruments initially developed and validated using traditional methods, namely the IVI scale for adults and the Catquest instrument.
In conclusion, vision-specific QoL research has advanced substantially over the last decade, to the stage where scientific measurement must now be expected. The traditional approaches to developing rating scales (ie, item generation, item reduction, scale formation, and scale testing) remain valid in the initial developmental phases. However, as long as primitive counts and raw scores are routinely mistaken for measures, patient-reported research will struggle to be considered a reliable or useful science. There is therefore substantial room to improve the quality of existing instruments, and newly developed instruments should use modern psychometric methods to provide strong validity evidence. There is also a need for ophthalmic patient-reported research to look beyond the paper-based format and item-delimited scales. Future instruments should consider an item bank format: a pool of items that define a latent trait such as QoL, with the items representing differing amounts of that trait along a continuum, so that items of appropriate relevance can be administered to persons with different levels of the trait. Computerized adaptive testing (CAT), a method of test administration that adapts item selection to the respondent’s ability, delivers targeted items from the bank to each participant; subsequent items are then selected based on the responses to previous questions, and the selection proceeds until a predefined stopping criterion is reached. Pioneering work in this area has started, but a greater effort is needed towards the development of the first item bank in ophthalmology to provide an optimal measurement of the impact of vision loss on vision-specific QoL.
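A minimal sketch of how CAT draws on an item bank is given below. The item names and difficulties are hypothetical, and the bisection-style ability update is a crude stand-in for the maximum-likelihood estimation used in practice; the sketch shows only the core loop of targeted item selection, response, update, and stopping:

```python
# Hypothetical item bank: item -> Rasch difficulty (logits), easy to hard.
item_bank = {
    "recognise faces": -2.0,
    "watch TV": -1.0,
    "read newspaper": 0.0,
    "read labels": 1.0,
    "read small print": 2.0,
}

def run_cat(respond, n_items=3, theta0=0.0, step=1.0):
    """respond(item) -> 0/1; returns (items administered, final ability estimate).
    Stopping criterion here is simply a fixed test length."""
    theta, remaining, administered = theta0, dict(item_bank), []
    for _ in range(n_items):
        # Under the Rasch model, item information is maximal where
        # difficulty matches the current ability estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        remaining.pop(item)
        administered.append(item)
        # Crude update: step up on success, down on failure, halving each time.
        theta += step if respond(item) else -step
        step /= 2
    return administered, theta

# Deterministic respondent with true ability 1.2 logits (for illustration):
# succeeds on any item easier than that ability.
seq, est = run_cat(lambda i: item_bank[i] < 1.2)
print(seq, est)  # items of increasing difficulty near 1.2; estimate 1.25
```

Each response steers the next item toward the respondent’s level, so a well-targeted estimate is reached after only a few items, which is the practical appeal of an item bank over a fixed-length paper scale.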