Outcomes Research and Evidence-Based Medicine

Michael G. Stewart
INTRODUCTION AND HISTORY
Outcomes research can be defined as the scientific study of the outcomes of diverse therapies used for a particular disease, condition, or illness (1). While all clinical research measures some type of outcome, such as mortality, morbidity, or some other objective measure, in “outcomes research,” the patient’s perception of their outcome is assessed.
Historically, the movement toward outcomes-based research was started by Dr. Paul Ellwood, who in the 1980s suggested that in the future physicians would assess outcomes by measuring what the patient experienced (2). Subsequently, tools to assess these outcomes were developed and applied across many diseases. However, outcomes research also includes other types of studies in addition to patient-based outcomes studies. Today, outcomes researchers study all aspects of the health care system—from the status of the patient or population at entry, to the organization, delivery, regulation and financing of the health care system, to the status of the patient or population after treatment.
To be inclusive of other aspects of health services research, some have divided outcomes research into record-based outcomes research and patient-based outcomes research. Examples of these different types of studies are shown in Table 8.1.
TYPES OF STUDIES
Prospective, retrospective, or cross-sectional study designs can be used in outcomes research. However, many outcomes studies use an observational prospective design, where outcomes are assessed after diverse treatments—rather than an experimental prospective design, where treatments are carefully controlled or randomized. The differences between these study designs are an important point for discussion.
In experimental or controlled trials, particularly randomized trials, the ideal design uses two groups of patients that are nearly identical in every aspect except the treatment received. Any difference in outcome between groups must therefore be due only to the different treatments, since otherwise the groups were the same. Of course, patient groups that are actually identical are rarely achieved, but nevertheless, that is the basis behind the rigorous design and methodology of controlled trials. In addition, to account for the inherent differences between treatment groups, the randomization process should theoretically distribute those differences (say, in demographics or disease severity) equally between the two groups. Carefully controlled experimental studies can be said to measure the efficacy of treatment, under ideal circumstances. These studies usually yield very reliable results concerning the effects of treatment in the group that was studied.

TABLE 8.1 TYPES OF OUTCOMES RESEARCH
Record-based outcomes research
  • Meta-analysis
  • Appropriateness research
  • Reviews of administrative databases
  • Population-based studies
Patient-based outcomes research
  • Development of quality of life and health status instruments
  • Development of disease severity stratification systems
  • Longitudinal outcomes studies
      • Prospective
      • Retrospective
      • Observational
Modified from Rosenfeld RM, Bluestone C, eds. Evidence-based otitis media. Hamilton, ON: BC Decker, Inc., 1999:51-60, with permission.
However, there are questions about the generalizability of results from rigorously controlled trials to larger populations of patients with a disease because the larger group of patients will not be so carefully controlled and homogeneous. In addition, patient compliance and other factors may lead to different results than in tightly controlled trials. Furthermore, clinical trials are very expensive to perform, especially when considered from the standpoint of cost per patient studied.
Observational study designs are commonly used in outcomes research. All patients with a disease are included, and they are studied in the actual setting in which they receive their health care. This naturally introduces many other factors, which may impact outcome after treatment, including potential selection bias for different treatments. However, many would argue that results from large-scale outcome studies are more applicable to the general population because of their setting and scope. Studies that assess the actual (“real world”) results of treatment are said to measure the effectiveness of treatment versus efficacy measured from controlled trials (3). Furthermore, in addition to their “real-world” setting, the expanded outcome measured (quality of life, etc.) might be more relevant and important to patients than other clinical or biologic outcomes.
STEPS IN PERFORMING OUTCOMES RESEARCH
There are several basic steps involved in performing patient-based outcomes research (3,4,5). The fundamental steps are as follows: (a) identify and define the disease of interest, (b) create a staging system (clinical-severity index) for disease severity, (c) identify comorbid conditions, and (d) establish the outcomes to be measured (4). A single study may perform one, several, or all of those steps. Therefore, development and validation of an outcomes instrument is one type of outcomes research, and identification of comorbid conditions is another type. Each step is discussed in more detail below.
Define the Disease of Interest
This may be a straightforward step if the disease has clear and widely agreed upon diagnostic criteria. However, many diseases are difficult to rigorously define, such as gastroesophageal reflux or chronic rhinosinusitis, and research may be needed to develop clear reproducible diagnostic criteria.
Create a Staging System or Clinical-Severity Index
Grouping or stratification by disease severity is important in all types of clinical research. However, it is particularly important in outcomes research. This is because large numbers of patients may be studied without strict entry or exclusion criteria. Patients with more severe disease may receive more (or less) aggressive treatment, so it is therefore important to statistically adjust for disease severity when evaluating outcomes.
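The need to adjust for severity can be illustrated with a small worked example. The sketch below uses stratification, one simple form of adjustment, on entirely hypothetical counts (the treatment names, strata, and numbers are invented for illustration). It assumes that sicker patients mostly received the more aggressive treatment.

```python
# Hypothetical counts: (successes, patients) by severity stratum and treatment.
data = {
    "mild":   {"aggressive": (9, 10),   "conservative": (80, 100)},
    "severe": {"aggressive": (60, 100), "conservative": (5, 10)},
}

def crude_rates(data):
    """Success rate per treatment, ignoring disease severity."""
    totals = {}
    for arms in data.values():
        for treatment, (successes, patients) in arms.items():
            s, n = totals.get(treatment, (0, 0))
            totals[treatment] = (s + successes, n + patients)
    return {t: s / n for t, (s, n) in totals.items()}

def stratified_rates(data):
    """Success rate per treatment within each severity stratum."""
    return {
        severity: {t: s / n for t, (s, n) in arms.items()}
        for severity, arms in data.items()
    }

crude = crude_rates(data)
by_stratum = stratified_rates(data)
```

With these invented numbers, the crude comparison favors the conservative treatment, yet within both the mild and the severe stratum the aggressive treatment has the higher success rate. The crude result reflects who received each treatment rather than how well the treatment worked.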
Staging systems already exist for many diseases, for example, the TNM staging system for cancer. It is important to distinguish between staging systems that are descriptive and systems that are prognostic. Descriptive staging systems simply group together patients who have similar characteristics. Prognostic systems are designed to predict an outcome; for example, the TNM staging system is designed to predict 5-year survival. In general, staging systems used for outcomes research should be prognostic (3). However, even if a prognostic staging system already exists, it may not contain all the important variables that predict outcome (5).
To develop a prognostic staging system, the researcher should first define the outcome of interest to be predicted by the staging system, which is defined as the dependent variable. Next, identify a group of variables that might predict outcome; those are the independent variables. Potential independent variables can be identified from prior literature, a prospective study, or expert opinion. Then, perform a prospective study, identifying a heterogeneous group of subjects that is likely to contain patients with mild, moderate, and severe disease. In that group, measure the presence of all potential predictor variables and the outcomes after treatment.
Then, using data on both the outcome of interest and the presence of predictor variables, perform a multivariate analysis to identify which predictor variables actually impact outcome. Multivariate analysis is important in clinical studies because several different variables usually exert effects on each other, so it is preferable to study the effects of a large group of variables at the same time while controlling for the effects of the other potentially important variables.

Multivariate regression (linear or logistic) is one option for analysis; however, there are other options such as conjunctive consolidation (3,6,7). If regression is used, predictive factors are identified, and each can create a category with different outcomes. However, if multiple predictor variables are identified, the process of developing a single staging system can be difficult and at best requires multiple iterations of trial and analysis. The technique of conjunctive consolidation allows new clinical factors to be added to a staging system without necessarily increasing the number of groups or categories. Also, for development of a staging system, the data collection can be performed retrospectively, particularly if the outcome requires a significant time interval.
There are several potential “models” of staging systems from which to choose. Under any circumstances, developing a staging system is an iterative process in which patients are grouped by predictor variables, and the outcomes, by group, are assessed. If the groups are not sufficiently distinct, then another arrangement of predictor variables is used and outcomes by group are again compared. Ideally, the staging system should be organized so that patients are easily grouped into distinct strata, with clearly different outcomes, and such that all patients should be classifiable.
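The iterative check described above (group patients, then compare outcomes by group) can be sketched as follows. This is a minimal illustration, not a published method: the patient records are hypothetical, and the `min_gap` threshold for calling two strata "distinct" is an assumption chosen only for the example.

```python
from collections import defaultdict

def outcome_rate_by_stage(patients):
    """Return {stage: fraction of patients with a good outcome}."""
    counts = defaultdict(lambda: [0, 0])  # stage -> [good outcomes, total]
    for stage, good_outcome in patients:
        counts[stage][0] += 1 if good_outcome else 0
        counts[stage][1] += 1
    return {stage: good / total for stage, (good, total) in sorted(counts.items())}

def strata_are_distinct(rates, min_gap=0.10):
    """True if the outcome rate falls by at least min_gap at each higher stage."""
    ordered = [rates[stage] for stage in sorted(rates)]
    return all(a - b >= min_gap for a, b in zip(ordered, ordered[1:]))

# Hypothetical (stage, good_outcome) records for one candidate grouping.
patients = (
    [(1, True)] * 9 + [(1, False)] * 1
    + [(2, True)] * 6 + [(2, False)] * 4
    + [(3, True)] * 3 + [(3, False)] * 7
)
rates = outcome_rate_by_stage(patients)
distinct = strata_are_distinct(rates)
```

If `distinct` came back false, the researcher would regroup the predictor variables and repeat the comparison, which is the iteration the text describes.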
Identify Comorbid Conditions
The concept of a “comorbid” condition that seriously affected treatment outcome was first described by Dr. Alvan Feinstein. A comorbid condition is defined as a condition, distinct from the condition of interest, that affects the outcome being measured. For example, when measuring mortality from laryngeal cancer, if the patient has another serious condition causing potential mortality (e.g., unstable angina), then that condition is defined as a comorbid condition. Since the initial description, researchers in multiple specialties have identified the impact of comorbid disease on several different outcomes (6,7). Therefore, in any outcomes study, it is important to identify all potentially important comorbid conditions and to measure their presence and severity as part of the data collection process. Of course, that only applies if the comorbid condition actually affects the outcome under study. Using the same example of unstable angina, if one were performing an outcomes study of hearing satisfaction 1 month after receiving different types of hearing aids, the presence of unstable angina would not necessarily be an important comorbid condition to consider.
Define the Outcomes to be Measured
The expanded, patient-based outcomes usually measured in outcomes research are quality of life, health status, and functional status. There are multiple potential definitions for each of those terms; however, “quality of life” has three key aspects: (a) it is more than the absence of disease, (b) it is subjective (assessed from the patient’s perspective), and (c) it is multidimensional. In addition, the overall quality of life depends on multiple aspects of life not directly related to disease, so most researchers studying treatment outcomes are actually assessing the “health-related quality of life.” Most outcomes instruments designed for use in patient care are designed to assess health-related quality of life. The term “health status” is self-explanatory, but again, it must be measured from the patient’s perspective. Functional status refers to the patient’s ability to perform daily activities. In most circumstances, researchers are only interested in the effect of a particular disease, so disease-specific functional status is typically assessed.
To measure functional status or quality of life, the patient must answer several questions that have been validated for the purpose of measurement. Although these data can be gathered using interviews or other interactive techniques, under most circumstances patients complete a written questionnaire. In outcomes research, the questions are called “items” and the questionnaires are called “instruments.” Instruments must be validated, and the validation process uses the scientific principles of psychometrics. A full discussion of the process of instrument validation would require more than one chapter, although the basic concepts are reviewed.
A health status or quality of life instrument should be reliable, valid, and sensitive (8). Two types of reliability are usually assessed—test-retest reliability and internal consistency reliability. Test-retest reliability means that the results will be similar if the status of the patient has not changed, and internal consistency reliability means that responses on similar items will be correlated.
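Both types of reliability are usually summarized with a statistic. The sketch below assumes two common choices that the text does not name: a Pearson correlation between two administrations for test-retest reliability, and Cronbach's alpha for internal consistency. The function names and sample scores are hypothetical.

```python
from statistics import mean, variance

def pearson_r(first, second):
    """Pearson correlation between scores from two administrations."""
    m1, m2 = mean(first), mean(second)
    numerator = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    denominator = (
        sum((a - m1) ** 2 for a in first) * sum((b - m2) ** 2 for b in second)
    ) ** 0.5
    return numerator / denominator

def cronbach_alpha(item_scores):
    """item_scores: one list per patient, holding that patient's score on each item."""
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: five stable patients tested twice, and a 3-item instrument.
test_retest = pearson_r([40, 55, 60, 72, 80], [42, 53, 63, 70, 81])
alpha = cronbach_alpha([[1, 2, 1], [3, 3, 2], [4, 5, 4], [2, 2, 3], [5, 4, 5]])
```

Values near 1 for either statistic indicate high reliability; the cutoff for "acceptable" varies by field and is not specified here.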
Validity means that the instrument is measuring what it is supposed to measure. Validity is confirmed by a combination of evidence: content validity, criterion validity (if scores on the instrument correlate with objective measurable external criteria), and construct validity (if scores on the instrument correlate with scores on other instruments measuring similar concepts).
Sensitivity (or responsiveness) means that the instrument is responsive to change in status. In other words, if the patient’s clinical status changes, then their score should also change. Sensitivity is assessed using statistical techniques measuring the degree of change against known standards, such as the standardized response mean and the effect size.
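The two statistics named above can be computed directly from paired scores. The sketch uses their usual definitions: the effect size divides the mean change by the standard deviation of the baseline scores, and the standardized response mean divides it by the standard deviation of the change scores. The scores are hypothetical.

```python
from statistics import mean, stdev

def effect_size(baseline, followup):
    """Mean change divided by the standard deviation of baseline scores."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)

def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(changes)

# Hypothetical instrument scores (0 to 100) before and after treatment.
baseline = [40, 50, 60, 45, 55]
followup = [50, 55, 75, 50, 70]
es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
```

Here the mean change is 10 points, the change scores have a standard deviation of 5, and the baseline scores have a standard deviation of about 7.9, so the standardized response mean is 2.0 and the effect size is about 1.26.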
Another aspect of assessing sensitivity or change in status using an instrument has been called the “minimal significant difference” in score (3). For example, average scores on a health status instrument may change from 40 to 50 (on a scale of 0 to 100), and the difference might have a p-value less than 0.05. The question arises: is that 10-point difference a clinically significant change? If studies have indicated that the minimal clinically significant difference for the instrument is 15 points (out of 100), then the score change does not reach a level of minimal significant difference. Therefore, although the p-value indicates that the 10-point difference is likely not due to chance, it is probably too small a change to be noticed clinically by a patient.
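The distinction between statistical and clinical significance in this example can be written out as a short check. The function name is hypothetical, and the p-value of 0.01 is an assumed stand-in for "less than 0.05"; the scores and the 15-point threshold come from the example above.

```python
def interpret_change(mean_before, mean_after, p_value, minimal_difference, alpha=0.05):
    """Report statistical and clinical significance of a score change separately."""
    change = abs(mean_after - mean_before)
    return {
        "change": change,
        "statistically_significant": p_value < alpha,
        "clinically_significant": change >= minimal_difference,
    }

# Scores from the example; p_value=0.01 is an assumed value below 0.05.
result = interpret_change(mean_before=40, mean_after=50, p_value=0.01,
                          minimal_difference=15)
```

The result flags the 10-point change as statistically significant but not clinically significant, which is the situation the paragraph describes.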
Fortunately, it is seldom necessary to develop your own instrument. There are already hundreds of quality of life and health status instruments available in the literature. Some examples of validated quality of life instruments to assess global quality of life include the Medical Outcomes Study SF-36 and SF-12, the Quality of Well-Being Scale, the London Handicap Scale, the Sickness Impact Profile, and the Child Health Questionnaire. Some disease-specific quality of life instruments for Otolaryngology include the Chronic Sinusitis Survey, the Sinonasal Outcome Test-20 items, the University of Washington Quality of Life Index, and the Voice Handicap Index.
Assuming that a validated instrument exists, the main question for a researcher becomes: which instrument to choose? This choice is usually based on the content of the instrument, and the potential respondent burden. To discuss the assessment of content, a good example is hearing loss where several validated instruments are available. Review of the design and content of those instruments indicates that some are intended to assess satisfaction with a hearing aid, one is for elderly patients only, one is designed for conductive hearing loss in the otherwise healthy patient, etc. Therefore, content can guide instrument selection.
Respondent burden deals with the length of time and effort required to complete an instrument. Particularly when multiple questionnaires are used, selecting instruments with lower respondent burden should improve patient compliance and follow-up. In addition, researchers have found that shorter instruments are usually very sensitive to change, so little is lost from the standpoint of responsiveness by using a briefer instrument.
Another important issue is whether to choose a general (“global” or “generic”) quality of life/health status instrument or a disease-specific instrument. Both have advantages and disadvantages (3). Global instruments have the advantage of being comparable across disease states, and their use allows comparisons between the relative impacts of certain diseases. However, many global health status instruments are relatively insensitive to the impact of more limited disease states, which nevertheless may cause significant worsening of patients’ quality of life. This might not be true for all global instruments, but this finding has been replicated in several diseases. Therefore, if a global quality of life instrument is not sensitive enough, then use of a disease-specific instrument is appropriate.
Disease-specific instruments are much more sensitive to the impact of a particular disease, and they allow meaningful comparisons between treatments or groups of patients. However, disease-specific instruments do not allow comparisons across disease states, which can be a disadvantage if the goal is to demonstrate the overall impact of a given disease. Therefore, in many circumstances, it is wise to use both a global and a disease-specific instrument (3).
Finally, as discussed previously, traditional clinical outcomes such as disease-free survival remain important outcomes for assessment. In some cases in which the traditional outcomes are well-known and only the expanded, patient-based outcomes are of interest, then perhaps only quality of life/health status might be assessed. However, if there is a minimal added burden to the patient and researcher to assess “traditional” clinical outcomes (such as disease-free survival), under most circumstances, those outcomes should also be assessed.
RESULTS FROM OUTCOMES RESEARCH IN OTOLARYNGOLOGY
Outcomes studies in Otolaryngology have yielded many important results, and some brief examples are listed here. There are now validated outcomes instruments for use in chronic sinusitis, nasal obstruction, hearing loss, chronic ear disease, tinnitus, dizziness, head and neck cancer, voice, gastroesophageal reflux, tonsil and adenoid disease, pediatric otitis media, and pediatric sleep apnea, among other diseases. In addition, there is a comprehensive prognostic staging system for obstructive sleep apnea in adults. Studies have shown that global quality of life is significantly worsened in adults with chronic rhinosinusitis and improves to near normal after endoscopic sinus surgery. Similarly, global quality of life is significantly worsened in children with tonsil and adenoid disease. We have learned that comorbid conditions have significant impact on survival and outcome in patients with laryngeal cancer and other head and neck cancers. Disease-specific quality of life has been shown to improve significantly after surgical treatment for the following conditions: chronic rhinosinusitis, pediatric otitis media, vocal cord paralysis, conductive hearing loss, nasal septal deformity, and pediatric sleep apnea, among others. In addition, cochlear implantation has been demonstrated to be very cost-effective relative to other health care interventions.
OTHER TYPES OF OUTCOMES RESEARCH
In addition to patient-based studies, other types of outcomes research have made important contributions, for example, appropriateness studies. In an appropriateness study, records from a large population are reviewed to test the hypothesis that treatment (medical or surgical) was based on appropriate indications. These studies often yield controversial results for two primary reasons. One reason is that it is difficult to achieve consensus on what is an appropriate versus equivocal versus inappropriate indication. An appropriateness study might find that a large percentage of procedures were performed for “equivocal” indications, when there is legitimate contradicting evidence showing that those indications could have been classified as “appropriate.” The second reason is that those studies rely heavily on medical record documentation, which is often insufficient. So although the patient’s chart might indicate that appropriate indications were not met, in fact that particular patient might have actually met all the criteria. Despite the controversy, these studies are important because they generate discussion and can act as an impetus for further research, collaborative guideline development, etc.
Another type of outcomes research is population studies. Many important findings have been identified from studies of rates of medical care received by different populations. Of particular interest are studies that compare the rates of procedures with controversial indications. Some early examples were studies of population rates of tonsillectomy, hysterectomy, lumbar disk surgery, and carotid endarterectomy: researchers found that populations that seemed remarkably similar in demographics, economics, and health status had markedly different rates of elective procedures performed. The only apparent differences seemed to be where they lived, and the number of specialty physicians per capita in their region. Other studies have shown remarkable differences in rates of hospital admission for the same admitting diagnoses in different cities, even after controlling for many health status and demographic factors that might influence admission rate. While these studies sometimes generate more questions than answers, they give important insight into the delivery of health care.
One important example of a database/population-based study was one that compared overall survival between patients with obstructive sleep apnea who were treated with continuous positive airway pressure (CPAP) versus patients who underwent uvulopalatopharyngoplasty (9). This study was performed in the Veterans Affairs (VA) system, which has had a national computerized medical record for many years and has a consistent population of patients who receive all their care at the VA. The authors reviewed the database and identified more than 15,000 patients eligible for study. They found that, controlling for a variety of variables including severity and comorbidity, survival was better in the surgery group than the CPAP group. This seemed surprising since in head-to-head comparisons CPAP is more effective than uvulopalatopharyngoplasty. The reason for the population finding seemed to be that many people prescribed CPAP were actually not using it, whereas everyone who had surgery did receive the benefit of surgery. A subanalysis seemed to indicate that regular CPAP users actually had better survival than surgery patients, but again the total group of CPAP patients (including many nonusers) did worse. This sort of population-based outcomes study is very important in determining treatments that actually work in populations. This is also an example of the difference between effectiveness research (in a real-world population) and efficacy research (in a controlled trial).
Meta-analysis is another type of outcomes research. In meta-analysis, the results from several individual studies are combined and a new statistical analysis is performed using the raw data from individual studies. Meta-analysis is more than just a detailed review of the literature, and to perform the analysis, each study must have used very similar methods and reported data in a similar format. Despite some methodologic complexity, meta-analysis is a very helpful tool for achieving adequate statistical power to answer questions that individual small studies cannot, and for resolving conflicting results from individual studies.
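How pooling increases power can be seen in a small calculation. The sketch below is a simplification of what the text describes: instead of reanalyzing raw data, it combines each study's published effect estimate and standard error with fixed-effect, inverse-variance weighting, a common approach when only summary results are available. It assumes the studies measure the same effect on the same scale, and the numbers are hypothetical.

```python
from math import sqrt

def pooled_fixed_effect(studies):
    """studies: list of (effect_estimate, standard_error) pairs.

    Returns the inverse-variance weighted estimate and its standard error.
    """
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Two hypothetical small studies of the same treatment effect.
studies = [(0.5, 0.2), (0.3, 0.1)]
estimate, standard_error = pooled_fixed_effect(studies)
```

The pooled standard error (about 0.089) is smaller than that of either study alone, which is why a meta-analysis can detect effects that the individual studies could not.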
EVIDENCE-BASED MEDICINE
Evidence-based medicine (EBM) is an important topic in contemporary medicine. While in fact there is some type of evidence behind many aspects of medical treatment, the explicit techniques of “EBM” have only been recently described and popularized. Dr. David Sackett has been a key developer and leader in EBM and is the lead author on an excellent textbook in the field (10).
More medical schools are teaching their students the techniques of EBM, and articles, books, lectures, and courses on EBM are popular in many fields. Two contemporary disease-related textbooks cover surgical fields: Evidence-based otitis media, by Rosenfeld and Bluestone, and Evidence-based otolaryngology, by Shin, Hartnick and Randolph.
EBM has been defined as the conscientious, explicit, and judicious use of the current best evidence in making decisions about the care of individual patients (11). Furthermore, the practice of EBM has been described as integrating individual clinical expertise with the best available external clinical evidence from systematic research (11,12). Several points—and misconceptions—are worth emphasizing.
First, EBM does not mean only the use of randomized clinical trials. The definition states that it requires the use of the “best available” evidence, not “only the best” evidence. If findings from a randomized clinical trial are available, then that is strong, high-quality evidence and should be used. If, however, randomized trials have not been performed and therefore that type of evidence is not available, you can still practice EBM by using the best available evidence.
Next, EBM does not eliminate the physician’s own experience or knowledge base. The physician should integrate their own clinical experience with the patient’s desire and with the current best clinical evidence when deciding on the best treatment for an individual patient.
The practice of EBM has been likened to a three-legged stool, which would be unstable if one leg were missing (13). The three “legs” of the stool are best evidence, clinical experience, and patient wishes. Along those lines, it would not be practical or reasonable for a physician to rely only on results from high-quality evidence to make clinical decisions, since there are many questions and issues that have not been addressed with experimental studies. On the other hand, if a physician relies only on personal experience, their practice could become out of date or inappropriate. So the thoughtful and appropriate combination of experience and evidence is the goal.
There are five steps in practicing EBM (10,12), and we cover each step individually. Those steps are as follows:
  • Ask an answerable clinical question.
  • Search for the best available external evidence.
  • Critically appraise the quality of the evidence.
  • Understand the findings from the best evidence and create a summary/recommendation.
  • Integrate the best evidence with clinical expertise and unique patient factors (desires, values, unique circumstances).
Ask an Answerable Clinical Question
This step might seem simple but in fact can be challenging. Many questions that clinicians might pose are quite general (for example, “is endoscopic sinus surgery effective?”), and there will be limited or no evidence that directly addresses such a question. When developing a specific and answerable question, there are several aspects of the question that should be considered. A helpful tool is to remember “PICO”: Patient, Intervention, Comparison, Outcome. Good clinical questions will usually define each of those four components. An example of a good question is the following: “In a 5-year-old child with acute Group A streptococcal pharyngitis (patient), does treatment with antibiotics and anti-inflammatories (intervention) reduce symptoms and fever duration (outcomes), compared to anti-inflammatories alone (comparison)?” That is a potentially answerable clinical question.
While it is possible to practice EBM when starting with very general questions, experts in the field have found that focusing the question makes all the subsequent steps easier, particularly the search for evidence.
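The PICO structure lends itself to a simple record with four fields. The sketch below is only an illustration; the class and method names are hypothetical. It contrasts the general question and the focused question from the text by listing which components each leaves undefined.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class PicoQuestion:
    patient: str
    intervention: str
    comparison: str
    outcome: str

    def missing_components(self):
        """Names of the PICO components that were left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]

# The general question defines only an intervention.
general = PicoQuestion(patient="", intervention="endoscopic sinus surgery",
                       comparison="", outcome="")

# The focused question defines all four components.
focused = PicoQuestion(
    patient="5-year-old child with acute Group A streptococcal pharyngitis",
    intervention="antibiotics and anti-inflammatories",
    comparison="anti-inflammatories alone",
    outcome="symptoms and fever duration",
)
```

Calling `general.missing_components()` returns the patient, comparison, and outcome fields, while the focused question returns an empty list.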
Search for the Best Available Evidence
The evidence used in EBM is from clinical studies on humans, not from laboratory or animal studies. The evidence search should be a rigorous process for the most contemporary evidence using available technology, not merely identifying a textbook chapter or other handy reference. There are some available databases, for example, the Cochrane Library, that identify and grade pertinent literature on many clinical topics. These are usually compiled by experts, and updated regularly. In addition, there are journals devoted to EBM. Furthermore, evidence-based reviews for many clinical questions have been completed and disseminated. These might be published in the peer-reviewed literature, or as monographs, or placed on Web sites, among other locations. So on occasion, it is possible to identify the best evidence without actually performing the search yourself.
If, however, you need to perform your own search, then using MEDLINE over the internet is another option. This topic is explored in more detail elsewhere (14). But briefly, MEDLINE is a database of the published biomedical literature from around the world. It is maintained by the National Library of Medicine and is available on the internet free of charge. Journal articles are referenced into MEDLINE by trained librarians using index terms called Medical Subject Heading terms, or “MeSH terms.” Once the user becomes familiarized with the techniques of searching using MEDLINE, comprehensive lists of articles that cover a particular topic can be identified. The search should be organized and planned to identify articles that address the answerable question.
MEDLINE is quite comprehensive, so in fact, many times a search will yield a very large number of articles—many of which might not be pertinent to the search of interest. In addition, not all medical journals are indexed into MEDLINE, and very old articles are not included (although MEDLINE is systematically adding references from before 1966). So there is some published literature that will not be found using MEDLINE. However, MEDLINE is overall an extraordinarily powerful tool for searching the biomedical literature.
Once the search is completed, the list of articles identified should be perused, and articles pertinent to the specific question pulled for further review.
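A planned search can be written down as a structured query before it is run. The sketch below only builds a query string; it does not contact any database. It assumes the PubMed interface to MEDLINE and its `[MeSH Terms]` field tag, and the headings shown are illustrative choices for the pharyngitis question that should be checked against the current MeSH vocabulary. The function names are hypothetical.

```python
def mesh_clause(heading):
    """Format one heading as a PubMed-style MeSH field search."""
    return f'"{heading}"[MeSH Terms]'

def build_query(concept_groups):
    """Join alternative headings within a concept by OR, and concepts by AND."""
    clauses = []
    for group in concept_groups:
        joined = " OR ".join(mesh_clause(heading) for heading in group)
        clauses.append(f"({joined})" if len(group) > 1 else joined)
    return " AND ".join(clauses)

# One group per concept in the clinical question (illustrative headings).
query = build_query([
    ["Pharyngitis"],
    ["Anti-Bacterial Agents"],
    ["Child, Preschool", "Child"],
])
```

Requiring every concept with AND narrows the result list, while allowing alternative headings with OR keeps relevant articles from being missed.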
Critically Appraise the Quality of the Evidence
In EBM, there is a fundamental principle at work: not all evidence is equal. Studies are evaluated based on their methodology, and studies using superior methodology are given more “weight” and evidence from those studies is considered more strongly. The basic rules of study quality are as follows: randomized studies are better than nonrandomized, prospective studies are better than retrospective, and controlled studies are better than noncontrolled. By definition, in EBM, only clinical studies using human subjects are considered. While basic laboratory research is an essential part of discovery in medicine, until studies have been performed in humans, their results are not included in the practice of EBM.
Individual studies are rated and given a level, based on the quality of the methodology. The standard hierarchy of levels proposed by Sackett is shown in Table 8.2. There are different levels depending on the type of question or study. For example, in a study on active therapy, the best methodology would be a randomized controlled trial (RCT); therefore, that represents level 1 evidence. However, in a study of the prognosis of some disease, randomization is not an option, and the best possible methodology would be a prospective cohort study; therefore, that is level 1 evidence for that type of question.

TABLE 8.2 EVIDENCE LEVELS FOR THERAPY/ETIOLOGY STUDIES AND PROGNOSIS STUDIES
Evidence Level | Therapy/Etiology | Prognosis
1a | Systematic review (SR) of RCTs | SR of inception cohort studies
1b | Single RCT | Individual cohort study
1c | “All or none” study | “All or none” case series
2a | SR of cohort studies | SR of either retrospective cohort studies or untreated control groups in randomized trials
2b | Individual cohort study | Retrospective cohort studies or untreated control groups in randomized trials
2c | Outcomes research | Outcomes research
3a | SR of case-control studies |
3b | Individual case-control study |
4 | Case series | Case series
5 | Expert opinion | Expert opinion
Modified from Sackett DL, Straus SE, et al. Evidence-based medicine: how to practice and teach EBM, 2nd ed. London: Wolfe Pub Ltd., 2000, used with permission.
Study ratings are based on methodology, but if there are other problems with the study, the reviewer has options. If the study has major flaws, such as inappropriate entry criteria or obvious bias, the reviewer may exclude the study and not give it any grade. If problems are more minor, such as no power analysis or incomplete data reporting, then the study can be given a “minus” grade, such as level 2-. It is not appropriate to move the evidence down a level, such as from 2 to 3, because a different level of evidence means a different methodology was used.
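The rating rules above can be sketched in a few lines of Python. This is only an illustration for therapy/etiology questions: the design labels, the `Study` class, and the choice to append the minus sign to the sublevel (for example "1b-") are assumptions made for the example, not part of Sackett's hierarchy.

```python
from dataclasses import dataclass

# Therapy/etiology column of Table 8.2, keyed by hypothetical design labels.
THERAPY_LEVELS = {
    "sr_of_rcts": "1a",
    "rct": "1b",
    "all_or_none": "1c",
    "sr_of_cohort": "2a",
    "cohort": "2b",
    "outcomes_research": "2c",
    "sr_of_case_control": "3a",
    "case_control": "3b",
    "case_series": "4",
    "expert_opinion": "5",
}


@dataclass
class Study:
    label: str
    design: str
    major_flaw: bool = False  # e.g., inappropriate entry criteria, obvious bias
    minor_flaw: bool = False  # e.g., no power analysis, incomplete reporting


def rate(study):
    """Return the evidence level, or None if the study is excluded."""
    if study.major_flaw:
        return None  # excluded, no grade given
    level = THERAPY_LEVELS[study.design]
    # A minor flaw earns a "minus" but never moves the study to another level.
    return level + "-" if study.minor_flaw else level


def organize(studies):
    """List (level, label) pairs from highest to lowest level of evidence."""
    rated = [(rate(s), s.label) for s in studies]
    return sorted(pair for pair in rated if pair[0] is not None)
```

The sorted list is a convenient starting point for the summary table described in the next paragraph.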
Studies should be organized, from highest level to lowest. Often, creating a table is the best way to organize the studies. If there are multiple studies of the same level, then studies showing similar findings can be organized together.
With the effort involved in creating a question, searching, finding, and grading the literature, it may be wise to maintain a record of search results, particularly for frequently asked questions. These organized search results have been called critically appraised topics, or “CATs” (10). When practicing EBM, the clinician can refer back to CATs whenever needed, and when new evidence is reported, it can be added to an existing CAT.
Understand the Results and Create a Recommendation
Once individual studies have been rated, then the overall results and findings are reviewed for evidence quality and the consistency of results. While individual studies are given levels, the overall evidence is given a grade. Grades of evidence are summarized in Table 8.3. This final compilation can be called a grade of recommendation or grade of overall evidence.
There is some judgment required in assigning a grade to the overall evidence. The grade is based on the best-quality evidence and the consistency of evidence and results, not just the level of evidence with the most papers. For example, there will often be multiple case series (level 4 evidence) reported, but if there are a large number of RCTs with consistent results, that would be considered grade A evidence, even if there are numerically more level 4 studies than level 1. On the other hand, just because there are one or two RCTs (level 1 studies), that does not automatically mean there is grade A evidence. If the few studies show conflicting results or have methodologic problems, and the rest of the studies are case series, then the overall evidence might actually be grade C. Again, this requires some interpretation and judgment by the reviewer.
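As a rough illustration of this reasoning, the sketch below assigns a grade from the best tier of evidence that is both present and consistent. It is deliberately simplistic: the function name, the input format, and the rule of falling through to the next tier when results conflict are assumptions for the example, and no such rule replaces the reviewer's judgment.

```python
def overall_grade(findings_by_level):
    """Suggest a grade of overall evidence.

    findings_by_level maps an integer evidence level (1 to 5) to a list of
    findings, each a short string such as "benefit" or "no benefit".
    Returns "A", "B", "C", "D", or None if nothing can be graded.
    """
    # (grade, levels in that tier, whether consistency is required)
    tiers = [
        ("A", (1,), True),
        ("B", (2, 3), True),
        ("C", (4,), False),
        ("D", (5,), False),
    ]
    for grade, levels, needs_consistency in tiers:
        findings = [f for lvl in levels for f in findings_by_level.get(lvl, [])]
        if not findings:
            continue  # no studies in this tier
        if needs_consistency and len(set(findings)) > 1:
            continue  # conflicting results: this tier cannot set the grade
        return grade
    return None


# Many consistent RCTs outweigh a larger number of case series.
many_rcts = {1: ["benefit"] * 4, 4: ["benefit"] * 10}
# Two conflicting RCTs, with the rest of the literature being case series.
conflicting = {1: ["benefit", "no benefit"], 4: ["benefit"] * 5}
print(overall_grade(many_rcts), overall_grade(conflicting))
```

The sketch ignores methodologic problems and the number of studies at each level, both of which the reviewer would also weigh.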
Integrate Best Evidence with Clinical Experience and Patient’s Circumstances
This is the key last step in practicing EBM. After the best external evidence has been identified, graded and summarized, and an overall grade of recommendation created, that recommendation is then integrated with the patient’s unique circumstances and the clinician’s experience and judgment. The physician practicing EBM should keep in mind the “three legs” of EBM and the importance of each.
TABLE 8.3 GRADES OF OVERALL EVIDENCE
Evidence Grade | Levels of Evidence
Grade A | Consistent level 1 studies
Grade B | Consistent level 2 or 3 studies
Grade C | Level 4 studies
Grade D | Level 5 studies
Modified from Sackett DL, Straus SE, et al. Evidence-based medicine: how to practice and teach EBM, 2nd ed. London: Wolfe Pub Ltd., 2000, used with permission.

REFERENCES
1. Piccirillo JF, Stewart MG, Gliklich RE, et al. Outcomes research primer. Otolaryngol Head Neck Surg 1997;117:380-387.
2. Ellwood PM. Shattuck lecture—outcomes management. A technology of patient experiences. N Engl J Med 1988;318:1549-1556.
3. Stewart MG, Neely JG, Hartman JM, et al. Tutorials in clinical research part V: outcomes research. Laryngoscope 2002;112:248-254.
4. Piccirillo JF. Outcomes research in Otolaryngology. Otolaryngol Head Neck Surg 1994;111:764-769.
5. Stewart MG, Neely JG, Paniello RC, et al. A practical guide to understanding outcomes research. Otolaryngol Head Neck Surg 2007;137:700-706.
6. Piccirillo JF. Importance of comorbidity in head and neck cancer. Laryngoscope 2000;110:593-602.
7. Piccirillo JF. Inclusion of comorbidity in a staging system for head and neck cancer. Oncology 1995;9(9):831-836.
8. Stewart MG. Patient-based outcomes research. In: Rosenfeld RM, Bluestone C, eds. Evidence-based otitis media. Hamilton, ON: BC Decker, Inc., 1999:51-60.
9. Weaver EM, Maynard C, Yueh B. Survival of veterans with sleep apnea: continuous positive airway pressure versus surgery. Otolaryngol Head Neck Surg 2004;130:659-665.
10. Sackett DL, Straus SE, et al. Evidence-based medicine: how to practice and teach EBM, 2nd ed. London: Wolfe Pub Ltd., 2000:1-261.
11. Sackett DL, Rosenberg WMC, Gray JAM, et al. Evidence-based medicine: what it is and what it isn’t. BMJ 1996;312:71-72.
12. Shin JJ, Hartnick CJ. Introduction to evidence-based medicine, levels of evidence and systematic reviews. In: Shin JJ, Hartnick CJ, Randolph GW, eds. Evidence-based otolaryngology. New York: Springer, 2008:3-12.
13. Rosenfeld RM. Evidence, outcomes, and common sense. Otolaryngol Head Neck Surg 2001;124:123-124.
14. Stewart MG, Kuppersmith RB, Moore AS. Searching the literature on the internet. Otolaryngol Clin North Am 2002;35:1163-1174.
15. Stewart MG. What are evidence-based guidelines? In: Lee KJ, ed. Health Care Reform Through Practical Guidelines in ENT. San Diego, CA: Plural Publishers, 2010:9-23.
16. Rosenfeld RM, Shiffman RN. Clinical practice guidelines: a manual for developing evidence-based guidelines to facilitate performance measurement and quality improvement. Otolaryngol Head Neck Surg 2006;135(4 Suppl):S1-S28.
17. Bellorini J, Dorée C, Chamberlain I, et al. The Cochrane Ear, Nose and Throat Disorders Group. Otolaryngol Head Neck Surg 2007;137(4 Suppl):S55-S60.
Outcomes Research and Evidence-Based Medicine
Michael G. Stewart
INTRODUCTION AND HISTORY
Outcomes research can be defined as the scientific study of the outcomes of disease therapies used for a particular disease, condition, or illness (1). While all clinical research measures some type of outcome, such as mortality, morbidity, or some other objective measure, in “outcomes research,” the patient’s perception of their outcome is assessed.
Historically, the movement toward outcomes-based research was started by Dr. Paul Ellwood, who in the 1980s suggested that in the future physicians would assess outcomes by measuring what the patient experienced (2). Subsequently, tools to assess these outcomes were developed and applied across many diseases. However, outcomes research also includes other types of studies in addition to patient-based outcomes studies. Today, outcomes researchers study all aspects of the health care system—from the status of the patient or population at entry, to the organization, delivery, regulation and financing of the health care system, to the status of the patient or population after treatment.
To be inclusive of other aspects of health services research, some have divided outcomes research into record-based outcomes research and patient-based outcomes research. Examples of these different types of studies are shown in Table 8.1.
TYPES OF STUDIES
Prospective, retrospective, or cross-sectional study designs can be used in outcomes research. However, many outcomes studies use an observational prospective design, where outcomes are assessed after diverse treatments—rather than an experimental prospective design, where treatments are carefully controlled or randomized. The differences between these study designs are an important point for discussion.
In experimental or controlled trials, particularly randomized trials, the ideal design uses two groups of patients that are nearly identical in every aspect except the treatment received. Therefore, any difference in outcome between groups must be due only to the different treatments, since otherwise the groups were the same. Of course, patient groups that are actually identical are rarely achieved, but that is nevertheless the basis behind the rigorous design and methodology of controlled trials. In addition, to account for the inherent differences between treatment groups, the randomization process should theoretically distribute those differences (say, in demographics or disease severity) equally between the two groups. Carefully controlled experimental studies can be said to measure the efficacy of treatment, under ideal circumstances. These studies usually yield very reliable results concerning the effects of treatment in the group that was studied.

TABLE 8.1 TYPES OF OUTCOMES RESEARCH
Record-based outcomes research
  Meta-analysis
  Appropriateness research
  Reviews of administrative databases
  Population-based studies
Patient-based outcomes research
  Development of quality of life and health status instruments
  Development of disease severity stratification systems
  Longitudinal outcomes studies
    Prospective
    Retrospective
    Observational
Modified from Rosenfeld RM, Bluestone C, eds. Evidence-based otitis media. Hamilton, ON: BC Decker, Inc., 1999:51-60, with permission.
However, there are questions about the generalizability of results from rigorously controlled trials to larger populations of patients with a disease because the larger group of patients will not be so carefully controlled and homogeneous. In addition, patient compliance and other factors may lead to different results than in tightly controlled trials. Furthermore, clinical trials are very expensive to perform, especially when considered from the standpoint of cost per patient studied.
Observational study designs are commonly used in outcomes research. All patients with a disease are included, and they are studied in the actual setting in which they receive their health care. This naturally introduces many other factors that may affect outcome after treatment, including potential selection bias for different treatments. However, many would argue that results from large-scale outcome studies are more applicable to the general population because of their setting and scope. Studies that assess the actual (“real world”) results of treatment are said to measure the effectiveness of treatment, as opposed to the efficacy measured in controlled trials (3). Furthermore, in addition to their “real-world” setting, the expanded outcomes measured (quality of life, etc.) might be more relevant and important to patients than other clinical or biologic outcomes.
STEPS IN PERFORMING OUTCOMES RESEARCH
There are several basic steps involved in performing patient-based outcomes research (3,4,5). The fundamental steps are as follows: (a) identify and define the disease of interest, (b) create a staging system (clinical-severity index) for disease severity, (c) identify comorbid conditions, and (d) establish the outcomes to be measured (4). A given study may perform one, several, or all of these steps. Therefore, development and validation of an outcomes instrument is one type of outcomes research, and identification of comorbid conditions is another type. Each step is discussed in more detail below.
Define the Disease of Interest
This may be a straightforward step if the disease has clear and widely agreed upon diagnostic criteria. However, many diseases are difficult to rigorously define, such as gastroesophageal reflux or chronic rhinosinusitis, and research may be needed to develop clear reproducible diagnostic criteria.
Create a Staging System or Clinical-Severity Index
Grouping or stratification by disease severity is important in all types of clinical research. However, it is particularly important in outcomes research. This is because large numbers of patients may be studied without strict entry or exclusion criteria. Patients with more severe disease may receive more (or less) aggressive treatment, so it is therefore important to statistically adjust for disease severity when evaluating outcomes.
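A small numerical sketch shows why this adjustment matters. The counts below are invented for illustration: treatment A is given mostly to patients with severe disease, so a crude comparison makes it look worse than treatment B even though it performs better within each severity stratum.

```python
# Hypothetical counts: (successes, patients) per severity stratum and treatment.
data = {
    "mild": {"A": (9, 10), "B": (80, 100)},
    "severe": {"A": (50, 100), "B": (3, 10)},
}


def crude_rate(data, treatment):
    """Success rate for a treatment, ignoring disease severity."""
    successes = sum(stratum[treatment][0] for stratum in data.values())
    patients = sum(stratum[treatment][1] for stratum in data.values())
    return successes / patients


def stratum_rates(data, treatment):
    """Success rate for a treatment within each severity stratum."""
    return {
        name: stratum[treatment][0] / stratum[treatment][1]
        for name, stratum in data.items()
    }


for treatment in ("A", "B"):
    print(treatment, round(crude_rate(data, treatment), 2),
          stratum_rates(data, treatment))
```

Comparing the stratum-specific rates, rather than the crude rates, is the simplest form of adjustment for severity; regression methods extend the same idea to several variables at once.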
Staging systems already exist for many diseases, for example, the TNM staging system for cancer. It is important to distinguish between staging systems that are descriptive and systems that are prognostic. Descriptive staging systems simply group together patients who have similar characteristics. Prognostic systems are designed to predict an outcome; for example, the TNM staging system is designed to predict 5-year survival. In general, staging systems used for outcomes research should be prognostic (3). However, even if a prognostic staging system already exists, it may not contain all the important variables that predict outcome (5).
To develop a prognostic staging system, the researcher should first define the outcome of interest to be predicted by the staging system, which is defined as the dependent variable. Next, identify a group of variables that might predict outcome; those are the independent variables. Potential independent variables can be identified from prior literature, a prospective study, or expert opinion. Next, perform a prospective study, identifying a heterogeneous group of subjects that is likely to contain patients with mild, moderate, and severe disease. In that group, measure the presence of all potential predictor variables and the outcomes after treatment.
Then, using data on both the outcome of interest and the presence of predictor variables, perform a multivariate analysis to identify which predictor variables actually impact outcome. Multivariate analysis is important in clinical studies because several different variables usually exert effects on each other, so it is preferable to study the effects of a large group of variables at the same time while controlling for the effects of the other potentially important variables.

Multivariate regression (linear or logistic) is one option for analysis; however, there are other options such as conjunctive consolidation (3,6,7). If regression is used, predictive factors are identified, and each can create a category with different outcomes. However, if multiple predictor variables are identified, the process of developing a single staging system can be difficult and at best requires multiple iterations of trial and analysis. The technique of conjunctive consolidation allows new clinical factors to be added to a staging system without necessarily increasing the number of groups or categories. Also, for development of a staging system, the data collection can be performed retrospectively, particularly if the outcome requires a significant time interval.
There are several potential “models” of staging systems from which to choose. Under any circumstances, developing a staging system is an iterative process in which patients are grouped by predictor variables, and the outcomes, by group, are assessed. If the groups are not sufficiently distinct, then another arrangement of predictor variables is used and outcomes by group are again compared. Ideally, the staging system should be organized so that patients are easily grouped into distinct strata, with clearly different outcomes, and such that all patients should be classifiable.
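One iteration of this grouping-and-checking loop might look like the sketch below. The patient fields, the candidate staging rule, and the 10-percentage-point threshold for calling two strata "distinct" are all hypothetical choices made for the example.

```python
from collections import defaultdict


def outcome_by_stage(patients, stage_of):
    """Survival rate per stage for one candidate staging rule.

    patients: list of dicts with predictor fields and a boolean "survived".
    stage_of: function mapping a patient dict to a sortable stage label.
    """
    counts = defaultdict(lambda: [0, 0])  # stage -> [survivors, total]
    for p in patients:
        stage = stage_of(p)
        counts[stage][0] += p["survived"]
        counts[stage][1] += 1
    return {stage: s / n for stage, (s, n) in sorted(counts.items())}


def strata_are_distinct(rates, min_gap=0.10):
    """True if survival drops by at least min_gap from each stage to the next."""
    values = list(rates.values())
    return all(a - b >= min_gap for a, b in zip(values, values[1:]))


# Invented cohort: tumor size is the only predictor considered here.
patients = (
    [{"size_cm": 1, "survived": True}] * 9
    + [{"size_cm": 1, "survived": False}] * 1
    + [{"size_cm": 3, "survived": True}] * 5
    + [{"size_cm": 3, "survived": False}] * 5
)
rates = outcome_by_stage(patients, lambda p: 1 if p["size_cm"] < 2 else 2)
print(rates, strata_are_distinct(rates))
```

If the check fails, the researcher would try another arrangement of predictor variables (a different `stage_of` function) and compare the outcomes by group again.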
Identify Comorbid Conditions
The concept of a “comorbid” condition that seriously affected treatment outcome was first described by Dr. Alvan Feinstein. A comorbid condition is defined as a condition, distinct from the condition of interest, that affects the outcome being measured. For example, when measuring mortality from laryngeal cancer, if the patient has another serious condition causing potential mortality (e.g., unstable angina), then that condition is defined as a comorbid condition. Since the initial description, researchers in multiple specialties have identified the impact of comorbid disease on several different outcomes (6,7). Therefore, in any outcomes study, it is important to identify all potentially important comorbid conditions and to measure their presence and severity as part of the data collection process. Of course, that only applies if the comorbid condition actually affects the outcome under study. Using the same example of unstable angina, if one were performing an outcomes study of hearing satisfaction 1 month after receiving different types of hearing aids, the presence of unstable angina would not necessarily be an important comorbid condition to consider.
Define the Outcomes to be Measured
The expanded, patient-based outcomes usually measured in outcomes research are quality of life, health status, and functional status. There are multiple potential definitions for each of those terms; however, “quality of life” has three key aspects: (a) it is more than the absence of disease, (b) it is subjective (assessed from the patient’s perspective), and (c) it is multidimensional. In addition, the overall quality of life depends on multiple aspects of life not directly related to disease, so most researchers studying treatment outcomes are actually assessing the “health-related quality of life.” Most outcomes instruments designed for use in patient care are designed to assess health-related quality of life. The term “health status” is self-explanatory, but again, it must be measured from the patient’s perspective. Functional status refers to the patient’s ability to perform daily activities. In most circumstances, researchers are only interested in the effect of a particular disease, so disease-specific functional status is typically assessed.
To measure functional status or quality of life, the patient must answer several questions that have been validated for the purpose of measurement. Although these data can be gathered using interviews or other interactive techniques, under most circumstances patients complete a written questionnaire. In outcomes research, the questions are called “items” and the questionnaires are called “instruments.” Instruments must be validated, and the validation process uses the scientific principles of psychometrics. A full discussion of the process of instrument validation would require more than one chapter, so only the basic concepts are reviewed here.
A health status or quality of life instrument should be reliable, valid, and sensitive (8). Two types of reliability are usually assessed—test-retest reliability and internal consistency reliability. Test-retest reliability means that the results will be similar if the status of the patient has not changed, and internal consistency reliability means that responses on similar items will be correlated.
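Both kinds of reliability can be quantified. The sketch below uses two common statistics that the text does not name and that are assumed here for illustration: Pearson's correlation between two administrations of the instrument for test-retest reliability, and Cronbach's alpha for internal consistency. The scores are invented.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    item_scores: one list per item, each holding one score per patient.
    """
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Invented total scores for five stable patients, two weeks apart.
first_visit = [42, 55, 61, 70, 38]
second_visit = [45, 53, 64, 68, 40]
print("test-retest r:", round(pearson_r(first_visit, second_visit), 2))

# Invented responses of the same five patients to three similar items.
items = [[2, 3, 4, 5, 1], [2, 4, 4, 5, 2], [3, 3, 5, 4, 1]]
print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
```

Values close to 1 on either statistic suggest good reliability; the threshold considered acceptable depends on the intended use of the instrument.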
Validity means that the instrument is measuring what it is supposed to measure. Validity is confirmed by a combination of evidence: content validity, criterion validity (if scores on the instrument correlate with objective measurable external criteria), and construct validity (if scores on the instrument correlate with scores on other instruments measuring similar concepts).
Sensitivity (or responsiveness) means that the instrument is responsive to change in status. In other words, if the patient’s clinical status changes, then their score should also change. Sensitivity is assessed using statistical techniques measuring the degree of change against known standards, such as the standardized response mean and the effect size.
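The two statistics mentioned above are straightforward to compute from paired scores. In their usual definitions, the effect size divides the mean change by the standard deviation of the baseline scores, and the standardized response mean divides it by the standard deviation of the change scores. The function names and the scores below are invented for illustration.

```python
from statistics import mean, stdev


def effect_size(before, after):
    """Mean change divided by the standard deviation of baseline scores."""
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(before)


def standardized_response_mean(before, after):
    """Mean change divided by the standard deviation of the change scores."""
    change = [a - b for b, a in zip(before, after)]
    return mean(change) / stdev(change)


# Invented instrument scores (0 to 100) before and after treatment.
before = [40, 50, 60]
after = [50, 62, 68]
print(effect_size(before, after), standardized_response_mean(before, after))
```

Larger values indicate that the instrument registers more change relative to the variability in the scores.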
Another aspect of assessing sensitivity or change in status using an instrument has been called the “minimal significant difference” in score (3). For example, average scores on a health status instrument may change from 40 to 50 (on a scale of 0 to 100), and the difference might have a p-value less than 0.05. The question arises—is that 10-point difference a clinically.


May 24, 2016 | Posted by in OTOLARYNGOLOGY | Comments Off on Outcomes Research and Evidence-Based Medicine
