Understanding Data and Interpreting the Literature





Richard M. Rosenfeld



“It is a capital mistake to theorize before one has data,” observed Sherlock Holmes. Physicians are rarely guilty of this offense, because they are inundated with data from clinical observations, journal articles, professional meetings, and governmental agencies. Unfortunately, the theories that result often have little relevance to the data that produced them. This is not unexpected, because physicians rarely devote the same efforts to mastering data that they devote to mastering clinical skills. As a partial remedy, this chapter offers a principle-centered approach to data analysis with emphasis on interpreting the medical literature (1,2).

Nearly all data relevant to clinical care comes from articles in peer-reviewed medical journals, whose content is generally assessed using principles that mirror those in this chapter (3). Unfortunately, the process of editorial peer review is largely untested and its effects are uncertain. Manuscript assessment, like even the most sophisticated diagnostic tests, has a certain sensitivity and specificity; worthy articles may be unappreciated (and unpublished) or worthless ones may pass undetected to the printing press. Peer reviewers may be biased, unqualified, or possess widely discrepant opinions about a study. The bottom line? Caveat lector: beware of what you read even in excellent medical journals.

Of more concern to clinicians than the inadequacies of peer review, however, is that the medical literature generally serves science rather than medical practice. Peer-reviewed publications facilitate communication from scientist to scientist, not necessarily from scientist to clinician. Most published studies are nondefinitive tests of hypotheses and innovations, only a very small percentage of which may warrant routine clinical application. Although the science may be sound, the idea has often not progressed beyond the laboratory or preliminary field studies. Definitive studies constituting true scientist-to-clinician communication are rare in medical journals, and must be identified by critical appraisal.

Clinicians can use the medical literature to support clinical decisions in two complementary ways: regular surveillance (or browsing) and problem-oriented searches. While the latter mode is more effective for learning, both are necessary for continuing clinical competence. Both methods require an appreciation of the purposes of the medical literature and an understanding of the strengths and weaknesses of various study designs for providing valid and clinically applicable information.


HOW TO IDENTIFY ARTICLES WORTH READING

Articles worthy of in-depth analysis have enticing titles and abstracts, which espouse innovative, controversial, or clinically relevant ideas. Reading the abstract alone, however, is a poor substitute for perusing the entire article. Whereas abstracts in otolaryngology journals generally do convey information about study design, sample size, and the source of the data, they infrequently describe adverse events, study limitations, and dropouts or losses (4). Abstracts may also report selective, incomplete, or inaccurate data leading to biased conclusions. Abstracts are a starting point for analysis, not the finish line.

Worthy articles appear as original research in the main section of a peer-reviewed journal. Methods and results sections, which represent the heart and soul of the article, should be appropriately detailed and lengthy. Statistical reporting includes confidence limits and measures of clinical benefit (5). Enough details should be provided for you to reproduce the study on your own, if desired, with a reasonable chance of obtaining the same results. A quick review of the paper in general should disclose many of the signs of grandeur in Table 7.1.









TABLE 7.1 SIGNS OF GRANDEUR AND DECADENCE IN JOURNAL ARTICLES

| Section | Signs of Grandeur | Signs of Decadence |
| --- | --- | --- |
| Abstract | Structured summary of goals, methods, results, and significance | Unstructured qualitative overview of study; contains more wish than reality |
| Introduction | Clear, concise, and logical; ends with study rationale or purpose | Rambling, verbose literature review; no critical argument or hypothesis |
| Methods | Specific enough for the reader to reproduce the study and understand how quantitative results were generated | Vague or incomplete description of subjects, sampling, outcome criteria; no mention of statistical analysis |
| Results | Logical blend of narrative and numbers, including 95% CIs, with supporting tables and figures | Difficult to read, with overuse or underuse of statistical tests; emphasis on P-values, not clinical importance |
| Discussion | Puts main results in context; reviews supporting and conflicting literature; discusses strengths and weaknesses | Full of fantasy and speculation; rambling and biased literature review; does not acknowledge weaknesses |
| References | Demonstrates clearly that work of others has been systematically considered; emphasizes original research from peer-reviewed journals | Key articles are conspicuously absent; excessively brief; emphasizes review articles, book chapters, and lower-quality journals |


Articles unworthy of in-depth analysis may have enticing titles and abstracts, but have no ability to support the lofty claims and conclusions therein. The article may appear in a non-peer-reviewed (throwaway) journal or in an industry-funded supplement to the main section (which generally implies lower quality). Signs of decadence (Table 7.1) are readily apparent when perusing the article’s main sections. The methods and results sections are vague and sparse, overshadowed by a verbose discussion section with unsupported opinions and creative misinterpretations. Don’t waste any time analyzing an unworthy article unless the premise is so novel and important that it overshadows the obvious weaknesses.


FIVE BASIC QUESTIONS FOR INTERPRETING DATA

Just as expert clinicians consider a finding abnormal until examination proves otherwise, a connoisseur of medical evidence considers a data set or a journal article to be laden with flaws, distortions, and omissions until proven to the contrary. The five basic questions in Table 7.2 are the key to this analytic process. Each question is discussed below using established principles of data analysis and literature interpretation (1).


Question 1: How Was the Study Performed?


Study Design

Medical data arise from a research study, defined as an “organized quest for new knowledge, based on curiosity or perceived needs” (6). Validity of the data is determined in large part by the study design (specific procedures and methods) used by the investigators to produce their data. The study design must fit the research question. Despite the befuddling array of study designs espoused in the epidemiologic literature, the savvy data analyst need only address a few basic considerations (Table 7.3). These relate to (a) how the data were gathered, (b) what degree of control the investigator had over study conditions, (c) whether a control or comparison group was used, and (d) what direction of inquiry was followed.

Data collected specifically for research (Table 7.3) are likely to be unbiased—they reflect the true value of the attribute being measured. In contrast, data collected during routine clinical care will vary in quality. Experimental studies, such as randomized trials, are likely to produce high-quality data because they are performed under carefully controlled conditions. In observational studies, however, the investigator is simply a bystander who records the natural course of health events during clinical care. Regardless of data quality, bias may be introduced at any stage of the research process, which includes reviewing literature, defining baseline states, performing interventions, measuring outcomes, analyzing data, and publishing results (7).

The presence or absence of a control group has a profound influence on data interpretation. An uncontrolled study—no matter how elegant—is purely descriptive (2). Case series, which appear frequently in the otolaryngology literature, cannot assess efficacy or effectiveness, but they can convey feasibility, experience, technical details of an intervention, and predictive factors associated with good outcomes or adverse events (8). The best case series use a consecutive sample of patients, adjust for interfering variables, plan in advance for systematic data collection, and are humble and cautious when interpreting results.








TABLE 7.2 FIVE BASIC QUESTIONS FOR INTERPRETING MEDICAL DATA

| Question | Why It Is Important | Underlying Principles |
| --- | --- | --- |
| 1. What type of study produced the data? | Study design has a profound impact on interpretation; scrutinize the data collection, degree of investigator control, use of control groups, and direction of inquiry | Bias, research design, placebo effect, control groups, causality |
| 2. What are the results? | Results should be summarized with appropriate descriptive statistics; positive results must be qualified by the chance of being wrong, and negative results by the chance of having missed a true difference | Measurement scale, association, P value, power, effect size, clinical importance |
| 3. Are the results valid within the study? | Proper statistical analysis and data collection ensure valid results for the subjects studied; measurements must be accurate and reproducible | Internal validity, accuracy, statistical tests |
| 4. Are the results valid outside the study? | Results can be generalized when the sampling method is sound, subjects are representative of the target population, and sample size is large enough for adequate precision | External validity, sampling, CIs, precision |
| 5. Are the results strong and consistent? | A single study is rarely definitive; results must be viewed relative to their plausibility, consistency with past efforts, and by the strength of the study methodology | Research integration, level of evidence, systematic review |









TABLE 7.3 EFFECT OF STUDY DESIGN ON DATA INTERPRETATION

| Aspect of Study Design | Effect on Data Interpretation |
| --- | --- |
| **How were the data originally collected?** | |
| Specifically for research | Interpretation is facilitated by quality data collected according to an a priori protocol |
| During routine clinical care | Interpretation is limited by the consistency, accuracy, availability, and completeness of the source records |
| Database or data registry | Interpretation is limited by representativeness of the sample and the quality and completeness of data fields |
| **Is the study experimental or observational?** | |
| Experimental study with conditions under direct control of the investigator | Low potential for systematic error (bias); bias can be reduced further by randomization and masking (blinding) |
| Observational study without intervention other than to record, classify, analyze | High potential for bias in sample selection, treatment assignment, measurement of exposures, and outcomes |
| **Is there a comparison or control group?** | |
| Comparative or controlled study with two or more groups | Permits analytic statements concerning efficacy, effectiveness, and association |
| No comparison group present | Permits descriptive statements only, because of improvements from natural history and placebo effect |
| **What is the direction of study inquiry?** | |
| Subjects identified prior to an outcome or disease; future events recorded | Prospective design measures incidence (new events) and causality (if comparison group included) |
| Subjects identified after an outcome or disease; past histories are examined | Retrospective design measures prevalence (existing events) and causality (if comparison group included) |
| Subjects are identified at a single time point, regardless of outcome or disease | Cross-sectional design measures prevalence (existing events) and association (if comparison group included) |









TABLE 7.4 EXPLANATIONS FOR FAVORABLE OUTCOMES IN TREATMENT STUDIES

| Explanation | Definition | Solution |
| --- | --- | --- |
| Bias | Systematic variation of measurements from their true values; may be intentional or unintentional | Accurate, protocol-driven data collection |
| Chance | Random variation without apparent relation to other measurements or variables (e.g., getting lucky) | Control or comparison group |
| Natural history | Course of a disease from onset to resolution; may include relapse, remission, and spontaneous recovery | Control or comparison group |
| Regression to the mean | Symptom improvement independent of therapy, as sick patients return to a mean level after seeking care | Control or comparison group |
| Placebo effect | Beneficial effect caused by the expectation that the regimen will have an effect (e.g., power of suggestion) | Control or comparison group with placebo |
| Halo effect | Beneficial effect caused by the manner, attention, and caring of a provider during a medical encounter | Control or comparison group treated similarly |
| Confounding | Distortion of an effect by other prognostic factors or variables for which adjustments have not been made | Randomization or multivariate analysis |
| Allocation (susceptibility) bias | Beneficial effect caused by allocating subjects with less severe disease or better prognosis to treatment group | Randomization or comorbidity analysis |
| Ascertainment (detection) bias | Favoring the treatment group during outcome analysis (e.g., rounding up for treated subjects, down for controls) | Masked (blinded) outcome assessment |


Without a control or comparison group, treatment effects cannot be distinguished from other causes of clinical change (Table 7.4) (9). Some of these causes are found in Figure 7.1, which depicts change in health status after a healing encounter as a complex interaction of three primary factors:



  • What was actually done. Specific effect(s) of therapy, including medications, surgery, physical manipulations, and alternative or integrative approaches.


  • What would have happened anyway. Spontaneous resolution, including natural history, random fluctuations in disease status, and regression to a mean symptom state.


  • What was imagined to be done. Placebo response, defined as a change in health status resulting from the symbolic significance attributed by the patient (or proxy) to the encounter itself (10).

A placebo response is most likely to occur when the patient receives a meaningful and personalized explanation, feels care and concern expressed by the healer, and achieves control and mastery over illness (or believes that the healer can control the illness). The placebo response differs from the traditional definition of placebo as an inactive medical substance. Unlike the “placebo pills” in randomized trials, a placebo response can be elicited by touch, words, gestures, local ambiance, and social interactions. A valid and reliable 12-item survey, the PR-12, is available to measure aspects of the placebo response in office encounters (11).






Figure 7.1 Model depicting change in health status after a healing encounter. Dashed arrow shows that a placebo response may occur from symbolic significance of the specific therapy given or from interpersonal aspects of the encounter.



Assessing Causality

When data from a comparison or control group are available, statistics may be used to test hypotheses and measure associations. Causality may also be assessed when the study has a time-span component, either retrospective or prospective (Table 7.3). Prospective studies measure incidence (new events) whereas retrospective studies measure prevalence (existing events). Unlike time-span studies, cross-sectional inquiries (surveys, screening programs, evaluations of diagnostic tests) measure association, not causality.

Efficacy and causality are best assessed by randomized controlled trials, because nonrandom treatment assignment is prone to innate distortions caused by individual judgments and other selective decisions (allocation bias). A dangerous habit, however, is to label all randomized trials as high quality and all observational studies (e.g., outcomes research) as substandard. Randomization cannot compensate for imprecise selection criteria, poorly defined endpoints, inadequate follow-up, or low compliance with treatment. Moreover, randomized trials with inadequate methodology tend to exaggerate treatment effects compared with trials that are properly designed and executed (12).

The best randomized trials ensure adequate randomization, conceal treatment allocation (blinding), and analyze results by intention-to-treat. The intention-to-treat analysis maintains treatment groups that are similar apart from random variation, which may not occur if only subjects who complied with treatment (on-treatment analysis) are included (13). A blinded (masked) trial is always superior to a nonblinded (open, open-label, or unmasked) trial in which everyone involved knows who received which interventions (14). In a double-blind trial the participants, investigators, and assessors all remain unaware of the intervention assignments. A triple-blind trial also maintains a blind data analysis, but some use this simply to indicate that the investigators and assessors are distinct.
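A minimal sketch can show why intention-to-treat matters. The toy data and function below are hypothetical (not from the chapter): some treatment-arm subjects fail to comply, and discarding them (on-treatment analysis) inflates the apparent success rate because noncompliers often have worse prognoses.

```python
# Each record is (assigned_arm, complied, improved) -- illustrative toy data.
def success_rate(records, arm, itt=True):
    """Proportion improved in an arm.

    itt=True keeps every randomized subject (intention-to-treat);
    itt=False keeps only compliers (on-treatment analysis)."""
    group = [r for r in records if r[0] == arm and (itt or r[1])]
    return sum(1 for r in group if r[2]) / len(group)

trial = [
    ("treatment", True,  True),  ("treatment", True,  True),
    ("treatment", True,  False), ("treatment", False, False),
    ("control",   True,  True),  ("control",   True,  False),
    ("control",   True,  False), ("control",   True,  False),
]

itt = success_rate(trial, "treatment")             # 2 of 4 improved = 0.50
on_tx = success_rate(trial, "treatment", itt=False)  # 2 of 3 compliers ~ 0.67
```

Here the on-treatment estimate looks better than the ITT estimate only because the noncomplying (and unimproved) subject was dropped, not because the therapy worked better.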








TABLE 7.5 MEASUREMENT SCALES FOR DESCRIBING AND ANALYZING DATA

| Scale | Definition | Examples |
| --- | --- | --- |
| Dichotomous | Classification into either of two mutually exclusive categories | Breast feeding (yes/no), sex (male/female) |
| Nominal | Classification into unordered qualitative categories | Race, religion, country of origin |
| Ordinal | Classification into ordered qualitative categories, but with no natural (numerical) distance between their possible values | Hearing loss (none, mild, moderate), patient satisfaction (low, medium, high), age group |
| Numerical | Measurements with a continuous scale, or a large number of discrete ordered values | Temperature, age in years, hearing level in decibels |
| Numerical (censored) | Measurements on subjects lost to follow-up or in whom a specified event has not yet occurred at the end of a study | Survival rate, recurrence rate, or any time-to-event outcome in a prospective study |


Randomized controlled trials comprise only about 4% of articles in leading otolaryngology journals, with about 25% supported by industry funding (15). The presence of industry support, however, is unrelated to conclusions favoring intervention. Nearly 60% of articles use intention-to-treat analysis, but only a minority specify randomization schemes, employ a double-blind protocol, include confidence intervals (CIs), or explicitly discuss adverse events. The earlier advice of “caveat lector,” therefore, also applies to randomized trials, not just observational research.


Question 2: What Are the Results?


Describing Central Tendency and Dispersion

Describing results begins by defining the measurement scale that best suits the observations. Categorical (qualitative) observations fall into one or more categories, and include dichotomous, nominal, and ordinal scales (Table 7.5). Numerical (quantitative) observations are measured on a continuous scale, and are further classified with a graphic display to assess distribution (histogram, stem-leaf plot, or frequency distribution curve) (16). Numerical data with a symmetric (normal or Gaussian) distribution are evenly placed around a central crest or trough (bell-shaped curve). Numerical data with an asymmetric distribution are skewed (shifted) to one side of the center or contain unusually high or low outlier values. Skewed data can sometimes be normalized with a transformation (e.g., logarithmic).

When summarizing numerical data, the descriptive method varies according to the underlying distribution. Numerical data with a symmetric distribution are best summarized with the mean (Table 7.6) and standard deviation (SD), because 68.3% of the observations fall within the mean ±1 SD, 95.4% within the mean ±2 SD, and 99.7% within the mean ±3 SD. In contrast, asymmetric numerical data are best summarized with the median, because even a single outlier can strongly influence the mean.
For example, if five patients are followed after sinus surgery for 10, 12, 15, 16, and 48 months, the mean duration of follow-up is 20.2 months, but the median is only 15 months. In this case a single outlier, 48 months, distorts the mean.
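The mean-versus-median contrast in the follow-up example above can be verified with Python's standard library:

```python
import statistics

# Follow-up durations from the sinus-surgery example (months)
follow_up_months = [10, 12, 15, 16, 48]

mean_fu = statistics.mean(follow_up_months)      # 20.2, pulled upward by the 48-month outlier
median_fu = statistics.median(follow_up_months)  # 15, unaffected by the outlier
sd_fu = statistics.stdev(follow_up_months)       # sample SD; large here relative to the mean
```

Because the distribution is skewed by the outlier, the median (15) summarizes the typical follow-up far better than the mean (20.2).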

A special form of numerical data is called censored (Table 7.5). Data are censored when (a) the study direction is prospective, (b) the outcome is time related, and (c) some subjects die, are lost, or have not yet had the outcome when the study ends. Interpreting censored data is called survival analysis, because of its use in cancer studies where survival is the outcome of interest (17). For example, a study might report median survival time (by groups, if applicable) and the percent surviving at fixed time periods (e.g., 1, 5, 10 years). Survival analysis permits full utilization of censored observations (e.g., patients with less than 1 year of follow-up), by including them in the analysis up to the time the censoring occurred. Results of cancer studies are often reported with Kaplan-Meier curves, which may describe overall survival, disease-free survival, disease-specific survival, or progression-free survival (18). Survival data at the far right of the curves should be interpreted cautiously because fewer patients remain, yielding less precise estimates.
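A minimal Kaplan-Meier estimator can be sketched in pure Python to show how censored subjects contribute up to their censoring time. The data below are invented for illustration, and the sketch assumes the usual convention that censorings tied with a death still count as at risk for that death:

```python
def kaplan_meier(times, events):
    """Return (time, survival) steps for right-censored data.

    times: follow-up durations (e.g., months)
    events: True if the event (death) was observed, False if censored
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve = 1.0, []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(1 for tt, e in pairs if tt == t and e)
        leaving = sum(1 for tt, _ in pairs if tt == t)  # deaths + censorings at t
        if deaths:
            surv *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
        i += leaving
    return curve

# Toy cohort: deaths at 3, 5, 8 months; censorings at 5 and 12 months
curve = kaplan_meier([3, 5, 5, 8, 12], [True, True, False, True, False])
# survival steps to ~0.8 at t=3, ~0.6 at t=5, ~0.3 at t=8
```

Note how the subject censored at 5 months still counts in the risk set for the deaths at 3 and 5 months, and how the final step (based on only two remaining subjects) is the least precise, echoing the caution about the right-hand tail of published curves.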

Nominal and dichotomous data (Table 7.5) are best described using ratios, proportions, and rates. A ratio is the value obtained by dividing one quantity by another, both of which are separate and distinct. In a tonsillitis treatment study, for example, the ratio of children with clinical resolution after 10 days to those remaining symptomatic might be 80/20 or 4:1. In contrast, a proportion is a type of ratio in which the numerator is included in the denominator. In the previously mentioned study, the proportion with clinical resolution would be 80/100 or 0.80. Alternatively, this could be multiplied by 100 and expressed as a percentage (80%). Rates are similar to proportions except that a multiplier is used (e.g., 1,000 or 100,000) and they are computed over time. For example, a study might report a rate of 110 physician office visits per 100 children per year for upper respiratory infections.
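The tonsillitis and office-visit numbers above translate directly into code; only the visit counts used to reproduce the rate are assumed for illustration:

```python
# Tonsillitis example: 80 of 100 children resolved by day 10
resolved, symptomatic = 80, 20
total = resolved + symptomatic

ratio = resolved / symptomatic        # 4.0, i.e., a 4:1 ratio (numerator excluded from denominator)
proportion = resolved / total         # 0.80 (numerator included in denominator)
percentage = proportion * 100         # 80.0%

# Rate example: multiplier of 100, computed over time.
# Hypothetical counts chosen to reproduce 110 visits per 100 children per year:
visits, children, years = 2_200, 1_000, 2
rate_per_100 = visits * 100 / (children * years)  # 110.0
```

The distinction matters when reading results: a “ratio of 4:1” and a “proportion of 0.80” describe the same outcome, while a rate additionally carries a time dimension and a multiplier.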








TABLE 7.6 DESCRIPTIVE STATISTICS

| Descriptive Measure | Definition | When to Use It |
| --- | --- | --- |
| **Central tendency** | | |
| Mean | Arithmetic average | Numerical data that are symmetric |
| Median | Middle observation; half the values are smaller and half are larger | Ordinal data; numerical data with an asymmetric distribution |
| Mode(s) | Most frequent value(s) | Nominal data; bimodal distribution |
| **Dispersion** | | |
| Range | Largest value minus smallest value | Numerical data without outliers |
| SD | Spread of data about their mean | Numerical data that are symmetric |
| 95% reference range | Mean ± 1.96 SD | Numerical data that are symmetric |
| Percentile | Percentage of values that are equal to or below that number | Ordinal data; numerical data with an asymmetric distribution |
| Interquartile range | Difference between the 25th and 75th percentiles; contains 50% of data | Ordinal data; numerical data with an asymmetric distribution |
| **Outcome** | | |
| Survival rate | Proportion of subjects surviving, or with some other outcome, after a time interval (1, 5 y, etc.) | Numerical (censored) data in a prospective study; can be overall, cause specific, or progression free |
| Odds ratio | Odds of a disease or outcome in subjects with a risk factor divided by odds in controls | Dichotomous data in a retrospective or prospective controlled study |
| Relative risk | Incidence of a disease or outcome in subjects with a risk factor divided by incidence in controls | Dichotomous data in a prospective controlled study |
| Rate difference^a | Event rate in treatment group minus event rate in control group | Compares success or failure rates in clinical trial groups |
| Correlation coefficient | Degree to which two variables have a linear relationship | Numerical or ordinal data |

^a Also called the absolute risk reduction.
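The last three outcome measures in Table 7.6 can be computed from a standard 2×2 table. The counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
#                 outcome   no outcome
#   exposed          a=30        b=70
#   unexposed        c=10        d=90   (hypothetical counts)

def odds_ratio(a, b, c, d):
    """Odds of the outcome in exposed subjects divided by odds in unexposed."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Incidence in exposed subjects divided by incidence in unexposed."""
    return (a / (a + b)) / (c / (c + d))

def rate_difference(a, b, c, d):
    """Event rate in treatment group minus rate in controls (absolute risk reduction)."""
    return a / (a + b) - c / (c + d)

or_ = odds_ratio(30, 70, 10, 90)       # ~3.86
rr = relative_risk(30, 70, 10, 90)     # ~3.0
rd = rate_difference(30, 70, 10, 90)   # ~0.20
```

When the outcome is rare, the odds ratio approximates the relative risk; here the outcome is common (30%), so the odds ratio (3.86) noticeably overstates the relative risk (3.0).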

