Understanding Data and Interpreting the Literature

Richard M. Rosenfeld

“It is a capital mistake to theorize before one has data,” observed Sherlock Holmes. Physicians are rarely guilty of this offense, because they are inundated with data from clinical observations, journal articles, professional meetings, and governmental agencies. Unfortunately, the theories that result often have little relevance to the data that produced them. This is not unexpected, because physicians rarely devote the same efforts to mastering data that they devote to mastering clinical skills. As a partial remedy, this chapter offers a principle-centered approach to data analysis with emphasis on interpreting the medical literature (1,2).

Nearly all data relevant to clinical care come from articles in peer-reviewed medical journals, whose content is generally assessed using principles that mirror those in this chapter (3). Unfortunately, the process of editorial peer review is largely untested and its effects are uncertain. Manuscript assessment, like even the most sophisticated diagnostic tests, has a certain sensitivity and specificity; worthy articles may be unappreciated (and unpublished) or worthless ones may pass undetected to the printing press. Peer reviewers may be biased, unqualified, or possess widely discrepant opinions about a study. The bottom line? Caveat lector: beware of what you read even in excellent medical journals.

Of more concern to clinicians than the inadequacies of peer review, however, is that the medical literature generally serves science rather than medical practice. Peer-reviewed publications facilitate communication from scientist to scientist, not necessarily from scientist to clinician. Most published studies are nondefinitive tests of hypotheses and innovations, only a very small percentage of which may warrant routine clinical application. Although the science may be sound, the idea has not progressed beyond the laboratory or preliminary field studies. Definitive studies constituting true scientist-to-clinician communication are rare in medical journals, and must be identified by critical appraisal.

Clinicians can use the medical literature to support clinical decisions in two complementary ways: regular surveillance (or browsing) and problem-oriented searches. While the latter mode is more effective for learning, both are necessary for continuing clinical competence. Both methods require appreciating the purposes of the medical literature and understanding the strengths and weaknesses of various study designs for providing valid and clinically applicable information.

HOW TO IDENTIFY ARTICLES WORTH READING

Articles worthy of in-depth analysis have enticing titles and abstracts, which espouse innovative, controversial, or clinically relevant ideas. Reading the abstract alone, however, is a poor substitute for perusing the entire article. Whereas abstracts in otolaryngology journals generally do convey information about study design, sample size, and the source of the data, they infrequently describe adverse events, study limitations, and dropouts or losses (4). Abstracts may also report selective, incomplete, or inaccurate data leading to biased conclusions. Abstracts are a starting point for analysis, not the finish line.

Worthy articles appear as original research in the main section of a peer-reviewed journal. Methods and results sections, which represent the heart and soul of the article, should be appropriately detailed and lengthy. Statistical reporting includes confidence limits and measures of clinical benefit (5). Enough details should be provided for you to reproduce the study on your own, if desired, with a reasonable chance of obtaining the same results. A quick review of the paper in general should disclose many of the signs of grandeur in Table 7.1.

TABLE 7.1 SIGNS OF GRANDEUR AND DECADENCE IN JOURNAL ARTICLES

Section	Signs of Grandeur	Signs of Decadence
Abstract	Structured summary of goals, methods, results, and significance	Unstructured qualitative overview of study; contains more wish than reality
Introduction	Clear, concise, and logical; ends with study rationale or purpose	Rambling, verbose literature review; no critical argument or hypothesis
Methods	Specific enough for the reader to reproduce the study and understand how quantitative results were generated	Vague or incomplete description of subjects, sampling, outcome criteria; no mention of statistical analysis
Results	Logical blend of narrative and numbers, including 95% CIs, with supporting tables and figures	Difficult to read, with overuse or underuse of statistical tests; emphasis on P-values, not clinical importance
Discussion	Puts main results in context; reviews supporting and conflicting literature; discusses strengths and weaknesses	Full of fantasy and speculation; rambling and biased literature review; does not acknowledge weaknesses
References	Demonstrates clearly that work of others has been systematically considered; emphasizes original research from peer-reviewed journals	Key articles are conspicuously absent; excessively brief; emphasizes review articles, book chapters, and lower-quality journals

Articles unworthy of in-depth analysis may have enticing titles and abstracts, but have no ability to support the lofty claims and conclusions therein. The article may appear in a non-peer-reviewed (throwaway) journal or in an industry-funded supplement to the main section (which generally implies lower quality). Signs of decadence (Table 7.1) are readily apparent when perusing the article’s main sections. The methods and results sections are vague and sparse, overshadowed by a verbose discussion section with unsupported opinions and creative misinterpretations. Don’t waste any time analyzing an unworthy article unless the premise is so novel and important that it overshadows the obvious weaknesses.

FIVE BASIC QUESTIONS FOR INTERPRETING DATA

Just as expert clinicians consider something abnormal until they examine it and prove otherwise, a connoisseur of medical evidence considers a data set or a journal article to be laden with flaws, distortions, and omissions until proven to the contrary. The five basic questions in Table 7.2 are the key to this analytic process. Each question is discussed below using established principles of data analysis and literature interpretation (1).

Question 1: How Was the Study Performed?

Study Design

Medical data arise from a research study, defined as an “organized quest for new knowledge, based on curiosity or perceived needs” (6). Validity of the data is determined in large part by the study design (specific procedures and methods) used by the investigators to produce their data. The study design must fit the research question. Despite the befuddling array of study designs espoused in the epidemiologic literature, the savvy data analyst need only address a few basic considerations (Table 7.3). These relate to (a) how the data were gathered, (b) what degree of control the investigator had over study conditions, (c) whether a control or comparison group was used, and (d) what direction of inquiry was followed.

Data collected specifically for research (Table 7.3) are likely to be unbiased; that is, they reflect the true value of the attribute being measured. In contrast, data collected during routine clinical care will vary in quality. Experimental studies, such as randomized trials, are likely to produce high-quality data because they are performed under carefully controlled conditions. In observational studies, however, the investigator is simply a bystander who records the natural course of health events during clinical care. Regardless of data quality, bias may be introduced at any stage of the research process, which includes reviewing literature, defining baseline states, performing interventions, measuring outcomes, analyzing data, and publishing results (7).

The presence or absence of a control group has a profound influence on data interpretation. An uncontrolled study, no matter how elegant, is purely descriptive (2). Case series, which appear frequently in the otolaryngology literature, cannot assess efficacy or effectiveness, but they can convey feasibility, experience, technical details of an intervention, and predictive factors associated with good outcomes or adverse events (8). The best case series use a consecutive sample of patients, adjust for interfering variables, plan in advance for systematic data collection, and are humble and cautious when interpreting results.

TABLE 7.2 FIVE BASIC QUESTIONS FOR INTERPRETING MEDICAL DATA

Question	Why It Is Important	Underlying Principles
1. What type of study produced the data?	Study design has a profound impact on interpretation; scrutinize the data collection, degree of investigator control, use of control groups, and direction of inquiry	Bias, research design, placebo effect, control groups, causality
2. What are the results?	Results should be summarized with appropriate descriptive statistics; positive results must be qualified by the chance of being wrong, and negative results by the chance of having missed a true difference	Measurement scale, association, P value, power, effect size, clinical importance
3. Are the results valid within the study?	Proper statistical analysis and data collection ensures valid results for the subjects studied; measurements must be accurate and reproducible	Internal validity, accuracy, statistical tests
4. Are the results valid outside the study?	Results can be generalized when the sampling method is sound, subjects are representative of the target population, and sample size is large enough for adequate precision	External validity, sampling, CIs, precision
5. Are the results strong and consistent?	A single study is rarely definitive; results must be viewed relative to their plausibility, consistency with past efforts, and by the strength of the study methodology	Research integration, level of evidence, systematic review

TABLE 7.3 EFFECT OF STUDY DESIGN ON DATA INTERPRETATION

Aspect of Study Design			Effect on Data Interpretation
How were the data originally collected?
	Specifically for research		Interpretation is facilitated by quality data collected according to an a priori protocol
	During routine clinical care		Interpretation is limited by the consistency, accuracy, availability, and completeness of the source records
	Database or data registry		Interpretation is limited by representativeness of the sample and the quality and completeness of data fields
Is the study experimental or observational?
	Experimental study with conditions under direct control of the investigator		Low potential for systematic error (bias); bias can be reduced further by randomization and masking (blinding)
	Observational study without intervention other than to record, classify, analyze		High potential for bias in sample selection, treatment assignment, measurement of exposures, and outcomes
Is there a comparison or control group?
	Comparative or controlled study with two or more groups		Permits analytic statements concerning efficacy, effectiveness, and association
	No comparison group present		Permits descriptive statements only, because of improvements from natural history and placebo effect
What is the direction of study inquiry?
	Subjects identified prior to an outcome or disease; future events recorded		Prospective design measures incidence (new events) and causality (if comparison group included)
	Subjects identified after an outcome or disease; past histories are examined		Retrospective design measures prevalence (existing events) and causality (if comparison group included)
	Subjects are identified at a single time point, regardless of outcome or disease		Cross-sectional design measures prevalence (existing events) and association (if comparison group included)

TABLE 7.4 EXPLANATIONS FOR FAVORABLE OUTCOMES IN TREATMENT STUDIES

Explanation	Definition	Solution
Bias	Systematic variation of measurements from their true values; may be intentional or unintentional	Accurate, protocol-driven data collection
Chance	Random variation without apparent relation to other measurements or variables (e.g., getting lucky)	Control or comparison group
Natural history	Course of a disease from onset to resolution; may include relapse, remission, and spontaneous recovery	Control or comparison group
Regression to the mean	Symptom improvement independent of therapy, as sick patients return to a mean level after seeking care	Control or comparison group
Placebo effect	Beneficial effect caused by the expectation that the regimen will have an effect (e.g., power of suggestion)	Control or comparison group with placebo
Halo effect	Beneficial effect caused by the manner, attention, and caring of a provider during a medical encounter	Control or comparison group treated similarly
Confounding	Distortion of an effect by other prognostic factors or variables for which adjustments have not been made	Randomization or multivariate analysis
Allocation (susceptibility) bias	Beneficial effect caused by allocating subjects with less severe disease or better prognosis to treatment group	Randomization or comorbidity analysis
Ascertainment (detection) bias	Favoring the treatment group during outcome analysis (e.g., rounding up for treated subjects, down for controls)	Masked (blinded) outcome assessment

Without a control or comparison group, treatment effects cannot be distinguished from other causes of clinical change (Table 7.4) (9). Some of these causes are found in Figure 7.1, which depicts change in health status after a healing encounter as a complex interaction of three primary factors:

What was actually done. Specific effect(s) of therapy, including medications, surgery, physical manipulations, and alternative or integrative approaches.
What would have happened anyway. Spontaneous resolution, including natural history, random fluctuations in disease status, and regression to a mean symptom state.
What was imagined to be done. Placebo response, defined as a change in health status resulting from the symbolic significance attributed by the patient (or proxy) to the encounter itself (10).
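The second factor, "what would have happened anyway," can be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers: the symptom scale, the enrollment threshold, and the amount of day-to-day fluctuation are all hypothetical. No treatment effect is simulated at all, yet patients who enroll when their symptoms are at their worst still appear to improve at follow-up, which is regression to the mean in an uncontrolled study.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable


def simulate_uncontrolled_study(n_patients=10000, threshold=7.0):
    """Simulate symptom scores with no treatment effect at all.

    Each patient has a stable long-run severity, and every visit adds
    random fluctuation around it. Patients "seek care" (enter the study)
    only when the observed score exceeds the threshold.
    All numeric settings here are hypothetical.
    """
    baseline, follow_up = [], []
    for _ in range(n_patients):
        true_severity = random.gauss(5.0, 1.5)            # long-run mean level
        visit_1 = true_severity + random.gauss(0.0, 1.5)  # score when care is sought
        visit_2 = true_severity + random.gauss(0.0, 1.5)  # later score, no therapy given
        if visit_1 > threshold:
            baseline.append(visit_1)
            follow_up.append(visit_2)
    return statistics.mean(baseline), statistics.mean(follow_up), len(baseline)


mean_before, mean_after, n_enrolled = simulate_uncontrolled_study()
print(f"patients enrolled:        {n_enrolled}")
print(f"mean score at enrollment: {mean_before:.2f}")
print(f"mean score at follow-up:  {mean_after:.2f}")
```

Because enrollment selects patients on an unusually bad day, the follow-up mean drifts back toward each patient's usual level. A control group selected the same way would show the same drift, which is why Table 7.4 lists a control or comparison group as the solution.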

A placebo response is most likely to occur when the patient receives a meaningful and personalized explanation, feels care and concern expressed by the healer, and achieves control and mastery over illness (or believes that the healer can control the illness). The placebo response differs from the traditional definition of placebo as an inactive medical substance. Unlike the “placebo pills” in randomized trials, a placebo response can be elicited by touch, words, gestures, local ambiance, and social interactions. A valid and reliable 12-item survey, the PR-12, is available to measure aspects of the placebo response in office encounters (11).

Figure 7.1 Model depicting change in health status after a healing encounter. Dashed arrow shows that a placebo response may occur from symbolic significance of the specific therapy given or from interpersonal aspects of the encounter.

Assessing Causality

When data from a comparison or control group are available, statistics may be used to test hypotheses and measure associations. Causality may also be assessed when the study has a time-span component, either retrospective or prospective (Table 7.3). Prospective studies measure incidence (new events) whereas retrospective studies measure prevalence (existing events). Unlike time-span studies, cross-sectional inquiries (surveys, screening programs, evaluations of diagnostic tests) measure association, not causality.

Efficacy and causality are best assessed by randomized controlled trials, because nonrandom treatment assignment is prone to innate distortions caused by individual judgments and other selective decisions (allocation bias). A dangerous habit, however, is to label all randomized trials as high quality and all observational studies (e.g., outcomes research) as substandard. Randomization cannot compensate for imprecise selection criteria, poorly defined endpoints, inadequate follow-up, or low compliance with treatment. Moreover, randomized trials with inadequate methodology tend to exaggerate treatment effects compared with trials that are properly designed and executed (12).

The best randomized trials ensure adequate randomization, conceal treatment allocation (blinding), and analyze results by intention-to-treat. The intention-to-treat analysis maintains treatment groups that are similar apart from random variation, which may not occur if only subjects who complied with treatment (on-treatment analysis) are included (13). A blinded (masked) trial is always superior to a nonblinded (open, open-label, or unmasked) trial in which everyone involved knows who received which interventions (14). In a double-blind trial the participants, investigators, and assessors all remain unaware of the intervention assignments. A triple-blind trial also maintains a blind data analysis, but some use the term simply to indicate that the investigators and assessors are distinct.
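The difference between intention-to-treat and on-treatment analysis can be made concrete with a toy data set. The following is a minimal sketch, not a method taken from the cited studies: the `Subject` record, the helper functions, and every count in `trial` are hypothetical and chosen only to show how excluding noncompliant subjects can inflate an apparent benefit.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    assigned: str   # arm assigned by randomization: "treatment" or "control"
    complied: bool  # whether the subject completed the assigned regimen
    success: bool   # whether the outcome was favorable


def success_rate(subjects):
    """Proportion of subjects with a favorable outcome."""
    return sum(s.success for s in subjects) / len(subjects)


def intention_to_treat(subjects, arm):
    """Analyze every subject in the arm to which they were randomized."""
    return success_rate([s for s in subjects if s.assigned == arm])


def on_treatment(subjects, arm):
    """Analyze only subjects who complied with their assigned regimen."""
    return success_rate([s for s in subjects if s.assigned == arm and s.complied])


# Hypothetical trial with 10 subjects per arm (invented for illustration).
trial = (
    [Subject("treatment", True, True)] * 6
    + [Subject("treatment", True, False)] * 1
    + [Subject("treatment", False, False)] * 3  # noncompliant subjects who did poorly
    + [Subject("control", True, True)] * 5
    + [Subject("control", True, False)] * 5
)

itt_difference = intention_to_treat(trial, "treatment") - intention_to_treat(trial, "control")
ot_difference = on_treatment(trial, "treatment") - on_treatment(trial, "control")
print(f"intention-to-treat difference: {itt_difference:.2f}")
print(f"on-treatment difference:       {ot_difference:.2f}")
```

In this invented example the intention-to-treat difference is 10 percentage points (6/10 versus 5/10), whereas the on-treatment difference is about 36 points (6/7 versus 5/10), because the three noncompliant subjects with poor outcomes were dropped from the treatment arm.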

TABLE 7.5 MEASUREMENT SCALES FOR DESCRIBING AND ANALYZING DATA

Scale	Definition	Examples
Dichotomous	Classification into either of two mutually exclusive categories	Breast feeding (yes/no), sex (male/female)
Nominal	Classification into unordered qualitative categories	Race, religion, country of origin
Ordinal	Classification into ordered qualitative categories, but with no natural (numerical) distance between their possible values	Hearing loss (none, mild, moderate), patient satisfaction (low, medium, high), age group
Numerical	Measurements with a continuous scale, or a large number of discrete ordered values	Temperature, age in years, hearing level in decibels
Numerical (censored)	Measurements on subjects lost to follow-up or in whom a specified event has not yet occurred at the end of a study	Survival rate, recurrence rate, or any time-to-event outcome in a prospective study

Randomized controlled trials comprise only about 4% of articles in leading otolaryngology journals, with about 25% supported by industry funding (15). The presence of industry support, however, is unrelated to conclusions favoring intervention. Nearly 60% of articles use intention-to-treat analysis, but only a minority specify randomization schemes, employ a double-blind protocol, include confidence intervals (CIs), or explicitly discuss adverse events. The earlier advice of “caveat lector,” therefore, also applies to randomized trials, not just observational research.

Question 2: What Are the Results?

Describing Central Tendency and Dispersion

Describing results begins by defining the measurement scale that best suits the observations. Categorical (qualitative) observations fall into one or more categories, and include dichotomous, nominal, and ordinal scales (Table 7.5). Numerical (quantitative) observations are measured on a continuous scale, and are further classified with a graphic display to assess distribution (histogram, stem-leaf plot, or frequency distribution curve) (16). Numerical data with a symmetric (normal or Gaussian) distribution are evenly placed around a central crest or trough (bell-shaped curve). Numerical data with an asymmetric distribution are skewed (shifted) to one side of the center or contain unusually high or low outlier values. Skewed data can sometimes be normalized with a transformation (e.g., logarithmic).
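As a rough illustration of how a logarithmic transformation can pull in a long right tail, the sketch below computes a simple moment-based skewness before and after taking logarithms. The data values and the `skewness` helper are assumptions made for this example; a graphic display, as recommended above, remains the primary way to judge a distribution.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])  # second central moment
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (invented, not from any study).
raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]
logged = [math.log10(v) for v in raw]

print(f"skewness of raw values:    {skewness(raw):.2f}")
print(f"skewness of log10 values:  {skewness(logged):.2f}")
```

For these made-up values the transformed data are noticeably less skewed than the raw data, although not perfectly symmetric, which matches the caution that skewed data can only sometimes be normalized this way.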

When summarizing numerical data, the descriptive method varies according to the underlying distribution. Numerical data with a symmetric distribution are best summarized with the mean (Table 7.6) and standard deviation (SD), because 68.3% of the observations fall within the mean ±1 SD, 95.4% within the mean ±2 SD, and 99.7% within the mean ±3 SD. In contrast, asymmetric numerical data are best summarized with the median, because even a single outlier can strongly influence the mean. For example, if five patients are followed after sinus surgery for 10, 12, 15, 16, and 48 months, the mean duration of follow-up is about 20 months, but the median is only 15 months. In this case a single outlier, 48 months, distorts the mean.
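The follow-up example and the coverage percentages for a normal distribution can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names and the `coverage_within` helper are introduced here for illustration only.

```python
import statistics
from statistics import NormalDist

# Follow-up durations in months from the sinus surgery example in the text.
follow_up_months = [10, 12, 15, 16, 48]

mean_months = statistics.mean(follow_up_months)      # pulled upward by the 48-month outlier
median_months = statistics.median(follow_up_months)  # middle observation, unaffected by it
print(f"mean follow-up:   {mean_months} months")
print(f"median follow-up: {median_months} months")

# Proportion of a normal distribution lying within k SDs of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Probability that a normal observation falls within mean +/- k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within +/- {k} SD: {100 * coverage_within(k):.1f}%")
```

The value k = 1.96 is included because it corresponds to the 95% reference range (mean ± 1.96 SD) listed in Table 7.6.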

A special form of numerical data is called censored (Table 7.5). Data are censored when (a) the study direction is prospective, (b) the outcome is time related, and (c) some subjects die, are lost, or have not yet had the outcome when the study ends. Interpreting censored data is called survival analysis, because of its use in cancer studies where survival is the outcome of interest (17). For example, a study might report median survival time (by groups, if applicable) and the percent surviving at fixed time periods (e.g., 1, 5, 10 years). Survival analysis permits full utilization of censored observations (e.g., patients with less than 1 year of follow-up), by including them in the analysis up to the time the censoring occurred. Results of cancer studies are often reported with Kaplan-Meier curves, which may describe overall survival, disease-free survival, disease-specific survival, or progression-free survival (18). Survival data at the far right of the curves should be interpreted cautiously because fewer patients remain, yielding less precise estimates.
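To show how censored observations contribute "up to the time the censoring occurred," here is a minimal sketch of the Kaplan-Meier (product-limit) calculation in plain Python. The function name `kaplan_meier` and the eight-patient data set are hypothetical; real analyses would normally rely on a dedicated statistical package and would also report confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each subject (e.g., months)
    events : True if the event (e.g., death) was observed at that time,
             False if the subject was censored (lost to follow-up or
             still event-free when the study ended)
    Returns a list of (time, survival_probability) pairs, one per event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        censored = sum(1 for ti, e in zip(times, events) if ti == t and not e)
        if deaths:
            # Subjects censored at time t still count as at risk for events at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve


# Hypothetical follow-up data for eight patients (invented for illustration).
months = [6, 8, 12, 12, 20, 24, 30, 36]
died = [True, False, True, True, False, True, False, False]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t:>2}: estimated survival {s:.3f}")
```

Note how the risk set shrinks from eight patients to three by month 24, so the last step of the curve rests on very few observations. This is the reason the right-hand tail of a Kaplan-Meier curve deserves cautious interpretation.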

Nominal and dichotomous data (Table 7.5) are best described using ratios, proportions, and rates. A ratio is the value obtained by dividing one quantity by another, both of which are separate and distinct. In a tonsillitis treatment study, for example, the ratio of children with clinical resolution after 10 days to those remaining symptomatic might be 80/20 or 4:1. In contrast, a proportion is a type of ratio in which the numerator is included in the denominator. In the previously mentioned study, the proportion with clinical resolution would be 80/100 or 0.80. Alternatively, this could be multiplied by 100 and expressed as a percentage (80%). Rates are similar to proportions except that a multiplier is used (e.g., 1,000 or 100,000) and they are computed over time. For example, a study might report a rate of 110 physician office visits per 100 children per year for upper respiratory infections.
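The distinctions among a ratio, a proportion, and a rate reduce to simple arithmetic, sketched below with the tonsillitis figures from the text. The visit and child counts used for the rate are assumed values chosen only to reproduce the quoted 110 visits per 100 children per year.

```python
# Tonsillitis example from the text: 80 children resolved, 20 remained symptomatic.
resolved = 80
symptomatic = 20
total = resolved + symptomatic

ratio = resolved / symptomatic   # numerator is NOT part of the denominator
proportion = resolved / total    # numerator IS part of the denominator
percentage = 100 * proportion

# Rate: events over a population and a time period, scaled by a multiplier.
# These counts are hypothetical, chosen to match the rate quoted in the text.
office_visits = 1100
children = 1000
years = 1
multiplier = 100
visits_per_100_children_per_year = office_visits * multiplier / (children * years)

print(f"ratio:      {ratio:.0f}:1")
print(f"proportion: {proportion:.2f}")
print(f"percentage: {percentage:.0f}%")
print(f"rate:       {visits_per_100_children_per_year:.0f} visits per 100 children per year")
```

A rate can exceed the multiplier, as it does here, because one child may contribute several visits in a year; a proportion can never exceed 1.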

TABLE 7.6 DESCRIPTIVE STATISTICS

Descriptive Measure		Definition	When to Use It
Central tendency
	Mean	Arithmetic average	Numerical data that are symmetric
	Median	Middle observation; half the values are smaller and half are larger	Ordinal data; numerical data with an asymmetric distribution
	Mode(s)	Most frequent value(s)	Nominal data; bimodal distribution
Dispersion
	Range	Largest value minus smallest value	Numerical data without outliers
	SD	Spread of data about their mean	Numerical data that are symmetric
	95% reference range	Mean ± 1.96 SD	Numerical data that are symmetric
	Percentile	Percentage of values that are equal to or below that number	Ordinal data; numerical data with an asymmetric distribution
	Interquartile range	Difference between the 25th and 75th percentiles; contains 50% of data	Ordinal data; numerical data with an asymmetric distribution
Outcome
	Survival rate	Proportion of subjects surviving, or with some other outcome, after a time interval (1, 5 y, etc.)	Numerical (censored) data in a prospective study; can be overall, cause specific, or progression free
	Odds ratio	Odds of a disease or outcome in subjects with a risk factor divided by odds in controls	Dichotomous data in a retrospective or prospective controlled study
	Relative risk	Incidence of a disease or outcome in subjects with a risk factor divided by incidence in controls	Dichotomous data in a prospective controlled study
	Rate difference^a	Event rate in treatment group minus event rate in control group	Compares success or failure rates in clinical trial groups
	Correlation coefficient	Degree to which two variables have a linear relationship	Numerical or ordinal data
^aAlso called the absolute risk reduction.
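The outcome measures for dichotomous data in Table 7.6 can all be derived from a 2 × 2 table. The sketch below follows the definitions in the table; the function name `two_by_two_measures`, its argument names, and the counts are hypothetical and do not come from any cited study.

```python
def two_by_two_measures(events_exposed, no_events_exposed,
                        events_control, no_events_control):
    """Relative risk, odds ratio, and rate difference from a 2 x 2 table.

    "Exposed" means subjects with the risk factor (or in the treatment group);
    "control" means subjects without it (or in the control group).
    """
    risk_exposed = events_exposed / (events_exposed + no_events_exposed)
    risk_control = events_control / (events_control + no_events_control)
    odds_exposed = events_exposed / no_events_exposed
    odds_control = events_control / no_events_control
    return {
        "relative_risk": risk_exposed / risk_control,
        "odds_ratio": odds_exposed / odds_control,
        "rate_difference": risk_exposed - risk_control,
    }


# Hypothetical counts: 30 of 100 exposed subjects and 15 of 100 controls have the outcome.
measures = two_by_two_measures(30, 70, 15, 85)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts the relative risk is 2.0 while the odds ratio is about 2.4. The two measures are close only when the outcome is uncommon, which is one reason the table restricts relative risk to prospective controlled studies, where incidence can be measured directly.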

Tags: Bailey's Head and Neck Surgery: Otolaryngology

May 24, 2016 | Posted by drzezo in OTOLARYNGOLOGY | Comments Off
