Understanding Data and Interpreting the Literature

Richard M. Rosenfeld

“It is a capital mistake to theorize before one has data,” observed Sherlock Holmes. Physicians are rarely guilty of this offense, because they are inundated with data from clinical observations, journal articles, professional meetings, and governmental agencies. Unfortunately, the theories that result often have little relevance to the data that produced them. This is not unexpected, because physicians rarely devote the same efforts to mastering data that they devote to mastering clinical skills. As a partial remedy, this chapter offers a principle-centered approach to data analysis with emphasis on interpreting the medical literature (1,2).

Nearly all data relevant to clinical care come from articles in peer-reviewed medical journals, whose content is generally assessed using principles that mirror those in this chapter (3). Unfortunately, the process of editorial peer review is largely untested and its effects are uncertain. Manuscript assessment, like even the most sophisticated diagnostic test, has a certain sensitivity and specificity; worthy articles may go unappreciated (and unpublished), and worthless ones may pass undetected to the printing press. Peer reviewers may be biased, unqualified, or hold widely discrepant opinions about a study. The bottom line? Caveat lector: beware of what you read, even in excellent medical journals.

Of more concern to clinicians than the inadequacies of peer review, however, is that the medical literature generally serves science rather than medical practice. Peer-reviewed publications facilitate communication from scientist to scientist, not necessarily from scientist to clinician. Most published studies are nondefinitive tests of hypotheses and innovations, only a very small percentage of which may warrant routine clinical application. Although the science may be sound, the idea has not progressed beyond the laboratory or preliminary field studies. Definitive studies constituting true scientist-to-clinician communication are rare in medical journals and must be identified by critical appraisal.

Clinicians can use the medical literature to support clinical decisions in two complementary ways: regular surveillance (or browsing) and problem-oriented searches. Although the latter mode is more effective for learning, both are necessary for continuing clinical competence. Both methods require an appreciation of the purposes of the medical literature and an understanding of the strengths and weaknesses of various study designs for providing valid and clinically applicable information.


Articles worthy of in-depth analysis have enticing titles and abstracts, which espouse innovative, controversial, or clinically relevant ideas. Reading the abstract alone, however, is a poor substitute for perusing the entire article. Whereas abstracts in otolaryngology journals generally do convey information about study design, sample size, and the source of the data, they infrequently describe adverse events, study limitations, and dropouts or losses (4). Abstracts may also report selective, incomplete, or inaccurate data leading to biased conclusions. Abstracts are a starting point for analysis, not the finish line.

Worthy articles appear as original research in the main section of a peer-reviewed journal. Methods and results sections, which represent the heart and soul of the article, should be appropriately detailed and lengthy. Statistical reporting includes confidence limits and measures of clinical benefit (5). Enough detail should be provided for you to reproduce the study on your own, if desired, with a reasonable chance of obtaining the same results. A quick review of the paper should disclose many of the signs of grandeur listed in Table 7.1.



Table 7.1 Signs of grandeur and signs of decadence

Abstract
  Grandeur: Structured summary of goals, methods, results, and significance
  Decadence: Unstructured qualitative overview of the study; contains more wish than reality

Introduction
  Grandeur: Clear, concise, and logical; ends with study rationale or purpose
  Decadence: Rambling, verbose literature review; no critical argument or hypothesis

Methods
  Grandeur: Specific enough for the reader to reproduce the study and understand how quantitative results were generated
  Decadence: Vague or incomplete description of subjects, sampling, outcome criteria; no mention of statistical analysis

Results
  Grandeur: Logical blend of narrative and numbers, including 95% CIs, with supporting tables and figures
  Decadence: Difficult to read, with overuse or underuse of statistical tests; emphasis on P values, not clinical importance

Discussion
  Grandeur: Puts main results in context; reviews supporting and conflicting literature; discusses strengths and weaknesses
  Decadence: Full of fantasy and speculation; rambling and biased literature review; does not acknowledge weaknesses

References
  Grandeur: Demonstrates clearly that the work of others has been systematically considered; emphasizes original research from peer-reviewed journals
  Decadence: Key articles are conspicuously absent; excessively brief; emphasizes review articles, book chapters, and lower-quality journals

Articles unworthy of in-depth analysis may have enticing titles and abstracts, but lack the ability to support the lofty claims and conclusions therein. The article may appear in a non-peer-reviewed (throwaway) journal, or in an industry-funded journal supplement (which generally implies lower quality). Signs of decadence (Table 7.1) are readily apparent when perusing the article’s main sections. The methods and results sections are vague and sparse, overshadowed by a verbose discussion section with unsupported opinions and creative misinterpretations. Don’t waste any time analyzing an unworthy article unless the premise is so novel and important that it overshadows the obvious weaknesses.


Just as an expert clinician considers a finding abnormal until examination proves otherwise, a connoisseur of medical evidence considers a data set or journal article to be laden with flaws, distortions, and omissions until proven to the contrary. The five basic questions in Table 7.2 are the key to this analytic process. Each question is discussed below using established principles of data analysis and literature interpretation (1).

Question 1: How Was the Study Performed?

Study Design

Medical data arise from a research study, defined as an “organized quest for new knowledge, based on curiosity or perceived needs” (6). Validity of the data is determined in large part by the study design (specific procedures and methods) used by the investigators to produce their data. The study design must fit the research question. Despite the befuddling array of study designs espoused in the epidemiologic literature, the savvy data analyst need only address a few basic considerations (Table 7.3). These relate to (a) how the data were gathered, (b) what degree of control the investigator had over study conditions, (c) whether a control or comparison group was used, and (d) what direction of inquiry was followed.

Data collected specifically for research (Table 7.3) are likely to be unbiased, reflecting the true value of the attribute being measured. In contrast, data collected during routine clinical care will vary in quality. Experimental studies, such as randomized trials, are likely to produce high-quality data because they are performed under carefully controlled conditions. In observational studies, however, the investigator is simply a bystander who records the natural course of health events during clinical care. Regardless of data quality, bias may be introduced at any stage of the research process, which includes reviewing the literature, defining baseline states, performing interventions, measuring outcomes, analyzing data, and publishing results (7).

The presence or absence of a control group has a profound influence on data interpretation. An uncontrolled study, no matter how elegant, is purely descriptive (2). Case series, which appear frequently in the otolaryngology literature, cannot assess efficacy or effectiveness, but they can convey feasibility, experience, technical details of an intervention, and predictive factors associated with good outcomes or adverse events (8). The best case series use a consecutive sample of patients, adjust for interfering variables, plan in advance for systematic data collection, and are humble and cautious when interpreting results.



Table 7.2 Five basic questions for data analysis and literature interpretation

1. What type of study produced the data?
  Why it is important: Study design has a profound impact on interpretation; scrutinize the data collection, degree of investigator control, use of control groups, and direction of inquiry
  Underlying principles: Bias, research design, placebo effect, control groups, causality

2. What are the results?
  Why it is important: Results should be summarized with appropriate descriptive statistics; positive results must be qualified by the chance of being wrong, and negative results by the chance of having missed a true difference
  Underlying principles: Measurement scale, association, P value, power, effect size, clinical importance

3. Are the results valid within the study?
  Why it is important: Proper statistical analysis and data collection ensure valid results for the subjects studied; measurements must be accurate and reproducible
  Underlying principles: Internal validity, accuracy, statistical tests

4. Are the results valid outside the study?
  Why it is important: Results can be generalized when the sampling method is sound, subjects are representative of the target population, and the sample size is large enough for adequate precision
  Underlying principles: External validity, sampling, CIs, precision

5. Are the results strong and consistent?
  Why it is important: A single study is rarely definitive; results must be viewed relative to their plausibility, consistency with past efforts, and the strength of the study methodology
  Underlying principles: Research integration, level of evidence, systematic review


Table 7.3 Impact of study design on data interpretation

How were the data originally collected?
  Specifically for research: Interpretation is facilitated by quality data collected according to an a priori protocol
  During routine clinical care: Interpretation is limited by the consistency, accuracy, availability, and completeness of the source records
  Database or data registry: Interpretation is limited by the representativeness of the sample and the quality and completeness of the data fields

Is the study experimental or observational?
  Experimental study, with conditions under direct control of the investigator: Low potential for systematic error (bias); bias can be reduced further by randomization and masking (blinding)
  Observational study, without intervention other than to record, classify, and analyze: High potential for bias in sample selection, treatment assignment, and measurement of exposures and outcomes

Is there a comparison or control group?
  Comparative or controlled study with two or more groups: Permits analytic statements concerning efficacy, effectiveness, and association
  No comparison group present: Permits descriptive statements only, because of improvements from natural history and the placebo effect

What is the direction of study inquiry?
  Subjects identified before an outcome or disease, with future events recorded: Prospective design measures incidence (new events) and causality (if a comparison group is included)
  Subjects identified after an outcome or disease, with past histories examined: Retrospective design measures prevalence (existing events) and causality (if a comparison group is included)
  Subjects identified at a single time point, regardless of outcome or disease: Cross-sectional design measures prevalence (existing events) and association (if a comparison group is included)






Table 7.4 Explanations for clinical change other than treatment effects

Bias
  Description: Systematic variation of measurements from their true values; may be intentional or unintentional
  Method of control: Accurate, protocol-driven data collection

Chance
  Description: Random variation without apparent relation to other measurements or variables (e.g., getting lucky)
  Method of control: Control or comparison group

Natural history
  Description: Course of a disease from onset to resolution; may include relapse, remission, and spontaneous recovery
  Method of control: Control or comparison group

Regression to the mean
  Description: Symptom improvement independent of therapy, as sick patients return to a mean level after seeking care
  Method of control: Control or comparison group

Placebo effect
  Description: Beneficial effect caused by the expectation that the regimen will have an effect (e.g., power of suggestion)
  Method of control: Control or comparison group with placebo

Halo effect
  Description: Beneficial effect caused by the manner, attention, and caring of a provider during a medical encounter
  Method of control: Control or comparison group treated similarly

Confounding
  Description: Distortion of an effect by other prognostic factors or variables for which adjustments have not been made
  Method of control: Randomization or multivariate analysis

Allocation (susceptibility) bias
  Description: Beneficial effect caused by allocating subjects with less severe disease or better prognosis to the treatment group
  Method of control: Randomization or comorbidity analysis

Ascertainment (detection) bias
  Description: Favoring the treatment group during outcome analysis (e.g., rounding up for treated subjects, down for controls)
  Method of control: Masked (blinded) outcome assessment

Without a control or comparison group, treatment effects cannot be distinguished from other causes of clinical change (Table 7.4) (9). Some of these causes are found in Figure 7.1, which depicts change in health status after a healing encounter as a complex interaction of three primary factors:

  • What was actually done. Specific effect(s) of therapy, including medications, surgery, physical manipulations, and alternative or integrative approaches.

  • What would have happened anyway. Spontaneous resolution, including natural history, random fluctuations in disease status, and regression to a mean symptom state.

  • What was imagined to be done. Placebo response, defined as a change in health status resulting from the symbolic significance attributed by the patient (or proxy) to the encounter itself (10).

A placebo response is most likely to occur when the patient receives a meaningful and personalized explanation, feels care and concern expressed by the healer, and achieves control and mastery over the illness (or believes that the healer can control the illness). The placebo response differs from the traditional definition of placebo as an inactive medical substance. Unlike the “placebo pills” in randomized trials, a placebo response can be elicited by touch, words, gestures, local ambiance, and social interactions. A valid and reliable 12-item survey, the PR-12, is available to measure aspects of the placebo response in office encounters (11).

Figure 7.1 Model depicting change in health status after a healing encounter. Dashed arrow shows that a placebo response may occur from symbolic significance of the specific therapy given or from interpersonal aspects of the encounter.

Assessing Causality

When data from a comparison or control group are available, statistics may be used to test hypotheses and measure associations. Causality may also be assessed when the study has a time-span component, either retrospective or prospective (Table 7.3). Prospective studies measure incidence (new events) whereas retrospective studies measure prevalence (existing events). Unlike time-span studies, cross-sectional inquiries (surveys, screening programs, evaluations of diagnostic tests) measure association, not causality.
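The incidence-versus-prevalence distinction can be made concrete with a small sketch; the population, case counts, and follow-up figures below are hypothetical, chosen only for illustration:

```python
# Hypothetical numbers for illustration only.
# A cross-sectional survey finds existing cases (prevalence); following the
# disease-free remainder for a year finds new cases (incidence).
population = 1000
existing_cases = 50        # counted at a single time point
new_cases_in_year = 20     # counted among the 950 initially disease-free subjects

prevalence = existing_cases / population                        # 0.05, or 5%
incidence = new_cases_in_year / (population - existing_cases)   # about 0.021 per year

print(prevalence, round(incidence, 3))
```

Note that the denominator for incidence excludes subjects who already had the disease at baseline, since they cannot become new cases.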

Efficacy and causality are best assessed by randomized controlled trials, because nonrandom treatment assignment is prone to innate distortions caused by individual judgments and other selective decisions (allocation bias). A dangerous habit, however, is to label all randomized trials as high quality and all observational studies (e.g., outcomes research) as substandard. Randomization cannot compensate for imprecise selection criteria, poorly defined endpoints, inadequate follow-up, or low compliance with treatment. Moreover, randomized trials with inadequate methodology tend to exaggerate treatment effects compared with trials that are properly designed and executed (12).

The best randomized trials ensure adequate randomization, conceal treatment allocation (blinding), and analyze results by intention-to-treat. The intention-to-treat analysis maintains treatment groups that are similar apart from random variation, which may not occur if only subjects who complied with treatment (on-treatment analysis) are included (13). A blinded (masked) trial is always superior to a nonblinded (open, open-label, or unmasked) trial, in which everyone involved knows who received which interventions (14). In a double-blind trial, the participants, investigators, and assessors all remain unaware of the intervention assignments. A triple-blind trial also maintains a blinded data analysis, although some authors use the term simply to indicate that the investigators and assessors are distinct.
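The rationale for intention-to-treat analysis can be shown with a quick calculation; the compliance and cure counts below are invented for illustration:

```python
# Hypothetical arm of 100 randomized patients: 80 comply with treatment
# (60 cured) and 20 do not (10 cured). All counts are invented.
complier_cured, compliers = 60, 80
noncomplier_cured, noncompliers = 10, 20

# Intention-to-treat: analyze everyone as randomized
itt_rate = (complier_cured + noncomplier_cured) / (compliers + noncompliers)  # 0.70

# On-treatment: drop the (often sicker) noncompliers
on_treatment_rate = complier_cured / compliers                                # 0.75

# The on-treatment estimate looks better, illustrating how excluding
# noncompliers can exaggerate the apparent benefit
print(itt_rate, on_treatment_rate)
```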






Table 7.5 Measurement scales for describing data

Dichotomous (binary)
  Description: Classification into either of two mutually exclusive categories
  Examples: Breast feeding (yes/no), sex (male/female)

Nominal
  Description: Classification into unordered qualitative categories
  Examples: Race, religion, country of origin

Ordinal
  Description: Classification into ordered qualitative categories, but with no natural (numerical) distance between their possible values
  Examples: Hearing loss (none, mild, moderate), patient satisfaction (low, medium, high), age group

Numerical (continuous)
  Description: Measurements with a continuous scale, or a large number of discrete ordered values
  Examples: Temperature, age in years, hearing level in decibels

Numerical (censored)
  Description: Measurements on subjects lost to follow-up or in whom a specified event has not yet occurred at the end of a study
  Examples: Survival rate, recurrence rate, or any time-to-event outcome in a prospective study

Randomized controlled trials comprise only about 4% of articles in leading otolaryngology journals, with about 25% supported by industry funding (15). The presence of industry support, however, is unrelated to conclusions favoring intervention. Nearly 60% of articles use intention-to-treat analysis, but only a minority specify randomization schemes, employ a double-blind protocol, include confidence intervals (CIs), or explicitly discuss adverse events. The earlier advice of “caveat lector,” therefore, also applies to randomized trials, not just observational research.

Question 2: What Are the Results?

Describing Central Tendency and Dispersion

Describing results begins by defining the measurement scale that best suits the observations. Categorical (qualitative) observations fall into one or more categories, and include dichotomous, nominal, and ordinal scales (Table 7.5). Numerical (quantitative) observations are measured on a continuous scale, and are further classified with a graphic display to assess distribution (histogram, stem-leaf plot, or frequency distribution curve) (16). Numerical data with a symmetric (normal or Gaussian) distribution are evenly placed around a central crest or trough (bell-shaped curve). Numerical data with an asymmetric distribution are skewed (shifted) to one side of the center or contain unusually high or low outlier values. Skewed data can sometimes be normalized with a transformation (e.g., logarithmic).
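Comparing the mean with the median offers a quick check for skew, and a sketch shows how a logarithmic transformation can restore symmetry. The data below are simulated from a log-normal distribution with invented parameters, standing in for a right-skewed clinical measurement:

```python
import math
import random
import statistics

random.seed(42)
# Simulated right-skewed data: hypothetical values drawn from a
# log-normal distribution (parameters invented for illustration)
values = [math.exp(random.gauss(2.0, 0.8)) for _ in range(1000)]

# In right-skewed data the mean is pulled above the median
print(statistics.mean(values) > statistics.median(values))

# After a logarithmic transformation the distribution is symmetric,
# so the mean and median nearly coincide
logs = [math.log(v) for v in values]
print(round(statistics.mean(logs), 2), round(statistics.median(logs), 2))
```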

When summarizing numerical data, the descriptive method varies according to the underlying distribution. Numerical data with a symmetric distribution are best summarized with the mean (Table 7.6) and standard deviation (SD), because 68.3% of the observations fall within the mean ±1 SD, 95.4% within the mean ±2 SD, and 99.7% within the mean ±3 SD. In contrast, asymmetric numerical data are best summarized with the median, because even a single outlier can strongly influence the mean.
For example, if five patients are followed after sinus surgery for 10, 12, 15, 16, and 48 months, the mean duration of follow-up is 20.2 months, but the median is only 15 months. In this case a single outlier, 48 months, distorts the mean.
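The follow-up example can be verified with Python’s statistics module:

```python
import statistics

# Follow-up durations (months) for the five sinus surgery patients in the text
followup = [10, 12, 15, 16, 48]

mean = statistics.mean(followup)       # 20.2, pulled upward by the 48-month outlier
median = statistics.median(followup)   # 15, resistant to the outlier
sd = statistics.stdev(followup)        # about 15.7, large relative to the mean

print(mean, median, round(sd, 1))
```

A standard deviation nearly as large as the mean is itself a hint that the data are asymmetric and the median is the better summary.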

A special form of numerical data is called censored (Table 7.5). Data are censored when (a) the study direction is prospective, (b) the outcome is time related, and (c) some subjects die, are lost, or have not yet had the outcome when the study ends. Interpreting censored data is called survival analysis, because of its use in cancer studies where survival is the outcome of interest (17). For example, a study might report median survival time (by groups, if applicable) and the percentage surviving at fixed time periods (e.g., 1, 5, and 10 years). Survival analysis permits full utilization of censored observations (e.g., patients with less than 1 year of follow-up) by including them in the analysis up to the time the censoring occurred. Results of cancer studies are often reported with Kaplan-Meier curves, which may describe overall survival, disease-free survival, disease-specific survival, or progression-free survival (18). Survival data at the far right of the curves should be interpreted cautiously, because fewer patients remain and the estimates are less precise.
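A minimal Kaplan-Meier estimator, run on an invented cohort of six subjects, shows how censored observations contribute to the at-risk count up to the time they drop out; this is a sketch of the standard product-limit method, not a substitute for a statistics package:

```python
# Minimal Kaplan-Meier (product-limit) estimator on hypothetical data.
# Each subject contributes (time, event): event=True for death/failure,
# event=False for a censored observation (lost to follow-up, study ended).

def kaplan_meier(subjects):
    """Return [(time, survival probability)] at each observed event time."""
    survival = 1.0
    curve = []
    for t in sorted({time for time, event in subjects if event}):
        at_risk = sum(1 for time, _ in subjects if time >= t)        # still being followed
        deaths = sum(1 for time, event in subjects if time == t and event)
        survival *= 1 - deaths / at_risk                             # product-limit step
        curve.append((t, survival))
    return curve

# Hypothetical cohort: follow-up in months; False marks censored subjects
cohort = [(6, True), (10, False), (14, True), (14, True), (20, False), (24, True)]
for t, s in kaplan_meier(cohort):
    print(f"{t:>2} months: S(t) = {s:.3f}")
```

The subject censored at 10 months counts toward the denominator at 6 months but not at 14, which is exactly the “full utilization” of censored observations described above.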

Nominal and dichotomous data (Table 7.5) are best described using ratios, proportions, and rates. A ratio is the value obtained by dividing one quantity by another, both of which are separate and distinct. In a tonsillitis treatment study, for example, the ratio of children with clinical resolution after 10 days to those remaining symptomatic might be 80/20 or 4:1. In contrast, a proportion is a type of ratio in which the numerator is included in the denominator. In the previously mentioned study, the proportion with clinical resolution would be 80/100 or 0.80. Alternatively, this could be multiplied by 100 and expressed as a percentage (80%). Rates are similar to proportions except that a multiplier is used (e.g., 1,000 or 100,000) and they are computed over time. For example, a study might report a rate of 110 physician office visits per 100 children per year for upper respiratory infections.
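The tonsillitis and office-visit figures from the text translate directly into code; the practice panel of 2,500 children is an invented assumption added to show how a rate is applied:

```python
# The tonsillitis example from the text, expressed numerically
resolved, symptomatic = 80, 20
total = resolved + symptomatic

ratio = resolved / symptomatic     # 4.0, i.e., 4:1 (numerator excluded from denominator)
proportion = resolved / total      # 0.80 (numerator included in denominator)
percentage = 100 * proportion      # 80.0%

# Rate: 110 visits per 100 children per year implies, for a hypothetical
# practice panel of 2,500 children, about 2,750 visits annually
visits_per_100_per_year = 110
panel = 2500
expected_visits = visits_per_100_per_year * panel / 100   # 2750.0

print(ratio, proportion, percentage, expected_visits)
```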


Table 7.6 Descriptive statistics

Central tendency
  Mean: Arithmetic average. When to use: numerical data that are symmetric
  Median: Middle observation; half the values are smaller and half are larger. When to use: ordinal data; numerical data with an asymmetric distribution
  Mode: Most frequent value(s). When to use: nominal data; bimodal distributions

Dispersion
  Range: Largest value minus smallest value. When to use: numerical data without outliers
  Standard deviation (SD): Spread of data about their mean. When to use: numerical data that are symmetric
  95% reference range: Mean ± 1.96 SD. When to use: numerical data that are symmetric
  Percentile: Percentage of values that are equal to or below that number. When to use: ordinal data; numerical data with an asymmetric distribution
  Interquartile range: Difference between the 25th and 75th percentiles; contains 50% of the data. When to use: ordinal data; numerical data with an asymmetric distribution

Outcome
  Survival rate: Proportion of subjects surviving, or with some other outcome, after a time interval (1 y, 5 y, etc.). When to use: numerical (censored) data in a prospective study; can be overall, cause specific, or progression free
  Odds ratio: Odds of a disease or outcome in subjects with a risk factor divided by the odds in controls. When to use: dichotomous data in a retrospective or prospective controlled study
  Relative risk: Incidence of a disease or outcome in subjects with a risk factor divided by the incidence in controls. When to use: dichotomous data in a prospective controlled study
  Rate difference (a): Event rate in the treatment group minus the event rate in the control group. When to use: comparing success or failure rates in clinical trial groups
  Correlation coefficient: Degree to which two variables have a linear relationship. When to use: numerical or ordinal data

a Also called the absolute risk reduction.
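The outcome measures in Table 7.6 can be illustrated with a hypothetical two-group trial; all counts below are invented, and the number needed to treat (the reciprocal of the rate difference) is a standard companion measure added for context:

```python
# Hypothetical two-group trial: 100 treated patients (80 cured) versus
# 100 controls (60 cured). All counts are invented for illustration.
treated_success, treated_total = 80, 100
control_success, control_total = 60, 100

risk_treated = treated_success / treated_total    # 0.80
risk_control = control_success / control_total    # 0.60

relative_risk = risk_treated / risk_control       # about 1.33

odds_treated = treated_success / (treated_total - treated_success)   # 80/20 = 4.0
odds_control = control_success / (control_total - control_success)   # 60/40 = 1.5
odds_ratio = odds_treated / odds_control          # about 2.67

rate_difference = risk_treated - risk_control     # 0.20, the absolute risk reduction
number_needed_to_treat = 1 / rate_difference      # about 5 treated per extra cure

print(round(relative_risk, 2), round(odds_ratio, 2),
      round(rate_difference, 2), round(number_needed_to_treat))
```

Note that the odds ratio (2.67) overstates the relative risk (1.33) when the outcome is common, one reason the two measures suit different study designs.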
