Describing Central Tendency and Dispersion
Describing results begins by defining the measurement scale that best suits the observations. Categorical (qualitative) observations fall into one or more categories, and include dichotomous, nominal, and ordinal scales (Table 7.5). Numerical (quantitative) observations are measured on a continuous scale, and are further classified with a graphic display to assess distribution (histogram, stem-leaf plot, or frequency distribution curve) (16). Numerical data with a symmetric (normal or Gaussian) distribution are evenly placed around a central crest or trough (bell-shaped curve). Numerical data with an asymmetric distribution are skewed (shifted) to one side of the center or contain unusually high or low outlier values. Skewed data can sometimes be normalized with a transformation (e.g., logarithmic).
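As a rough illustration of how a logarithmic transformation can reduce skew, the sketch below compares a moment-based skewness value before and after taking base-10 logarithms. The data values and the helper name `sample_skewness` are hypothetical and chosen only for illustration; a value near zero suggests a roughly symmetric distribution, and a graphic display remains the better check.

```python
import math
import statistics

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed population SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

# Hypothetical right-skewed measurements (illustrative only, not from any study).
raw = [1, 2, 2, 3, 4, 5, 8, 12, 40, 150]

# The logarithm is defined only for positive values.
logged = [math.log10(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

In this made-up sample the transformation pulls the long right tail inward but does not make the data perfectly symmetric, which is consistent with the statement that skewed data can only sometimes be normalized this way.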
When summarizing numerical data, the descriptive method varies according to the underlying distribution. Numerical data with a symmetric distribution are best summarized with the mean (Table 7.6) and standard deviation (SD), because 68.3% of the observations fall within the mean ±1 SD, 95.4% within the mean ±2 SD, and 99.7% within the mean ±3 SD. In contrast, asymmetric numerical data are best summarized with the median, because even a single outlier can strongly influence the mean.
For example, if five patients are followed after sinus surgery for 10, 12, 15, 16, and 48 months, the mean duration of follow-up is about 20 months (20.2), but the median is only 15 months. In this case a single outlier, 48 months, distorts the mean.
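The figures above can be checked with a short sketch using only the Python standard library. The function name `normal_coverage` is an assumption introduced here; it relies on the standard relation between the normal distribution and the error function. The follow-up times are the five values from the example.

```python
import math
import statistics

def normal_coverage(k):
    """Fraction of a normal distribution lying within the mean +/- k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within mean +/- {k} SD: {normal_coverage(k):.1%}")

# Follow-up times (months) from the sinus surgery example.
follow_up_months = [10, 12, 15, 16, 48]

mean_months = statistics.mean(follow_up_months)      # 20.2
median_months = statistics.median(follow_up_months)  # 15

# Dropping the 48-month outlier shows how strongly it pulls the mean.
mean_without_outlier = statistics.mean(follow_up_months[:-1])  # 13.25

print(mean_months, median_months, mean_without_outlier)
```

Removing the single outlier shifts the mean from 20.2 to 13.25 months, whereas the median of the remaining four values (13.5) stays close to the original 15.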
A special form of numerical data is called censored (Table 7.5). Data are censored when (a) the study direction is prospective, (b) the outcome is time related, and (c) some subjects die, are lost, or have not yet had the outcome when the study ends. Interpreting censored data is called survival analysis, because of its use in cancer studies where survival is the outcome of interest (17). For example, a study might report median survival time (by groups, if applicable) and the percent surviving at fixed time periods (e.g., 1, 5, 10 years). Survival analysis permits full utilization of censored observations (e.g., patients with less than 1 year of follow-up), by including them in the analysis up to the time the censoring occurred. Results of cancer studies are often reported with Kaplan-Meier curves, which may describe overall survival, disease-free survival, disease-specific survival, or progression-free survival (18). Survival data at the far right of the curves should be interpreted cautiously because fewer patients remain, yielding less precise estimates.
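To make the handling of censored observations concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) estimate in plain Python. The cohort is hypothetical, and the function names `kaplan_meier` and `median_survival` are assumptions introduced for illustration; a real analysis would normally use a dedicated statistics package and report confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(time, number_at_risk, survival)] for each time with an event.

    times  : follow-up time for each subject
    events : True if the outcome occurred at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        occurred = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if occurred:
            survival *= 1 - occurred / at_risk
            curve.append((t, at_risk, survival))
        # Censored subjects count as at risk up to their censoring time,
        # then drop out of the denominator.
        at_risk -= leaving
    return curve

def median_survival(curve):
    """Earliest time at which estimated survival is 50% or lower, else None."""
    for t, _, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical cohort: follow-up in months, False marks a censored subject.
times = [3, 5, 8, 12, 12, 20, 24, 30]
events = [True, False, True, True, False, True, False, False]

curve = kaplan_meier(times, events)
for t, n, s in curve:
    print(f"month {t:>2}: {n} at risk, estimated survival {s:.3f}")
print("median survival:", median_survival(curve))
```

In this made-up cohort the last step of the curve rests on only three subjects at risk, which illustrates why estimates at the far right are less precise.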
Nominal and dichotomous data (Table 7.5) are best described using ratios, proportions, and rates. A ratio is the value obtained by dividing one quantity by another, both of which are separate and distinct. In a tonsillitis treatment study, for example, the ratio of children with clinical resolution after 10 days to those remaining symptomatic might be 80/20 or 4:1. In contrast, a proportion is a type of ratio in which the numerator is included in the denominator. In the previously mentioned study, the proportion with clinical resolution would be 80/100 or 0.80. Alternatively, this could be multiplied by 100 and expressed as a percentage (80%). Rates are similar to proportions except that a multiplier is used (e.g., 1,000 or 100,000) and they are computed over time. For example, a study might report a rate of 110 physician office visits per 100 children per year for upper respiratory infections.
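The three measures can be sketched in a few lines of Python. The tonsillitis counts come from the example above; the visit count, cohort size, and follow-up period used for the rate are hypothetical values chosen so that the result matches the quoted 110 visits per 100 children per year, and the function name `rate` is an assumption.

```python
from fractions import Fraction

# Tonsillitis example: 80 children resolved, 20 remained symptomatic.
resolved, symptomatic = 80, 20

# Ratio: numerator and denominator are separate groups.
ratio = Fraction(resolved, symptomatic)
print(f"ratio {ratio.numerator}:{ratio.denominator}")  # 4:1

# Proportion: the numerator is part of the denominator.
proportion = resolved / (resolved + symptomatic)
percentage = proportion * 100
print(f"proportion {proportion:.2f} ({percentage:.0f}%)")

def rate(event_count, population, years, per=100):
    """Events per `per` persons per year."""
    return event_count / (population * years) * per

# Hypothetical counts: 550 office visits among 250 children over 2 years.
visits_per_100 = rate(550, 250, 2, per=100)
print(f"{visits_per_100:.0f} visits per 100 children per year")
```

Changing `per` to 1,000 or 100,000 rescales the same rate for rarer events without altering the underlying counts.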