Distribution

  • Definition: Indicates what values a variable takes and how often it takes those values.
  • Summaries: Can be a table, graph, or function.
  • Visual Summaries:
    • Categorical Variables (Qualitative): Pie chart, bar chart.
    • Numerical Variables (Quantitative): Boxplot, histogram, dot plot, stem-and-leaf plot.

Table

Categorical/Discrete Variable

Summarizes the frequency of different categories.

Breakdown Cause    Frequency
Electrical         9
Mechanical         24
Misuse             13
Total              46

Continuous Variable

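Frequency distribution: A tabulation of how many observations fall into each class interval, often with the relative frequency (count divided by the total) and the cumulative relative frequency (running total of the relative frequencies).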

Class            Frequency    Relative Frequency    Cumulative Relative Frequency
70 ≤ x < 90      2            0.0250                0.0250
90 ≤ x < 110     3            0.0375                0.0625
110 ≤ x < 130    6            0.0750                0.1375
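
The relative and cumulative columns of the table can be reproduced from the counts alone. The sketch below is illustrative only: the total n = 80 is not stated in the notes but is inferred from the first row (2 / 0.0250 = 80), and the variable names are made up for this example.

```python
# Class counts copied from the three rows shown in the table.
classes = ["70 <= x < 90", "90 <= x < 110", "110 <= x < 130"]
counts = [2, 3, 6]

# Assumption: total sample size inferred from the first row (2 / 0.0250 = 80).
n = 80

# Relative frequency = class count / total count.
relative = [c / n for c in counts]

# Cumulative relative frequency = running total of the relative frequencies.
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(running)

for label, c, r, cr in zip(classes, counts, relative, cumulative):
    print(f"{label:<15} {c:>3}  {r:.4f}  {cr:.4f}")
```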

Graphical Summary (Categorical Variable)

Bar Charts

  • Each category has a bar whose length is proportional to the frequency.
  • Qualitative Data: Each bar represents the frequency of a category.
  • Quantitative Data: Each bar can instead represent a summary value for a group, such as its mean.
  • Bars can be vertical or horizontal.

Example:

  • Breakdown causes: Electrical (9), Mechanical (24), Misuse (13).
  • Bar chart shows frequencies.
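
As a rough illustration of "bar length proportional to frequency", the breakdown data can be drawn as a horizontal text bar chart. This is a minimal sketch using only the standard library; the function name `text_bar_chart` is hypothetical, and a plotting library would normally be used for a real chart.

```python
# Frequencies from the breakdown-cause example.
breakdowns = {"Electrical": 9, "Mechanical": 24, "Misuse": 13}

def text_bar_chart(freqs, symbol="#"):
    """Return one line per category; the bar has one symbol per observation."""
    width = max(len(name) for name in freqs)
    return [
        f"{name:<{width}} | {symbol * count} ({count})"
        for name, count in freqs.items()
    ]

chart = text_bar_chart(breakdowns)
print("\n".join(chart))
```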

Pie Charts

  • A circle sliced into sectors proportional to frequencies.
  • Example: Types of cancers (Lung, Breast, Colon, etc.) with relative frequencies.
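
The sector for each category spans 360 degrees times its relative frequency. The notes do not give the cancer figures, so this sketch reuses the breakdown-cause frequencies from the bar chart example as stand-in data.

```python
# Stand-in data: breakdown-cause frequencies (the cancer figures are not given).
freqs = {"Electrical": 9, "Mechanical": 24, "Misuse": 13}
total = sum(freqs.values())

# Sector angle in degrees = 360 * relative frequency.
angles = {name: 360 * count / total for name, count in freqs.items()}

for name, angle in angles.items():
    share = freqs[name] / total
    print(f"{name}: {share:.1%} of the circle, {angle:.1f} degrees")
```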

Descriptive Statistics

  • Steps: Collection, organization, classification, summarization, presentation.
  • Measures:
    • Central Tendency: Mean, median, mode.
    • Variation: Variance, standard deviation, range.
    • Position: Quartiles, interquartile range (IQR).

Measures of Central Tendency

  1. Mean:

    • Population: (\mu = \frac{\sum_{i=1}^N x_i}{N})
    • Sample: (\bar{x} = \frac{\sum_{i=1}^n x_i}{n})
    • Affected by extreme values.
  2. Median:

    • Middle value of ordered data.
    • For odd (n): (\text{Median} = x_{\frac{n+1}{2}})
    • For even (n): (\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2})
    • Less affected by outliers.
  3. Mode:

    • Most frequent value.
    • Can be non-unique or non-existent.

Example: Data set: 1, 3, 5, 7, 7, 8.

  • Mean: 5.1667
  • Median: 6
  • Mode: 7
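
The example values can be checked with Python's standard `statistics` module. This is only a quick verification sketch; `multimode` is used because it returns every most-frequent value, which covers the non-unique case mentioned above.

```python
import statistics

data = [1, 3, 5, 7, 7, 8]

mean = statistics.mean(data)        # sum 31 divided by n = 6
median = statistics.median(data)    # n is even: average of 5 and 7
modes = statistics.multimode(data)  # list of all most-frequent values

print(f"mean = {mean:.4f}, median = {median}, mode(s) = {modes}")

# A data set with a tie has more than one mode.
tied_modes = statistics.multimode([1, 1, 2, 2])
print(tied_modes)
```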

Measures of Variation

  1. Variance:

    • Population: (\sigma^2 = \frac{1}{N} \left( \sum x^2 - \frac{(\sum x)^2}{N} \right))
    • Sample: (s^2 = \frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right))
  2. Standard Deviation: Square root of variance.

  3. Range: (R = \text{highest value} - \text{lowest value}).

  4. Coefficient of Variation (CV):

    • (CV = \frac{\sigma}{\mu} \times 100\%) (Population)
    • (CV = \frac{s}{\bar{x}} \times 100\%) (Sample)
    • Higher CV → Less consistency, more dispersion.

Example: Data set: 9, 11, 12, …, 38.

  • Range: 29
  • Variance: 87.4333
  • Standard Deviation: 9.3506
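
The shortcut variance formulas can be sketched in code. Because the data set in the example above is elided, this sketch instead reuses the six values from the central tendency example (1, 3, 5, 7, 7, 8) as an assumption; the results are cross-checked against the standard `statistics` module.

```python
import math
import statistics

# Assumption: reusing the data from the central tendency example,
# since the variation example's data set is elided.
data = [1, 3, 5, 7, 7, 8]
n = len(data)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)

# Shortcut formulas: (sum of x^2 - (sum of x)^2 / n) divided by N or n - 1.
pop_var = (sum_x2 - sum_x**2 / n) / n
sample_var = (sum_x2 - sum_x**2 / n) / (n - 1)
sample_sd = math.sqrt(sample_var)

data_range = max(data) - min(data)
cv_percent = sample_sd / (sum_x / n) * 100  # sample CV in percent

print(f"range = {data_range}, s^2 = {sample_var:.4f}, s = {sample_sd:.4f}")
print(f"CV = {cv_percent:.2f}%")
```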

Measures of Position

  1. Quartiles:
    • (Q_1): 25th percentile.
    • (Q_2): 50th percentile (Median).
    • (Q_3): 75th percentile.
    • IQR: (Q_3 - Q_1).
    • Quartile Deviation: (\frac{Q_3 - Q_1}{2}).

Example: Data set: 5, 6, 12, 13, 15, 18, 22, 28.

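  • (Q_1 = 7.5), (Q_2 = 14), (Q_3 = 21), taking the p-th quartile at position (p(n+1)/4) in the ordered data and interpolating linearly between neighbours. Other quartile conventions can give slightly different values.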
  • IQR: 13.5
  • Quartile Deviation: 6.75
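
The quartiles in this example match the (n+1)-based position rule with linear interpolation, which is what `statistics.quantiles` calls the "exclusive" method. A short check under that assumption:

```python
import statistics

data = [5, 6, 12, 13, 15, 18, 22, 28]

# method="exclusive" places the k-th quartile at position k * (n + 1) / 4
# in the sorted data and interpolates linearly between neighbours.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
quartile_deviation = iqr / 2

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, quartile deviation = {quartile_deviation}")
```

Other conventions (for example `method="inclusive"`) can give different quartiles for the same data.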

Graphical Summary (Numerical Variables)

Stem-and-Leaf Plot

  • Splits observations into stem (leading digits) and leaf (remaining digits).
  • Example: Percent of state population born outside the U.S.
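
A stem-and-leaf plot can be sketched in a few lines. The values below are hypothetical whole-number percentages, not the actual state data, and the function name `stem_and_leaf` is made up for this example; the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem); units digit is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical values, for illustration only.
sample = [3, 7, 12, 15, 15, 21, 28, 40]
plot = stem_and_leaf(sample)
print("\n".join(plot))
```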

Histogram

  • Divides data into class intervals and counts observations in each.
  • Example: Percent of foreign-born residents in states (class intervals of 5).
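
The counting step behind a histogram can be sketched as follows. The sample values are hypothetical (the state data is not reproduced in these notes), the class width of 5 follows the example, and each class is taken as the half-open interval [lower, lower + 5).

```python
from collections import Counter

def class_counts(values, width=5):
    """Count observations in half-open classes [k*width, (k+1)*width)."""
    counts = Counter(int(v // width) * width for v in values)
    # Walk every class from the lowest to the highest so empty classes appear.
    lows = range(min(counts), max(counts) + width, width)
    return {(lo, lo + width): counts[lo] for lo in lows}

# Hypothetical percentages, for illustration only.
sample = [1.2, 3.8, 4.9, 5.0, 7.4, 12.1, 13.6, 21.0]
hist = class_counts(sample)

for (lo, hi), count in hist.items():
    print(f"{lo:>2} <= x < {hi:<2} | {'#' * count}")
```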

Boxplot

  • Displays five-number summary (Min, (Q_1), Median, (Q_3), Max).
  • Outliers: Values below (Q_1 - 1.5 \times \text{IQR}) or above (Q_3 + 1.5 \times \text{IQR}).

Example: Lifetimes of high-voltage components (boxplot shows distribution and outliers).
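
The quantities a boxplot draws can be computed directly. This sketch uses hypothetical lifetimes (the component data is not given in these notes) and assumes the (n+1)-based "exclusive" quartile convention; the function name `boxplot_summary` is made up for this example.

```python
import statistics

def boxplot_summary(values):
    """Five-number summary plus the 1.5 * IQR fences and flagged outliers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "fences": (lower_fence, upper_fence), "outliers": outliers,
    }

# Hypothetical lifetimes, for illustration only.
lifetimes = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]
summary = boxplot_summary(lifetimes)
print(summary)
```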


Outliers

  • Definition: Data points not consistent with the bulk of the data.
  • Reasons:
    • Measurement errors.
    • Belonging to a different group.
    • Natural variability.
  • Influence: Can strongly affect the mean, variance, standard deviation, and range; the median and IQR are much less affected.

Example: Marks of students (70, 72, 74, 76, 78 vs. 35, 72, 74, 76, 78).
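
The effect of the single low mark can be checked numerically. A short sketch with the standard `statistics` module, comparing the two sets of marks:

```python
import statistics

original = [70, 72, 74, 76, 78]
with_outlier = [35, 72, 74, 76, 78]  # 70 replaced by the outlying mark 35

mean_original = statistics.mean(original)
mean_outlier = statistics.mean(with_outlier)
median_original = statistics.median(original)
median_outlier = statistics.median(with_outlier)

# The mean is pulled toward the outlier; the median does not move.
print(f"mean:   {mean_original} -> {mean_outlier}")
print(f"median: {median_original} -> {median_outlier}")
```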


Summary

Two Approaches for Numerical Variables

Aspect           Approach 1             Approach 2
Location         Median                 Mean
Spread           Range or IQR           Standard Deviation
Summary          Five-number Summary
Visualization    Boxplot                Histogram