Distribution
- Definition: Indicates what values a variable takes and how often it takes those values.
- Summaries: Can be a table, graph, or function.
- Visual Summaries:
- Categorical Variables (Qualitative): Pie chart, bar chart.
- Numerical Variables (Quantitative): Boxplot, histogram, dot plot, stem-and-leaf plot.
Table
Categorical/Discrete Variable
Summarizes the frequency of different categories.
| Breakdown Cause | Frequency |
|---|---|
| Electrical | 9 |
| Mechanical | 24 |
| Misuse | 13 |
| Total | 46 |
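A frequency table like the one above can be produced by counting category labels. The following is a minimal sketch, assuming the raw records are a plain list of labels; the `records` list is hypothetical and is rebuilt here from the counts in the table, not taken from original data.

```python
from collections import Counter

# Hypothetical raw records, rebuilt from the counts in the table above.
records = ["Electrical"] * 9 + ["Mechanical"] * 24 + ["Misuse"] * 13

frequency = Counter(records)       # maps each category to its count
total = sum(frequency.values())    # number of observations

# Print category, frequency, and relative frequency.
for cause, count in frequency.items():
    print(f"{cause:<12}{count:>4}{count / total:>8.3f}")
print(f"{'Total':<12}{total:>4}")
```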
Continuous Variable
Frequency distribution: A table that groups the values into class intervals and lists how many observations fall in each.
| Class | Frequency | Relative Frequency | Cumulative Relative Frequency |
|---|---|---|---|
| 70 ≤ x < 90 | 2 | 0.0250 | 0.0250 |
| 90 ≤ x < 110 | 3 | 0.0375 | 0.0625 |
| 110 ≤ x < 130 | 6 | 0.0750 | 0.1375 |
| … | … | … | … |
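The relative and cumulative relative frequency columns follow directly from the class counts. The sketch below uses only the three classes shown; the total of 80 observations is an assumption inferred from the first row (2 / 0.0250), since the notes do not state it and the remaining rows are elided.

```python
from itertools import accumulate

# First three classes from the table; the remaining rows are elided in the notes.
classes = ["70 <= x < 90", "90 <= x < 110", "110 <= x < 130"]
counts = [2, 3, 6]
n = 80  # assumption: inferred from 2 / 0.0250, not stated in the notes

relative = [c / n for c in counts]          # frequency divided by n
cumulative = list(accumulate(relative))     # running sum of relative frequencies

for label, c, r, cr in zip(classes, counts, relative, cumulative):
    print(f"{label:<16}{c:>3}{r:>9.4f}{cr:>9.4f}")
```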
Graphical Summary (Categorical Variable)
Bar Charts
- Each category has a bar whose length is proportional to the value it represents.
- Quantitative Data: Bar represents a summary value such as the group mean.
- Qualitative Data: Bar represents the category frequency.
- Bars can be vertical or horizontal.
Example:
- Breakdown causes: Electrical (9), Mechanical (24), Misuse (13).
- Bar chart shows frequencies.
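To make the "length proportional to frequency" idea concrete, here is a small text-only sketch for the breakdown causes. The function name `text_bar_chart` and its `scale` parameter are illustrative choices, not part of the notes; a plotting library would normally be used instead.

```python
frequencies = {"Electrical": 9, "Mechanical": 24, "Misuse": 13}

def text_bar_chart(freq, scale=1):
    """Return one line per category; the bar has round(frequency * scale) marks."""
    width = max(len(name) for name in freq)
    return [
        f"{name:<{width}} | {'#' * round(count * scale)} {count}"
        for name, count in freq.items()
    ]

for line in text_bar_chart(frequencies):
    print(line)
```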
Pie Charts
- A circle sliced into sectors proportional to frequencies.
- Example: Types of cancers (Lung, Breast, Colon, etc.) with relative frequencies.
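Each sector's angle is the category's relative frequency times 360 degrees. The notes give no counts for the cancer example, so this sketch reuses the breakdown-cause counts from the table section; `sector_angles` is a hypothetical helper name.

```python
def sector_angles(freq):
    """Map each category to its pie-sector angle in degrees."""
    total = sum(freq.values())
    return {name: 360 * count / total for name, count in freq.items()}

# Breakdown-cause counts from the frequency table earlier in the notes.
angles = sector_angles({"Electrical": 9, "Mechanical": 24, "Misuse": 13})
for name, angle in angles.items():
    print(f"{name:<12}{angle:7.2f} degrees")
```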
Descriptive Statistics
- Steps: Collection, organization, classification, summarization, presentation.
- Measures:
- Central Tendency: Mean, median, mode.
- Variation: Variance, standard deviation, range.
- Position: Quartiles, interquartile range (IQR).
Measures of Central Tendency
- Mean:
- Population: (\mu = \frac{\sum_{i=1}^N x_i}{N})
- Sample: (\bar{x} = \frac{\sum_{i=1}^n x_i}{n})
- Affected by extreme values.
- Median:
- Middle value of ordered data.
- For odd (n): (\text{Median} = x_{\frac{n+1}{2}})
- For even (n): (\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2})
- Less affected by outliers.
- Mode:
- Most frequent value.
- Can be non-unique or non-existent.
Example: Data set: 1, 3, 5, 7, 7, 8.
- Mean: 5.1667
- Median: 6
- Mode: 7
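The example values can be checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` is used because it returns every most-frequent value, which covers the non-unique case mentioned above.

```python
import statistics

data = [1, 3, 5, 7, 7, 8]

mean = statistics.mean(data)        # sum of values divided by n
median = statistics.median(data)    # average of the two middle values for even n
modes = statistics.multimode(data)  # list of all most-frequent values

print(f"mean={mean:.4f}, median={median}, modes={modes}")
```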
Measures of Variation
- Variance:
- Population: (\sigma^2 = \frac{1}{N} \left( \sum x^2 - \frac{(\sum x)^2}{N} \right))
- Sample: (s^2 = \frac{1}{n-1} \left( \sum x^2 - \frac{(\sum x)^2}{n} \right))
- Standard Deviation: Square root of variance.
- Range: (R = \text{highest value} - \text{lowest value}).
- Coefficient of Variation (CV):
- (CV = \frac{\sigma}{\mu} \times 100\%) (Population)
- (CV = \frac{s}{\bar{x}} \times 100\%) (Sample)
- Higher CV → Less consistency, more dispersion.
Example: Data set: 9, 11, 12, …, 38.
- Range: 29
- Variance: 87.4333
- Standard Deviation: 9.3506
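The computational formulas above translate directly into code. Because the data set in this example is abbreviated in the notes, the sketch below uses the data 1, 3, 5, 7, 7, 8 from the central tendency example instead; the function names are illustrative.

```python
import math
import statistics

def population_variance(xs):
    """(sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / n

def sample_variance(xs):
    """(sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [1, 3, 5, 7, 7, 8]  # from the central tendency example, not this one

s2 = sample_variance(data)
s = math.sqrt(s2)                      # sample standard deviation
cv = s / statistics.mean(data) * 100   # sample coefficient of variation, in percent
value_range = max(data) - min(data)

print(f"range={value_range}, s^2={s2:.4f}, s={s:.4f}, CV={cv:.2f}%")
```

The shortcut formulas agree with `statistics.variance` and `statistics.pvariance` here, but they can lose precision on large values with small spread, which is why libraries usually compute deviations from the mean instead.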
Measures of Position
- Quartiles:
- (Q_1): 25th percentile.
- (Q_2): 50th percentile (Median).
- (Q_3): 75th percentile.
- IQR: (Q_3 - Q_1).
- Quartile Deviation: (\frac{Q_3 - Q_1}{2}).
Example: Data set: 5, 6, 12, 13, 15, 18, 22, 28.
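- (Q_1 = 7.5), (Q_2 = 14), (Q_3 = 21), taking each quartile at position (p(n+1)) in the ordered data and interpolating linearly between neighbors.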
- IQR: 13.5
- Quartile Deviation: 6.75
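The quartiles in this example match the convention that places the p-th quantile at position p(n+1) in the ordered data. In Python's `statistics.quantiles` this corresponds to `method="exclusive"`; other software may default to a different convention and give slightly different quartiles.

```python
import statistics

data = [5, 6, 12, 13, 15, 18, 22, 28]

# n=4 splits the data into quarters; "exclusive" uses positions p * (n + 1).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
quartile_deviation = iqr / 2

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, QD={quartile_deviation}")
```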
Graphical Summary (Numerical Variables)
Stem-and-Leaf Plot
- Splits observations into stem (leading digits) and leaf (remaining digits).
- Example: Percent of state population born outside the U.S.
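A stem-and-leaf split can be sketched for whole numbers by taking the tens part as the stem and the units digit as the leaf. The values below are hypothetical, not the state data from the example, and `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical whole-number percentages.
sample = [3, 7, 12, 15, 15, 21, 28]

for stem, leaves in stem_and_leaf(sample).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

This sketch omits stems that have no leaves; a hand-drawn plot would normally keep those rows so that gaps in the data stay visible.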
Histogram
- Divides data into class intervals and counts observations in each.
- Example: Percent of foreign-born residents in states (class intervals of 5).
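The counting step behind a histogram can be sketched as follows, assuming equal-width classes that include the lower bound and exclude the upper bound. The sample values are hypothetical and `class_counts` is an illustrative helper.

```python
def class_counts(values, width, start=0):
    """Count observations in intervals [start + k*width, start + (k+1)*width)."""
    counts = {}
    for value in values:
        k = int((value - start) // width)   # index of the class containing value
        lower = start + k * width           # lower bound of that class
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical percentages, grouped into class intervals of width 5.
sample = [1.2, 3.8, 4.9, 5.0, 7.4, 12.1, 26.9]
width = 5
counts = class_counts(sample, width)

for lower, count in counts.items():
    print(f"{lower:>3} to <{lower + width:<3} {'#' * count}")
```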
Boxplot
- Displays five-number summary (Min, (Q_1), Median, (Q_3), Max).
- Outliers: Values below (Q_1 - 1.5 \times \text{IQR}) or above (Q_3 + 1.5 \times \text{IQR}).
Example: Lifetimes of high-voltage components (boxplot shows distribution and outliers).
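The quantities a boxplot draws can be computed without plotting. The sketch below assumes the p(n+1) quartile convention (`method="exclusive"` in Python's `statistics.quantiles`) and uses hypothetical values, not the component lifetimes from the example.

```python
import statistics

def boxplot_summary(values):
    """Five-number summary plus the 1.5 * IQR fences and flagged outliers."""
    xs = sorted(values)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": xs[0], "q1": q1, "median": median, "q3": q3, "max": xs[-1],
        "fences": (low_fence, high_fence),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Hypothetical data with one unusually large value.
summary = boxplot_summary([5, 6, 12, 13, 15, 18, 22, 28, 60])
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because quartile conventions differ between textbooks and software, the fences, and therefore the flagged points, can differ slightly for small samples.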
Outliers
- Definition: Data points not consistent with the bulk of the data.
- Reasons:
- Measurement errors.
- Belonging to a different group.
- Natural variability.
- Influence: Can affect mean and other statistics.
Example: Marks of students (70, 72, 74, 76, 78 vs. 35, 72, 74, 76, 78). Replacing 70 with 35 lowers the mean from 74 to 67, while the median stays at 74.
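The effect of the single low mark on the two location measures can be checked directly. This is a short verification sketch using the two sets of marks from the example.

```python
import statistics

marks_typical = [70, 72, 74, 76, 78]
marks_with_outlier = [35, 72, 74, 76, 78]

for label, marks in [("typical", marks_typical), ("with outlier", marks_with_outlier)]:
    mean = statistics.mean(marks)
    median = statistics.median(marks)
    print(f"{label:<13} mean={mean} median={median}")
```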
Summary
Two Approaches for Numerical Variables
| Aspect | Approach 1 | Approach 2 |
|---|---|---|
| Location | Median | Mean |
| Spread | Range or IQR | Standard Deviation |
| Summary | Five-number Summary | |
| Visualization | Boxplot | Histogram |