Introduction

What is Statistics?

  • Statistics: The science of conducting studies to collect, organize, summarize, analyze, present, interpret, and draw conclusions from data.
  • Data: Any value, either observation or measurement, that has been collected.
  • Variable: A characteristic or attribute that can assume different values.
    • Variables whose values are determined by chance are called random variables.

Some Terminology

  • Individuals/Units/Instances/Records: The objects described by a set of data.
  • Variable/Attribute/Feature:
    • Any characteristic of an individual.
    • Can take different values for different individuals.
    • The value observed for a randomly selected individual is determined by chance, which is why the variable is called a random variable.

Why Do We Need Statistics?

  • Describing the Relationship Between Variables
    • Example: A university admission director studies the relationship between SPM results and GPA.
  • Making Better Decisions in the Face of Uncertainty
    • Example: A consumer activist uses statistical inference to check whether a hair stylist's claim of 90% customer satisfaction is exaggerated.

Variables

Types of Variables

  • Numerical
    • Discrete
    • Continuous
  • Categorical
    • Nominal
    • Ordinal

Definitions

  • Numerical Variable:
    • Values are numbers obtained by counting (discrete) or measuring (continuous).
    • Arithmetic operations (adding, averaging) make sense.
  • Categorical Variable:
    • Places an individual into one of several groups or categories.
    • Nominal: The categories have no natural order.
    • Ordinal: The categories have a natural order.
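The definitions above can be sketched as a small classifier. This is only a rough heuristic under stated assumptions: the function name `classify_variable` and its `ordered_categories` argument are hypothetical, and whole-number values are assumed to be counts. Real data still needs judgment (a zip code is stored as a number but is categorical).

```python
def classify_variable(values, ordered_categories=None):
    """Return a rough variable type for a list of observed values.

    Heuristic sketch only: it looks at how values are stored,
    not at what they mean.
    """
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_number:
        # Assumption: whole numbers are counts, so they are treated as discrete.
        if all(isinstance(v, int) for v in values):
            return "numerical (discrete)"
        return "numerical (continuous)"
    # Non-numeric values are categorical; a supplied ordering makes them ordinal.
    if ordered_categories is not None:
        return "categorical (ordinal)"
    return "categorical (nominal)"


print(classify_variable([3, 0, 2]))                    # number of defects
print(classify_variable([61.5, 70.2, 58.9]))           # weight in kg
print(classify_variable(["Malaysian", "Indonesian"]))  # nationality
print(classify_variable(["low", "high"],
                        ordered_categories=["low", "medium", "high"]))
```

The caller has to pass the ordering for ordinal data explicitly, because an order among categories cannot be inferred from the labels alone.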

Data by Level of Measurement

Qualitative (Categorical/Attribute) Data

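  • Data that place individuals into categories. The categories may be recorded as code numbers (e.g., 1 = Yes, 0 = No), but the codes are labels, not quantities.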
  • Nominal Data: No ranking. (e.g., Gender, race, nationality)
  • Ordinal Data: Can be ranked. (e.g., Likert scale, color intensity)

Quantitative (Numerical) Data

  • Can be counted or measured.
  • Discrete Data: Countable values, usually whole numbers. (e.g., Number of students, number of defects)
  • Continuous Data: Can take any value within an interval; recorded values are rounded to a chosen number of decimals. (e.g., Weight, age, salary)

Levels of Measurement

Level | Description | Examples
Nominal | Categories, no ranking | Gender, Religion, Zip Code
Ordinal | Categories with ranking | Grades (A, B, C), Ratings (Good, Excellent)
Interval | Ranked with equal intervals, no true zero | IQ test, Temperature, Shoe Size
Ratio | True zero exists | Height, Weight, Time, Salary
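One way to read the four levels is that each level supports every operation of the levels before it, plus something new. The sketch below illustrates that idea. The names `LEVELS`, `OPERATIONS`, and `allowed_operations` are hypothetical, and the operation lists are a common textbook simplification rather than an exhaustive rule.

```python
# Levels in increasing order of structure.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that become meaningful at each level (simplified).
OPERATIONS = {
    "nominal": {"count", "mode"},
    "ordinal": {"rank", "median"},
    "interval": {"difference", "mean"},
    "ratio": {"ratio"},
}


def allowed_operations(level):
    """Collect the operations that are meaningful at the given level."""
    allowed = set()
    for name in LEVELS[: LEVELS.index(level) + 1]:
        allowed |= OPERATIONS[name]
    return allowed


print(sorted(allowed_operations("ordinal")))

# Why ratios need a true zero: 20 °C is not "twice as hot" as 10 °C.
celsius_ratio = 20 / 10                       # 2.0, but not meaningful
kelvin_ratio = (20 + 273.15) / (10 + 273.15)  # same temperatures on a ratio scale
print(celsius_ratio, round(kelvin_ratio, 3))
```

Celsius is an interval scale because its zero is arbitrary; converting to Kelvin, which has a true zero, shows the two temperatures differ by only a few percent.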

Exercise 1

Analyze job-related injuries data by categorizing variables:

  1. Qualitative or Quantitative?
  2. Discrete or Continuous?
  3. Nominal or Ordinal?
  4. Measurement Level?

Dimensionality of Dataset

Dimension | Variables | Purpose | Example | Common Techniques
Univariate | 1 | Analyze distribution, central tendency, dispersion | Height of students | Mean, Median, Boxplot
Bivariate | 2 | Relationship & association | Height vs. Weight | Correlation, Scatter Plot
Multivariate | >2 | Complex relationships | Height, Weight, Age | Regression, Clustering
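To make the univariate and bivariate rows concrete, here is a minimal sketch using only the Python standard library. The height and weight values are made up for illustration, and `pearson_r` is a hypothetical helper that computes the usual Pearson correlation coefficient.

```python
import statistics
from math import sqrt

# Hypothetical heights (cm) and weights (kg) for five students.
heights = [150, 160, 165, 170, 180]
weights = [45, 55, 60, 62, 75]

# Univariate: summarize one variable at a time.
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Bivariate: measure how two variables move together.
r = pearson_r(heights, weights)

print(mean_height, median_height)
print(round(r, 3))  # close to +1 here: taller students tend to be heavier
```

A multivariate analysis would add a third variable such as age, which typically calls for regression or clustering rather than a single summary number.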

Statistical Data Analysis

  1. Asking the right question(s).
  2. Collecting useful data or searching for secondary data.
  3. Exploring and analyzing data to answer questions.
  4. Making decisions & inferences about a population.
  5. Turning data into knowledge.

Population and Sample

  • Population (size N): The complete collection of measurements, outcomes, or individuals under study.
  • Sample (size n): A subset of the population.

Types of Populations

  • Tangible: Finite, fixed subjects (e.g., all students in a university).
  • Intangible (Conceptual): Unlimited possible observations (e.g., simulated data).

Parameter vs. Statistic

  • Parameter: A numerical value representing a population characteristic.
    • Example: The average height of all students in a university.
  • Statistic: A numerical value representing a sample characteristic.
    • Example: The average height of a sample of students selected from the university.

Notation

Measurement | Parameter (Population) | Statistic (Sample)
Mean | 𝜇 | x̄
Variance | 𝜎² | s²
Standard Deviation | 𝜎 | s
Proportion | π | p
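The difference between the two columns can be seen by simulation. The sketch below is an illustration under stated assumptions: the population is simulated (1,000 hypothetical student heights), the sample is a simple random sample of 30, and the variable names mirror the notation above.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of N = 1000 students.
population = [random.gauss(165, 8) for _ in range(1000)]

# Simple random sample of n = 30, drawn without replacement.
sample = random.sample(population, 30)

# Parameters describe the whole population (variance divides by N).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Statistics describe the sample (variance divides by n - 1).
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

In practice the parameters are usually unknown and the statistics are used to estimate them; the simulation makes both visible only because the whole population was generated.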

Example 1

A travel agent claims that large hotels in Pahang have an average of 500 rooms (σ = 165). A sample of 7 hotels in Genting Highlands shows an average of 435 rooms (s = 15).

  • Population: All large hotels in Pahang.
  • Sample: 7 large hotels in Genting Highlands.
  • Variable: Number of rooms.
  • Parameter: 𝜇 = 500, 𝜎 = 165.
  • Statistic: x̄ = 435, s = 15.

Exercise 2

A hostel has 317 first-year students. A dean collects IQ pre-test scores for 27 students and estimates the mean IQ for all students. Answer:

  1. What is the population?
  2. Is it tangible or conceptual?
  3. What is the sample?
  4. What is the variable?
  5. Which number describes a parameter?
  6. Which number describes a statistic?

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Organizing, summarizing, and presenting data.
    • Example: “10,000 parents in Malaysia chose Takaful Insurance.”
  • Inferential Statistics: Generalizing and making predictions based on a sample.
    • Example: “The lung cancer rate is 10 times higher in smokers.”

Exercise 3

Classify these as Descriptive or Inferential Statistics:

  1. The average cost of a wedding is RM10,000.
  2. The median salary for a bachelor’s degree holder in Malaysia is RM30,000 (men) and RM29,000 (women).
  3. An estimated 500,000 children under 15 have Type 1 diabetes.
  4. A new drug is claimed to reduce heart attacks in men over 70.