Hash Table ADT

  • A hash table is a table of elements with keys.
  • A hash function locates the position of a key in the table.
  • Searching for an element takes Θ(1) time on average, provided collisions are rare.

Selected Hash Table ADT Operations

  • insert: Insert an element into the table.
  • retrieve: Retrieve an element from the table.
  • An operation to empty out the hash table.

Hash Functions

  • Input: A key value.
  • Output: An index of an array (hash table) where the object containing the key is located.
  • Example:
    h(k) = k % table_size

Example Using a Hash Function

  • Hash function: h(k) = k % 100
  • Search for key 214:
    • h(214) = 214 % 100 = 14
    • Object is stored at index 14 of the array.
    • Search is done in Θ(1) time.

Inserting an Element

  • Use the same hash function: h(k) = k % 100.
  • For key 214, the object is stored at index 14.
  • Insertion is done in Θ(1) time.
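
The search and insertion examples above can be sketched in a few lines of Python. This is a minimal illustration that assumes no collisions occur; the names `TABLE_SIZE`, `table`, `h`, `insert`, and `retrieve` are illustrative and not part of the original material.

```python
TABLE_SIZE = 100
table = [None] * TABLE_SIZE  # one slot per index, all initially empty


def h(k):
    """Hash function from the example: h(k) = k % 100."""
    return k % TABLE_SIZE


def insert(key, value):
    """Store (key, value) at index h(key). Assumes the slot is free."""
    table[h(key)] = (key, value)


def retrieve(key):
    """Return the value stored for key, or None if the key is absent."""
    slot = table[h(key)]
    if slot is not None and slot[0] == key:
        return slot[1]
    return None


insert(214, "object with key 214")
print(h(214))         # 14
print(retrieve(214))  # object with key 214
print(retrieve(314))  # None: index 14 holds key 214, not 314
```

Both operations compute one index and touch one slot, which is where the Θ(1) figure comes from.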

Big Picture Comparison

  • Linear search: O(n) (e.g., about 500,000 comparisons on average for 1M keys).
  • Binary search: O(lg n) (e.g., ~19 comparisons for 500,000 keys, ~20 for 1M keys).
  • For 1M keys: 1 comparison (hash table, no collision) vs. ~20 (binary) vs. ~500,000 (linear, on average).
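
The counts above can be reproduced with a quick calculation. This sketch assumes an average-case linear search (about n/2 comparisons) and uses lg n, rounded up, as the estimate for binary search; the variable names are illustrative.

```python
import math

n = 1_000_000

linear_avg = n // 2                        # average linear search: about n/2
binary_estimate = math.ceil(math.log2(n))  # binary search: about lg n
hash_ideal = 1                             # hash table with no collision

print(linear_avg, binary_estimate, hash_ideal)  # 500000 20 1
print(math.ceil(math.log2(500_000)))            # 19
```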

Collisions

  • Occur when two keys hash to the same index.
  • Example:
    • h(k) = k % 100
    • Keys 393 and 193 both hash to 93.
  • Resolution methods: Chaining, linear probing, etc.
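
A small sketch of how collisions can be detected for a set of keys, using the hash function from the example. The helper `find_collisions` is a hypothetical name introduced only for illustration.

```python
from collections import defaultdict


def h(k):
    """Hash function from the example: h(k) = k % 100."""
    return k % 100


def find_collisions(keys):
    """Group keys by index and keep only indexes shared by two or more keys."""
    by_index = defaultdict(list)
    for k in keys:
        by_index[h(k)].append(k)
    return {i: ks for i, ks in by_index.items() if len(ks) > 1}


print(find_collisions([393, 193, 214]))  # {93: [393, 193]}
```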

Collision Resolution: Chaining

  • Use an array of linked lists.
  • Hash function provides the index of the linked list.
  • Insert at the front of the linked list.
  • Java’s HashSet and HashMap use chaining.

Example Using Chaining

  • Hash function: h(k) = k % 7
  • Insert keys: 31, 9, 36, 42, 46, 20, 2, 24.
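  • Collisions occur for key 2 (index 2 is already occupied by 9) and for key 24 (index 3 is already occupied by 31).
  • Each colliding key goes at the front of its linked list: 2 before 9 at index 2, and 24 before 31 at index 3.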
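
The example above can be traced with a small chained hash table. This is a sketch under stated assumptions, not a reference implementation: the class and method names (`Node`, `ChainedHashTable`, `make_empty`, `chain`) are illustrative, only keys are stored, and `insert` does not check for duplicates.

```python
class Node:
    """One node of a singly linked chain."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.heads = [None] * size  # heads[i] is the first node of chain i

    def h(self, k):
        return k % self.size

    def insert(self, key):
        """Insert at the front of the chain (no duplicate check)."""
        i = self.h(key)
        self.heads[i] = Node(key, self.heads[i])

    def retrieve(self, key):
        """Walk the chain at h(key); return the key if found, else None."""
        node = self.heads[self.h(key)]
        while node is not None:
            if node.key == key:
                return node.key
            node = node.next
        return None

    def make_empty(self):
        """Discard every chain."""
        self.heads = [None] * self.size

    def chain(self, i):
        """Keys in chain i, front to back (helper for printing)."""
        keys, node = [], self.heads[i]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


t = ChainedHashTable(7)
for k in [31, 9, 36, 42, 46, 20, 2, 24]:
    t.insert(k)

for i in range(t.size):
    print(i, t.chain(i))
# 0 [42]
# 1 [36]
# 2 [2, 9]
# 3 [24, 31]
# 4 [46]
# 5 []
# 6 [20]
```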

Clustering in Chaining

  • Some linked lists are long; others are empty.
  • Worst-case search time: O(n), reached when every key lands in the same chain (search cost is bounded by the longest chain).

Open Addressing

  • Store elements directly in the array (no linked lists).
  • Saves the memory that linked-list nodes and pointers would otherwise use.
  • Examples: Linear probing, quadratic probing.

Collision Resolution: Linear Probing

  • On collision, place the element in the next free slot.
  • Example: On a collision at index 5, try index 6, then 7, and so on, wrapping around to index 0 after the last slot.
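
A minimal sketch of insertion with linear probing. It assumes a table of size 10 with h(k) = k % size, and the keys 5, 15, and 25 are chosen only because they all collide at index 5; the function name `linear_probe_insert` is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the index where it was stored.

    Raises RuntimeError if every slot is occupied.
    """
    size = len(table)
    start = key % size
    for step in range(size):
        i = (start + step) % size  # wrap around past the last slot
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("table is full")


table = [None] * 10
print(linear_probe_insert(table, 5))   # 5
print(linear_probe_insert(table, 15))  # 6: index 5 is taken
print(linear_probe_insert(table, 25))  # 7: indexes 5 and 6 are taken
```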

Problem with Linear Probing

  • Inserting 56 may require probing multiple slots (e.g., 16, 17, 18, 19).

Clustering in Linear Probing

  • Occupied slots tend to form long consecutive runs (clusters), while other parts of the array stay empty.
  • Worst-case search time: O(n) (array length).

Linear Probing Improvement: Quadratic Probing

  • On attempt j (j = 1, 2, 3, ...), probe the cell j^2 positions past the original collision point, wrapping around the array.
  • Limitation: May fail to find an empty cell once the array is more than half full.

Example of Quadratic Probing

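  • Hash function: h(k) = k % 10
  • Insert keys: 27, 17, 37, 47, 48, 57.
    • 27: No collision, stored at 7.
    • 17: Collision at 7, probe 7 + 1^2 = 8.
    • 37: Collisions at 7 and 8, probe 7 + 2^2 = 11 → 1.
    • 47: Collisions at 7, 8, and 1, probe 7 + 3^2 = 16 → 6.
    • 48: Collision at 8, probe 8 + 1^2 = 9.
    • 57: Collisions at 7, 8, 1, and 6, probe 7 + 4^2 = 23 → 3.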
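
The probe sequence in this example can be checked with a short sketch. The function name `quadratic_probe_insert` is illustrative, and giving up after `size` attempts is an assumption of this sketch rather than something the material specifies.

```python
def quadratic_probe_insert(table, key):
    """Insert key with quadratic probing; return the index used.

    Attempt j probes (h(key) + j*j) % size, for j = 0, 1, 2, ...
    Raises RuntimeError if no empty cell is found within `size` attempts.
    """
    size = len(table)
    home = key % size
    for j in range(size):
        i = (home + j * j) % size
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("no empty cell found")


table = [None] * 10
for k in [27, 17, 37, 47, 48, 57]:
    print(k, "->", quadratic_probe_insert(table, k))
# 27 -> 7
# 17 -> 8
# 37 -> 1
# 47 -> 6
# 48 -> 9
# 57 -> 3
```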

Chaining vs. Linear Probing

  • Chaining: Extra memory for linked lists.
  • Linear Probing: Fixed memory, better for caching.
  • Clustering: Worse search time for linear probing with many collisions.
  • Load factor (elements stored / table size): Linear probing is better if it is below about 0.85.

Uniform Hashing

  • Elements are spread evenly among indexes.
  • Allows Θ(1) expected search time for both chaining and open addressing.
  • An unsuccessful search (miss) in open addressing can still take O(n) in the worst case.

Ideal Hash Function for Uniform Hashing

  • Choose a prime number table size not close to a power of 2.
    • Example: 97 (not 31, which is one less than 2^5 = 32).
  • Hash function: h(k) = k % 97.
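
One way to see why a prime table size helps is to hash keys that follow a regular pattern. The sketch below compares table sizes 100 and 97 on hypothetical keys that are all multiples of 10; it illustrates only the benefit of a prime size, not the "not close to a power of 2" guideline, and `distinct_indexes` is an illustrative name.

```python
def distinct_indexes(keys, table_size):
    """Number of different indexes produced by h(k) = k % table_size."""
    return len({k % table_size for k in keys})


keys = [10 * i for i in range(97)]  # hypothetical keys: 0, 10, 20, ..., 960

print(distinct_indexes(keys, 100))  # 10: only indexes 0, 10, ..., 90 are used
print(distinct_indexes(keys, 97))   # 97: every key gets its own index
```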

Ideal Hash Tables

  • No collisions: Use h(k) = k (unique keys).
    • Example: 300 employees with 4-digit IDs → table size 10,000 (97% empty).
  • No collisions or empty slots:
    • Example: 300 employees with IDs 0-299 → table size 300, h(k) = k.
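
The two cases above can be checked with a short calculation, followed by a sketch of direct addressing with h(k) = k. The function name `empty_fraction` and the stored record are hypothetical.

```python
def empty_fraction(num_elements, table_size):
    """Fraction of slots left unused when every key gets its own slot."""
    return 1 - num_elements / table_size


# 4-digit IDs (0000-9999) need 10,000 slots for 300 employees.
print(round(empty_fraction(300, 10_000), 2))  # 0.97
# IDs 0-299 fit a table of exactly 300 slots.
print(empty_fraction(300, 300))               # 0.0

# Direct addressing with h(k) = k: the ID itself is the index.
table = [None] * 300
table[42] = "employee 42"  # hypothetical record
print(table[42])           # employee 42
```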

Speed vs. Memory Conservation

  • Speed: Large table (no collisions) → fastest.
  • Memory: Minimize empty slots → most efficient.

Hash Table Design

  • Decide priority: speed or memory conservation.
  • Choose table size:
    • Allows a good hash function.
    • Balances speed and memory.

Time Complexities

  • insert: Θ(1) (insert at head of linked list).
  • retrieve: Θ(1) expected with uniform hashing (chain lengths stay bounded); O(n) in the worst case.