Hash Table ADT
- A hash table is a table of elements with keys.
- A hash function locates the position of a key in the table.
- Search for an element can be done in Θ(1) time on average, provided the hash function spreads keys evenly.
Selected Hash Table ADT Operations
- insert: Insert an element into the table.
- retrieve: Retrieve an element from the table.
- An operation to empty out the hash table.
Hash Functions
- Input: A key value.
- Output: An index of an array (hash table) where the object containing the key is located.
- Example: h(k) = k % table_size
Example Using a Hash Function
- Hash function: h(k) = k % 100
- Search for key 214: k = 214
- Result: h(214) = 214 % 100 = 14
- Object is stored at index 14 of the array.
- Search is done in Θ(1) time.
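The lookup above can be sketched in a few lines of Python. This is only an illustration, assuming integer keys; the names `TABLE_SIZE` and `h` are chosen here for the example.

```python
TABLE_SIZE = 100  # table size used in the example

def h(k: int) -> int:
    """Map an integer key to an index in the range 0..TABLE_SIZE-1."""
    return k % TABLE_SIZE

index = h(214)
print(index)  # 14
```

Computing the index takes one arithmetic operation regardless of how many elements the table holds, which is where the Θ(1) figure comes from.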
Inserting an Element
- Use the same hash function: h(k) = k % 100.
- For key 214, the object is stored at index 14.
- Insertion is done in Θ(1) time.
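A minimal sketch of insertion and retrieval with this hash function might look as follows. It assumes no two keys collide (collisions are handled later), and the names `table`, `insert`, and `retrieve` as well as the stored string are illustrative.

```python
TABLE_SIZE = 100
table = [None] * TABLE_SIZE  # one slot per index, all initially empty

def h(k: int) -> int:
    return k % TABLE_SIZE

def insert(key: int, value: str) -> None:
    """Store (key, value) at the hashed index; assumes the slot is free."""
    table[h(key)] = (key, value)

def retrieve(key: int):
    """Return the value stored under key, or None if the key is absent."""
    entry = table[h(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

insert(214, "record for key 214")
print(table[14])      # (214, 'record for key 214')
print(retrieve(214))  # record for key 214
print(retrieve(314))  # None: 314 also hashes to 14, but the stored key differs
```

Storing the key alongside the value lets `retrieve` tell a real match apart from a different key that happens to hash to the same index.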
Big Picture Comparison
- Linear search: O(n) (e.g., up to 500,000 comparisons for 500,000 keys).
- Binary search: O(lg n) (e.g., ~19 comparisons for 500,000 keys).
- Hash table: Θ(1) (about 1 comparison, vs. 19 for binary search and 500,000 for linear search).
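The comparison counts can be checked with a short calculation. The sketch below assumes n = 500,000 keys and worst-case counts; the variable names are illustrative.

```python
import math

n = 500_000
linear_worst = n                               # compare against every key
binary_worst = math.floor(math.log2(n)) + 1    # halve the search range each step
hash_ideal = 1                                 # one probe when there is no collision

print(linear_worst, binary_worst, hash_ideal)  # 500000 19 1
```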
Collisions
- Occur when two keys hash to the same index.
- Example: h(k) = k % 100
- Keys 393 and 193 both hash to 93.
- Resolution methods: Chaining, linear probing, etc.
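The collision in this example is easy to reproduce. The snippet below is a small check; the helper names are illustrative.

```python
def h(k: int) -> int:
    return k % 100

keys = [393, 193]
indexes = [h(k) for k in keys]
print(indexes)  # [93, 93]

# A collision exists when distinct keys share an index.
collision = len(set(indexes)) < len(set(keys))
print(collision)  # True
```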
Collision Resolution: Chaining
- Use an array of linked lists.
- Hash function provides the index of the linked list.
- Insert at the front of the linked list.
- Java’s HashSet and HashMap use chaining.
Example Using Chaining
- Hash function: h(k) = k % 7
- Insert keys: 31, 9, 36, 42, 46, 20, 2, 24.
- Collision occurs for key 2 (index 2 already occupied by 9).
- Insert 2 at the front of the linked list at index 2.
- Key 24 collides in the same way with 31 at index 3.
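The example can be traced with a small sketch of chaining. The `Node` class and the helper names `insert` and `chain` are assumptions made for illustration, not part of the original material.

```python
class Node:
    """One element of a singly linked chain."""
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node

TABLE_SIZE = 7
table = [None] * TABLE_SIZE  # each slot holds the head of a chain (or None)

def h(k):
    return k % TABLE_SIZE

def insert(key):
    i = h(key)
    table[i] = Node(key, table[i])  # the new node becomes the head of the chain

def chain(i):
    """Return the keys in chain i, front to back."""
    keys = []
    node = table[i]
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys

for key in [31, 9, 36, 42, 46, 20, 2, 24]:
    insert(key)

for i in range(TABLE_SIZE):
    print(i, chain(i))
# 0 [42]
# 1 [36]
# 2 [2, 9]
# 3 [24, 31]
# 4 [46]
# 5 []
# 6 [20]
```

Index 2 holds 2 in front of 9 because each new key is placed at the head of its list.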
Clustering in Chaining
- Some linked lists are long; others are empty.
- Worst-case search time: O(n), bounded by the maximum chain length.
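To see how the worst case arises, consider keys that all hash to the same index. The sketch below uses a hypothetical key set (multiples of 7 with h(k) = k % 7) and plain Python lists to stand in for linked lists.

```python
TABLE_SIZE = 7
keys = [7, 14, 21, 28, 35, 42]  # hypothetical keys, all multiples of 7

chains = [[] for _ in range(TABLE_SIZE)]
for k in keys:
    chains[k % TABLE_SIZE].insert(0, k)  # insert at the front of the chain

longest = max(len(c) for c in chains)
print(chains[0])  # [42, 35, 28, 21, 14, 7]
print(longest)    # 6: one chain holds every key, so a search may scan all n
```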
Open Addressing
- Store elements directly in the array (no linked lists).
- Saves memory.
- Examples: Linear probing, quadratic probing.
Collision Resolution: Linear Probing
- On collision, place the element in the next free slot.
- Example: Collision at index 5; try index 6, then 7, etc., until a free slot is found.
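A minimal sketch of linear-probing insertion follows. The table size of 10, the keys, and the wrap-around at the end of the array are assumptions made for this illustration.

```python
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    """Place key in the first free slot at or after its home index."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        i = (start + step) % TABLE_SIZE  # wrap around at the end of the array
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

print(insert(5))   # 5
print(insert(15))  # 6 (index 5 is taken)
print(insert(25))  # 7 (indexes 5 and 6 are taken)
```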
Problem with Linear Probing
- Inserting 56 may require probing multiple slots (e.g., 16, 17, 18, 19).
Clustering in Linear Probing
- Occupied slots tend to form long consecutive runs, while other parts of the array stay empty.
- Worst-case search time: O(n), where n is the array length.
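The cost of a cluster shows up when searching. The sketch below assumes a table of size 10 in which keys 5, 15, 25, and 35 were placed earlier by linear probing; `search` is an illustrative name and returns the index found together with the number of slots probed.

```python
TABLE_SIZE = 10
table = [None] * TABLE_SIZE
for slot, key in [(5, 5), (6, 15), (7, 25), (8, 35)]:
    table[slot] = key  # a cluster left behind by earlier insertions

def search(key):
    """Return (index or None, number of slots probed)."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        i = (start + step) % TABLE_SIZE
        if table[i] is None:   # an empty slot ends the search: key is absent
            return None, step + 1
        if table[i] == key:
            return i, step + 1
    return None, TABLE_SIZE    # every slot was probed

print(search(35))  # (8, 4)
print(search(45))  # (None, 5)
```

A miss has to walk the whole cluster before reaching an empty slot, so longer clusters mean slower unsuccessful searches.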
Linear Probing Improvement: Quadratic Probing
- Move j^2 cells from the collision point, where j is the attempt number.
- Limitation: May not find an empty cell if the array is more than half full.
Example of Quadratic Probing
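- Hash function: f(k) = k % 10
- Insert keys: 27, 17, 37, 47, 48, 57.
- 27: No collision, stored at 7.
- 17: Collision at 7, probe 7 + 1^2 = 8.
- 37: Collisions at 7 and 8, probe 7 + 2^2 = 11 → 1.
- 47: Collisions at 7, 8, and 1, probe 7 + 3^2 = 16 → 6.
- 48: Collision at 8, probe 8 + 1^2 = 9.
- 57: Collisions at 7, 8, 1, and 6, probe 7 + 4^2 = 23 → 3.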
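The probe sequence in this example can be reproduced with a short sketch. It assumes the probe index is (home + j^2) % table size for attempt j = 0, 1, 2, ..., and it gives up after as many attempts as there are slots; the function and variable names are illustrative.

```python
TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert(key):
    """Insert key using quadratic probing; return the index used."""
    home = key % TABLE_SIZE
    for j in range(TABLE_SIZE):          # j is the attempt number
        i = (home + j * j) % TABLE_SIZE  # move j^2 cells from the home index
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("no free slot found by quadratic probing")

placed = {k: insert(k) for k in [27, 17, 37, 47, 48, 57]}
print(placed)  # {27: 7, 17: 8, 37: 1, 47: 6, 48: 9, 57: 3}
```

Each key tries j = 1, 2, 3, ... in turn, so 57 only reaches 7 + 4^2 after finding indexes 8, 1, and 6 occupied.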
Chaining vs. Linear Probing
- Chaining: Extra memory for linked lists.
- Linear Probing: Fixed memory, better for caching.
- Clustering: Worse search time for linear probing with many collisions.
- Load factor: Linear probing is better if the load factor is < 0.85.
Uniform Hashing
- Elements are spread evenly among indexes.
- Allows Θ(1) search time for both chaining and open addressing.
- Miss in open addressing: O(n) in the worst case.
- Choose a prime number table size not close to a power of 2.
- Example hash function: h(k) = k % 97 (97 is prime).
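One way to see why the table size matters is to hash keys that share a pattern. The sketch below uses a hypothetical key set (multiples of 100) and compares a table size of 100 with the prime 97.

```python
keys = list(range(0, 1000, 100))  # 0, 100, 200, ..., 900 (hypothetical keys)

used_100 = {k % 100 for k in keys}  # indexes used with table size 100
used_97 = {k % 97 for k in keys}    # indexes used with table size 97

print(sorted(used_100))  # [0]
print(sorted(used_97))   # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

With size 100 every key lands on index 0, while the prime size spreads the same ten keys over ten different indexes.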
Ideal Hash Tables
- No collisions: Use h(k) = k (unique keys).
- Example: 300 employees with 4-digit IDs → table size 10,000 (97% empty).
- No collisions or empty slots: 300 employees with IDs 0-299 → table size 300, h(k) = k.
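The arithmetic behind both examples can be checked directly. The sketch below assumes one slot per possible ID, with the key itself used as the index; the variable names and record strings are illustrative.

```python
num_employees = 300

# Case 1: 4-digit IDs (0000-9999) need one slot per possible ID.
sparse_size = 10_000
empty_fraction = 1 - num_employees / sparse_size
print(f"{empty_fraction:.0%}")  # 97%

# Case 2: IDs 0-299 fill a table of size 300 exactly, with h(k) = k.
dense_table = [None] * num_employees
for emp_id in range(num_employees):
    dense_table[emp_id] = f"employee {emp_id}"  # index equals the key

empty_slots = sum(1 for slot in dense_table if slot is None)
print(empty_slots)  # 0
```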
Speed vs. Memory Conservation
- Speed: Large table (no collisions) → fastest.
- Memory: Minimize empty slots → most efficient.
Hash Table Design
- Decide priority: speed or memory conservation.
- Choose a table size that:
- Allows a good hash function.
- Balances speed and memory.
Time Complexities
- insert: Θ(1) (insert at head of linked list).
- retrieve: Θ(1) for uniform hashing (bounded chain length).
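Putting the operations together, a chained hash table could be sketched as below. The class name `ChainedHashTable`, the method name `make_empty`, and the default size of 97 are assumptions for illustration. `collections.deque` stands in for a linked list because its `appendleft` runs in constant time.

```python
from collections import deque

class ChainedHashTable:
    def __init__(self, size=97):
        self.size = size
        self.chains = [deque() for _ in range(size)]

    def _h(self, key):
        return key % self.size

    def insert(self, key, value):
        # Θ(1): push onto the front of the chain (duplicate keys are not checked).
        self.chains[self._h(key)].appendleft((key, value))

    def retrieve(self, key):
        # Time is proportional to the chain length, which stays bounded
        # when the hash function spreads keys evenly.
        for k, v in self.chains[self._h(key)]:
            if k == key:
                return v
        return None

    def make_empty(self):
        for chain in self.chains:
            chain.clear()

t = ChainedHashTable()
t.insert(393, "a")
t.insert(296, "b")      # 393 and 296 both hash to index 5 when the size is 97
print(t.retrieve(393))  # a
print(t.retrieve(296))  # b
t.make_empty()
print(t.retrieve(393))  # None
```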