Know Thy Hash Table

It's Friday the 13th, and I have decided to seriously explore data structures and algorithms starting today. I am not driven by the desire to get placed in big tech but rather, I am in pursuit of becoming a better developer who understands the internals of operating systems, databases, and everything low-level.

I have decided to tackle arrays and strings first, using Java as the programming language. This will not be basic, as I already know the fundamentals of arrays and strings. This time, I want to take things up a notch by solving more problems to deepen my understanding and improve my implementation skills of concepts like StringBuilder, HashMap, and others.

Whenever we need a data structure to store key-value pairs for extremely efficient lookups, the first thing that comes to my mind is a hash map.

"Wait a minute, Nabiel. You said you were going to start with hash tables!"

Allow me to explain. From what I've understood, Hashtable and HashMap in Java differ in the following ways:

Hashtable is synchronized, which makes it better suited for multithreaded applications.

What do you mean by "synchronized"?
By "synchronized", I mean that multiple threads can access and modify the data structure simultaneously without causing data corruption.

Hashtable does not allow null keys or null values, whereas HashMap allows one null key and multiple null values.

Now, back to why I mentioned 'HashMap' instead of 'Hashtable' :

When you first start programming in Java, the code you write is typically non-concurrent (single-threaded). Also, HashMap was introduced after Java 1.2. Since I started programming in Java 17, I’m so accustomed to using HashMap over Hashtable class that I don't always consider the performance implications or the distinct differences between the two data structures.

Implementating a Simple Hash Table

Even though there are a number of ways to implement a hash table, the simplest implementation is to use an array of linked lists and a hash code function.

Key Steps

Compute the hash code for the key:
- The key could be an int or long (both are number types in Java).
- It's important to note that different keys can have the same hash code. This happens because, while there are many possible keys, the number of hash codes is limited (we use int values for hash codes).
Map the hash code to an index in the array:
- Use a formula like hash(key) % array_length to get the index where the key-value pair should be stored.
- Two different keys could end up mapping to the same index, which is called a collision.
Store the key-value pair at that index:
- At this index, we use a linked list to store key-value pairs. This helps handle collisions, where either two keys have the same hash code or two different hash codes map to the same index.
- By using a linked list, we can add new key-value pairs to the same index without losing the previous ones.

A good implementation of a hash table that keeps the collisions to a minimum has a lookup time of O(1). In the worst-case scenario, i.e., when the number of collisions is very high, the lookup time is O(N), where N is the number of keys.

Code

class KeyValuePair {
    int key;
    String value;
    KeyValuePair next;

    KeyValuePair(int key, String value) {
        this.key = key;
        this.value = value;
    }
}

class HashTable {
    private static final int SIZE = 10;
    private KeyValuePair[] table;

    public HashTable() {
        table = new KeyValuePair[SIZE];
    }

    private int hash(int key) {
        return key % SIZE;
    }

    public void insert(int key, String value) {
        int index = hash(key);
        KeyValuePair newPair = new KeyValuePair(key, value);
        if (table[index] == null) {
            table[index] = newPair;
        } else {
            KeyValuePair current = table[index];
            while (current.next != null && current.key != key) {
                current = current.next;
            }
            if (current.key == key) {
                current.value = value; // Update value
            } else {
                current.next = newPair;
            }
        }
    }

    public String get(int key) {
        int index = hash(key);
        KeyValuePair current = table[index];
        while (current != null) {
            if (current.key == key) {
                return current.value;
            }
            current = current.next;
        }
        return null;
    }
}

Implementating a Balanced Binary Search Tree (BST) Hash Table

The alternate method is to implement a balanced BST, where the lookup time is O(log N).

Advantages of a BST Implementation

Less space usage: As we no longer allocate a large array.
Key ordering: We can iterate through the keys in order, which can be useful at times.

Code

class BSTNode {
    int key;
    String value;
    BSTNode left, right;

    BSTNode(int key, String value) {
        this.key = key;
        this.value = value;
        left = right = null;
    }
}

class BSTHashTable {
    private BSTNode root;

    public void insert(int key, String value) {
        root = insertRec(root, key, value);
    }

    private BSTNode insertRec(BSTNode node, int key, String value) {
        if (node == null) {
            return new BSTNode(key, value);
        }
        if (key < node.key) {
            node.left = insertRec(node.left, key, value);
        } else if (key > node.key) {
            node.right = insertRec(node.right, key, value);
        } else {
            node.value = value; // Update value
        }
        return node;
    }

    public String get(int key) {
        return getRec(root, key);
    }

    private String getRec(BSTNode node, int key) {
        if (node == null) {
            return null;
        }
        if (key < node.key) {
            return getRec(node.left, key);
        } else if (key > node.key) {
            return getRec(node.right, key);
        } else {
            return node.value;
        }
    }
}

Okay, so far we implemented hash tables from scratch. Now, it is high time that we had solved some problems related to hash tables and hashing.

In Java, solving problems related to hash tables and hashing becomes significantly easier thanks to the rich set of utilities provided by the Java Collections Framework. This framework offers efficient and robust implementations like HashMap and HashSet, which are based on hash tables and handle key operations like insertions, deletions, and lookups in constant average time complexity, O(1).

These classes also automatically manage hash collisions and resizing for optimal performance. For scenarios where ordering is important, LinkedHashMap and LinkedHashSet are useful extensions that maintain the order of elements as they are inserted. When working with thread-safe applications, ConcurrentHashMap ensures efficient, thread-safe access to hash table operations. Java also provides TreeMap, a Red-Black tree-backed implementation, for sorted key-value pairs, which is invaluable when a balanced BST structure is required.

Problems

1. Contains Duplicate

Imagine you are given a list of numbers, and your task is to find out if any number appears more than once in the list. For example:

If the list is [1, 2, 3, 4], there are no duplicates, so the answer is false.
If the list is [1, 2, 3, 2], the number 2 appears twice, so the answer is true.

To solve this problem, we use a HashSet, which is like a magical bag that only holds unique items. If you try to add something that is already in the bag, it lets you know.

Here's how the solution works step by step:

Create an empty HashSet: Think of it as an empty bag to store the numbers we have seen so far.
Go through the list of numbers one by one:
- If the current number is already in the bag, it means we found a duplicate, so we return true.
- If the number is not in the bag, we add it to the bag and keep checking the rest of the list.
If we finish checking all the numbers and don’t find any duplicates, we return false.

Why Use a HashSet?

A HashSet is perfect for this problem because it is designed to store unique items only.
Checking if something is in the set (contains()) and adding a new item (add()) are both very fast.

Real-World Analogy

Imagine you’re keeping track of names at a party. As each guest arrives, you check your list (the HashSet). If their name is already on the list, you know they’ve been here before (duplicate). If not, you add their name to the list and continue checking the next guest.

    public boolean containsDuplicate(int[] nums) {
        HashSet<Integer> seen = new HashSet();
        for (int i = 0; i < nums.length; i++) {
            if (seen.contains(nums[i]))
                return true;
            else
                seen.add(nums[i]);
        }

        return false;
    }

References

I'll be back with dynamic arrays in the next one. Stay tuned!