Group Anagrams

mosesg1123
3 days ago
4 min read

Clustering words that are anagrams of each other is a classic interview problem and mirrors real‑world tasks like building search indexes or deduplicating permutations. It forces you to think about string normalization, hashing, and efficient grouping.

Problem Statement

Given an array of strings strs, group the anagrams together. Return a list of these groups in any order. Each group is a list of strings that are anagrams of one another.

public List<List<String>> groupAnagrams(String[] strs);

1. Clarify Requirements Before Jumping Into Code

I’d first check:

Are all strings lowercase? Yes.
Can there be empty strings? Yes—empty ones should form their own group.
Do we care about the order of groups or words inside them? No.
What scale? Let’s assume up to tens of thousands of words of moderate length.

2. Identify the Category of the Question

This is a hash‑grouping problem on strings. I’m looking for a way to compute a canonical “key” for each string so that all its anagrams share that key, then bucket by key.

Common patterns:

Sorting a string’s characters
Frequency‑count vectors for small alphabets
Hash signatures (e.g. prime‑product)

3. Consider a Brute‑Force Approach to Understand the Problem

For each string s in strs, compare it to every other string t.
Check if s and t are anagrams by sorting both (O(k log k)) or counting letters (O(k)).
Merge into the same bucket if they match.

This is O(n²·k log k) or O(n²·k), which won’t scale when n is large. It’s correct but too slow.

4. Brainstorm More Solutions

Let's start by recognizing that I can split this problem into two subproblems:

How do I recognize two words as anagrams?
Once I have a recognition method, how do I group them efficiently?

Step 1 - How do I recognize two words as anagrams?

Let's start by simplifying - how would I approach this if I were only comparing a pair of strings? There are a few options:

Option 1: Brute-force

Comparing each character in string one against every character in string two. I could remove characters from the strings if they match.
But in the worst case, that's O(n2) time, which is not going to scale well.

Option 2: Sort Letters

I could sort the letters in each string.
That's O(k log k), where k is the average length of a string.

Option 3: Count Array

Take advantage of the fact that we have a limited character set - 26 letters.
I could build a 26‐element count array for each word in O(k) time.

Step 2 - How do I group anagrams efficiently?

How do I quickly identify whether two strings match, and I want to be able to do that in constant time. What are some options?

Option 1: Sort‐as‐key

If I'm sorting every string, then I could use the sorted string as the key in a hash map.
Trade‑offs: This is easy to code and grants me constant lookup time, but it costs O (k log k) per word.

Option 2: Count‐array signature

Build a 26‐element count array for each word in O(k).
Serialize the characters into a customer string, e.g., for the string "abb", we could serialize into something like #1#2#0#0#0… .
Trade‑off: True O(n·k) time, slightly more code to build and serialize counts.

Option 3: Prime‐product hash

Map a→2, b→3, ... z→101, then multiply the corresponding primes for each letter in O(k).
Trade‑off: Compact key but risk of integer overflow or hash collisions.

Optimal choice: the count‐array signature. It’s linear time, fixed extra space, no overflow risk, and easy to explain once you break the problem into “create signature, then group by signature.”

5. Discuss Trade‑Offs Between Your Solutions

Approach	Time	Space	Pros	Cons
Sort‐as‐key	O(n·k log k)	O(n·k)	Very simple	Slower for long strings
Count‐array signature	O(n·k)	O(n·k)	True linear time, no overflow	Must serialize count array
Prime‐product hash	O(n·k)	O(n)	Compact key	Overflow/collision risk

6. Write Pseudocode to Structure Your Thoughts

function groupAnagrams(strs):
  map = empty hash map (string → list of strings)
  for each s in strs:
    counts = array[26] initialized to 0
    for each c in s:
      counts[c - 'a'] += 1
    key = serialize counts with separators
    map[key].append(s)
  return all lists in map.values()

7. Consider Edge Cases

Empty input ([]) → return [].
Single empty string ([""]) → return [[""]].
All unique → each string in its own list.
All identical → one group of n.
Very long strings → counting avoids sort overhead.

8. Write Full Code Syntax (in Java)

import java.util.*;

public class Solution {
    public List<List<String>> groupAnagrams(String[] strs) {
        Map<String, List<String>> map = new HashMap<>();
        for (String s : strs) {
            int[] counts = new int[26];
            for (char c : s.toCharArray()) {
                counts[c - 'a']++;
            }

			// you can break this into it's own method!
            StringBuilder keyBuilder = new StringBuilder();
            for (int freq : counts) {
                keyBuilder.append('#').append(freq);
            }
            String key = keyBuilder.toString();

			// familiarize yourself with the more advanced features of your preferred coding language to really stand-out
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }

        return new ArrayList<>(map.values());
    }
}

9. Test Your Code

public static void main(String[] args) {
    Solution sol = new Solution();

    // Test 1: example
    String[] in1 = {"eat","tea","tan","ate","nat","bat"};
    List<List<String>> out1 = sol.groupAnagrams(in1);
    assert out1.size() == 3;

    // Test 2: empty input
    assert sol.groupAnagrams(new String[]{}).isEmpty();

    // Test 3: single empty string
    List<List<String>> out3 = sol.groupAnagrams(new String[]{""});
    assert out3.size() == 1 && out3.get(0).equals(Arrays.asList(""));

    // Test 4: no anagrams
    String[] in4 = {"abc","def","ghi"};
    assert sol.groupAnagrams(in4).size() == 3;

    // Test 5: duplicates
    String[] in5 = {"a","a","a"};
    List<List<String>> out5 = sol.groupAnagrams(in5);
    assert out5.size() == 1 && out5.get(0).size() == 3;

    System.out.println("All tests passed!");
}

10. Key Lessons to Remember for Future Questions

Divide and conquer: break a big problem into small subproblems (signature + grouping).
Hash‑grouping by a canonical key is a powerful pattern.
Count‑array signatures yield true linear time on fixed alphabets.
Always discuss sorting vs. counting trade‑offs to show depth.
Writing clear pseudocode prevents errors in key construction.
Familiarize yourself with the more advanced features of your preferred coding language to really stand-out

Good Luck and Happy Coding!