Group Anagrams
- mosesg1123
- 3 days ago
- 4 min read
Clustering words that are anagrams of each other is a classic interview problem and mirrors real‑world tasks like building search indexes or deduplicating permutations. It forces you to think about string normalization, hashing, and efficient grouping.
Problem Statement
Given an array of strings strs, group the anagrams together. Return a list of these groups in any order. Each group is a list of strings that are anagrams of one another.
public List<List<String>> groupAnagrams(String[] strs);
1. Clarify Requirements Before Jumping Into Code
I’d first check:
Are all strings lowercase? Yes.
Can there be empty strings? Yes—empty ones should form their own group.
Do we care about the order of groups or words inside them? No.
What scale? Let’s assume up to tens of thousands of words of moderate length.
2. Identify the Category of the Question
This is a hash‑grouping problem on strings. I’m looking for a way to compute a canonical “key” for each string so that all its anagrams share that key, then bucket by key.
Common patterns:
Sorting a string’s characters
Frequency‑count vectors for small alphabets
Hash signatures (e.g. prime‑product)
3. Consider a Brute‑Force Approach to Understand the Problem
For each string s in strs, compare it to every other string t.
Check if s and t are anagrams by sorting both (O(k log k)) or counting letters (O(k)).
Merge into the same bucket if they match.
This is O(n²·k log k) or O(n²·k), which won’t scale when n is large. It’s correct but too slow.
4. Brainstorm More Solutions
Let's start by recognizing that I can split this problem into two subproblems:
How do I recognize two words as anagrams?
Once I have a recognition method, how do I group them efficiently?
Step 1 - How do I recognize two words as anagrams?
Let's start by simplifying - how would I approach this if I were only comparing a pair of strings? There are a few options:
Option 1: Brute-force
Comparing each character in string one against every character in string two. I could remove characters from the strings if they match.
But in the worst case, that's O(n2) time, which is not going to scale well.
Option 2: Sort Letters
I could sort the letters in each string.
That's O(k log k), where k is the average length of a string.
Option 3: Count Array
Take advantage of the fact that we have a limited character set - 26 letters.
I could build a 26‐element count array for each word in O(k) time.
Step 2 - How do I group anagrams efficiently?
How do I quickly identify whether two strings match, and I want to be able to do that in constant time. What are some options?
Option 1: Sort‐as‐key
If I'm sorting every string, then I could use the sorted string as the key in a hash map.
Trade‑offs: This is easy to code and grants me constant lookup time, but it costs O (k log k) per word.
Option 2: Count‐array signature
Build a 26‐element count array for each word in O(k).
Serialize the characters into a customer string, e.g., for the string "abb", we could serialize into something like #1#2#0#0#0… .
Trade‑off: True O(n·k) time, slightly more code to build and serialize counts.
Option 3: Prime‐product hash
Map a→2, b→3, ... z→101, then multiply the corresponding primes for each letter in O(k).
Trade‑off: Compact key but risk of integer overflow or hash collisions.
Optimal choice: the count‐array signature. It’s linear time, fixed extra space, no overflow risk, and easy to explain once you break the problem into “create signature, then group by signature.”
5. Discuss Trade‑Offs Between Your Solutions
Approach | Time | Space | Pros | Cons |
Sort‐as‐key | O(n·k log k) | O(n·k) | Very simple | Slower for long strings |
Count‐array signature | O(n·k) | O(n·k) | True linear time, no overflow | Must serialize count array |
Prime‐product hash | O(n·k) | O(n) | Compact key | Overflow/collision risk |
6. Write Pseudocode to Structure Your Thoughts
function groupAnagrams(strs):
map = empty hash map (string → list of strings)
for each s in strs:
counts = array[26] initialized to 0
for each c in s:
counts[c - 'a'] += 1
key = serialize counts with separators
map[key].append(s)
return all lists in map.values()
7. Consider Edge Cases
Empty input ([]) → return [].
Single empty string ([""]) → return [[""]].
All unique → each string in its own list.
All identical → one group of n.
Very long strings → counting avoids sort overhead.
8. Write Full Code Syntax (in Java)
import java.util.*;
public class Solution {
public List<List<String>> groupAnagrams(String[] strs) {
Map<String, List<String>> map = new HashMap<>();
for (String s : strs) {
int[] counts = new int[26];
for (char c : s.toCharArray()) {
counts[c - 'a']++;
}
// you can break this into it's own method!
StringBuilder keyBuilder = new StringBuilder();
for (int freq : counts) {
keyBuilder.append('#').append(freq);
}
String key = keyBuilder.toString();
// familiarize yourself with the more advanced features of your preferred coding language to really stand-out
map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
}
return new ArrayList<>(map.values());
}
}
9. Test Your Code
public static void main(String[] args) {
Solution sol = new Solution();
// Test 1: example
String[] in1 = {"eat","tea","tan","ate","nat","bat"};
List<List<String>> out1 = sol.groupAnagrams(in1);
assert out1.size() == 3;
// Test 2: empty input
assert sol.groupAnagrams(new String[]{}).isEmpty();
// Test 3: single empty string
List<List<String>> out3 = sol.groupAnagrams(new String[]{""});
assert out3.size() == 1 && out3.get(0).equals(Arrays.asList(""));
// Test 4: no anagrams
String[] in4 = {"abc","def","ghi"};
assert sol.groupAnagrams(in4).size() == 3;
// Test 5: duplicates
String[] in5 = {"a","a","a"};
List<List<String>> out5 = sol.groupAnagrams(in5);
assert out5.size() == 1 && out5.get(0).size() == 3;
System.out.println("All tests passed!");
}
10. Key Lessons to Remember for Future Questions
Divide and conquer: break a big problem into small subproblems (signature + grouping).
Hash‑grouping by a canonical key is a powerful pattern.
Count‑array signatures yield true linear time on fixed alphabets.
Always discuss sorting vs. counting trade‑offs to show depth.
Writing clear pseudocode prevents errors in key construction.
Familiarize yourself with the more advanced features of your preferred coding language to really stand-out
Good Luck and Happy Coding!
Comments