top of page

Group Anagrams

  • mosesg1123
  • 3 days ago
  • 4 min read

Clustering words that are anagrams of each other is a classic interview problem and mirrors real‑world tasks like building search indexes or deduplicating permutations. It forces you to think about string normalization, hashing, and efficient grouping.


Problem Statement

Given an array of strings strs, group the anagrams together. Return a list of these groups in any order. Each group is a list of strings that are anagrams of one another.

public List<List<String>> groupAnagrams(String[] strs);

1. Clarify Requirements Before Jumping Into Code

I’d first check:

  • Are all strings lowercase? Yes.

  • Can there be empty strings? Yes—empty ones should form their own group.

  • Do we care about the order of groups or words inside them? No.

  • What scale? Let’s assume up to tens of thousands of words of moderate length.


2. Identify the Category of the Question

This is a hash‑grouping problem on strings. I’m looking for a way to compute a canonical “key” for each string so that all its anagrams share that key, then bucket by key.

Common patterns:

  • Sorting a string’s characters

  • Frequency‑count vectors for small alphabets

  • Hash signatures (e.g. prime‑product)


3. Consider a Brute‑Force Approach to Understand the Problem


  1. For each string s in strs, compare it to every other string t.

  2. Check if s and t are anagrams by sorting both (O(k log k)) or counting letters (O(k)).

  3. Merge into the same bucket if they match.

This is O(n²·k log k) or O(n²·k), which won’t scale when n is large. It’s correct but too slow.


4. Brainstorm More Solutions

Let's start by recognizing that I can split this problem into two subproblems:

  1. How do I recognize two words as anagrams?

  2. Once I have a recognition method, how do I group them efficiently?


Step 1 - How do I recognize two words as anagrams?

Let's start by simplifying - how would I approach this if I were only comparing a pair of strings? There are a few options:

Option 1: Brute-force

  • Comparing each character in string one against every character in string two. I could remove characters from the strings if they match.

  • But in the worst case, that's O(n2) time, which is not going to scale well.

Option 2: Sort Letters

  • I could sort the letters in each string.

  • That's O(k log k), where k is the average length of a string.

Option 3: Count Array

  • Take advantage of the fact that we have a limited character set - 26 letters.

  • I could build a 26‐element count array for each word in O(k) time.


Step 2 - How do I group anagrams efficiently?

How do I quickly identify whether two strings match, and I want to be able to do that in constant time. What are some options?

Option 1: Sort‐as‐key

  • If I'm sorting every string, then I could use the sorted string as the key in a hash map.

  • Trade‑offs: This is easy to code and grants me constant lookup time, but it costs O (k log k) per word.

Option 2: Count‐array signature

  • Build a 26‐element count array for each word in O(k).

  • Serialize the characters into a customer string, e.g., for the string "abb", we could serialize into something like #1#2#0#0#0… .

  • Trade‑off: True O(n·k) time, slightly more code to build and serialize counts.

Option 3: Prime‐product hash

  • Map a→2, b→3, ... z→101, then multiply the corresponding primes for each letter in O(k).

  • Trade‑off: Compact key but risk of integer overflow or hash collisions.


Optimal choice: the count‐array signature. It’s linear time, fixed extra space, no overflow risk, and easy to explain once you break the problem into “create signature, then group by signature.”


5. Discuss Trade‑Offs Between Your Solutions

Approach

Time

Space

Pros

Cons

Sort‐as‐key

O(n·k log k)

O(n·k)

Very simple

Slower for long strings

Count‐array signature

O(n·k)

O(n·k)

True linear time, no overflow

Must serialize count array

Prime‐product hash

O(n·k)

O(n)

Compact key

Overflow/collision risk


6. Write Pseudocode to Structure Your Thoughts

function groupAnagrams(strs):
  map = empty hash map (string → list of strings)
  for each s in strs:
    counts = array[26] initialized to 0
    for each c in s:
      counts[c - 'a'] += 1
    key = serialize counts with separators
    map[key].append(s)
  return all lists in map.values()

7. Consider Edge Cases

  1. Empty input ([]) → return [].

  2. Single empty string ([""]) → return [[""]].

  3. All unique → each string in its own list.

  4. All identical → one group of n.

  5. Very long strings → counting avoids sort overhead.


8. Write Full Code Syntax (in Java)

import java.util.*;

public class Solution {
    public List<List<String>> groupAnagrams(String[] strs) {
        Map<String, List<String>> map = new HashMap<>();
        for (String s : strs) {
            int[] counts = new int[26];
            for (char c : s.toCharArray()) {
                counts[c - 'a']++;
            }

			// you can break this into it's own method!
            StringBuilder keyBuilder = new StringBuilder();
            for (int freq : counts) {
                keyBuilder.append('#').append(freq);
            }
            String key = keyBuilder.toString();

			// familiarize yourself with the more advanced features of your preferred coding language to really stand-out
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }

        return new ArrayList<>(map.values());
    }
}

9. Test Your Code

public static void main(String[] args) {
    Solution sol = new Solution();

    // Test 1: example
    String[] in1 = {"eat","tea","tan","ate","nat","bat"};
    List<List<String>> out1 = sol.groupAnagrams(in1);
    assert out1.size() == 3;

    // Test 2: empty input
    assert sol.groupAnagrams(new String[]{}).isEmpty();

    // Test 3: single empty string
    List<List<String>> out3 = sol.groupAnagrams(new String[]{""});
    assert out3.size() == 1 && out3.get(0).equals(Arrays.asList(""));

    // Test 4: no anagrams
    String[] in4 = {"abc","def","ghi"};
    assert sol.groupAnagrams(in4).size() == 3;

    // Test 5: duplicates
    String[] in5 = {"a","a","a"};
    List<List<String>> out5 = sol.groupAnagrams(in5);
    assert out5.size() == 1 && out5.get(0).size() == 3;

    System.out.println("All tests passed!");
}

10. Key Lessons to Remember for Future Questions

  • Divide and conquer: break a big problem into small subproblems (signature + grouping).

  • Hash‑grouping by a canonical key is a powerful pattern.

  • Count‑array signatures yield true linear time on fixed alphabets.

  • Always discuss sorting vs. counting trade‑offs to show depth.

  • Writing clear pseudocode prevents errors in key construction.

  • Familiarize yourself with the more advanced features of your preferred coding language to really stand-out


Good Luck and Happy Coding!

Recent Posts

See All
Contains Duplicate Within K Distance

Checking for nearby repeated events in a log or sensor stream is a common task in monitoring, fraud detection, and real‑time analytics....

 
 
 
Longest Consecutive Sequence

Finding the longest run of consecutive days, IDs, or timestamps in a dataset comes up in analytics, event processing, and scheduling systems. This “Longest Consecutive Sequence” problem is a great way

 
 
 
Subarray Sum Equals K

Finding how many contiguous stretches of transactions sum to a target turns up in budgeting tools, analytics dashboards, and real‑time...

 
 
 

Comments


Drop Me a Line, Let Me Know What You Think

Thanks for submitting!

© 2023 by Train of Thoughts. Proudly created with Wix.com

bottom of page