Word Break

1 day ago
8 min read

Stop Thinking in Strings, Start Thinking in States

Word Break looks like a string problem, but it's really about modeling reachability over prefixes and deciding whether earlier choices unlock later ones. Interviewers love it because it exposes whether you can translate a vague problem statement into a clean DP state definition. The algorithm itself isn't complicated. The hard part is seeing the problem the right way.

Problem Statement

You are given a string s and a list of strings wordDict.

Return true if s can be segmented into a space-separated sequence of one or more dictionary words. Return false otherwise.

Each word in the dictionary may be reused multiple times.

Example:

s = "leetcode", wordDict = ["leet", "code"] → true ("leet" + "code")
s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"] → false

1. Clarify Requirements Before Jumping Into Code

Before touching code, let's make sure we understand the rules.

Input: A non-empty string s and a list of non-empty strings wordDict.

Output: Boolean — can s be fully segmented using dictionary words?

Key rules to confirm:

Does every character in s need to be used? Yes, partial matches don't count.
Can words be reused? Yes.
Can wordDict be large? Often yes, so lookup speed will matter.

Edge questions worth raising out loud:

What if s is a single character? Good to confirm that we need to handle this usecase
What if wordDict contains duplicates? Doesn't matter, we'll use a set.
What if no word in the dictionary matches any prefix of s? Return false immediately.

2. Identify the Category of the Problem

At first glance this feels like a string matching problem. But the recursive structure, where matching a prefix opens up new possibilities for the suffix, is a signal: this is DP.

More precisely, it's a reachability DP question. We're not minimizing a cost, maximizing a value, or counting paths. We're asking a boolean question: is this position reachable? That puts it in a specific subclass of DP problems where the state is true or false, and transitions propagate reachability forward.

Once you recognize that framing, the algorithm becomes almost mechanical to derive.

3. Start With Brute Force to Build Intuition

The most natural first instinct is recursion. At each position in the string, try every possible prefix. If the prefix is in the dictionary, recurse on the remaining suffix.

boolean canBreak(String s, Set<String> wordSet) {
    if (s.isEmpty()) return true; // used the whole string, success

    for (int end = 1; end <= s.length(); end++) {
        String prefix = s.substring(0, end);
        if (wordSet.contains(prefix) 
                && canBreak(s.substring(end), wordSet)) {
            return true;
        }
    }

    return false;
}

Let's trace it on "leetcode" with ["leet", "code"] to confirm correctness:

canBreak("leetcode")
  → try "l" — not in dict
  → try "le" — not in dict
  → try "lee" — not in dict
  → try "leet" — in dict! recurse on "code"
      → try "c" — not in dict
      → try "co" — not in dict
      → try "cod" — not in dict
      → try "code" — in dict! recurse on ""
          → empty string → return true ✓

This works... But now think about "aaaaaaaaaaaaaaab" with ["a", "aa", "aaa"]. The recursion explores every possible way to partition the as before ultimately failing on the b. Many of those partitions share sub-problems. "Can we break the last 5 characters?" gets computed dozens of times from different starting points.

Time complexity: Exponential, O(2^n) in the worst case.

Space complexity: O(n) recursion depth.

But we have a key observation: there are overlapping subproblems. We're recomputing the same suffixes repeatedly. That's exactly when DP pays off.

4. Brainstorm More Solutions

Step 1: Memoization - Cache the Recursive Solution

The quickest fix to a recursive solution with overlapping subproblems is to cache results. If we've already determined whether s.substring(i) can be broken, store it.

boolean canBreak(String s, int start, Set<String> wordSet, Boolean[] memo) {
    if (start == s.length()) return true;
    if (memo[start] != null) return memo[start];

    for (int end = start + 1; end <= s.length(); end++) {
        if (wordSet.contains(s.substring(start, end)) &&
            canBreak(s, end, wordSet, memo)) {
            return memo[start] = true;
        }
    }
    return memo[start] = false;
}

Now each starting index is computed exactly once.

Time drops to O(n²) because there are O(n) states and each does O(n) work checking substrings.

Space is O(n) for the memo array.

This is correct and more efficient than recursion. But let's think about whether we can understand the problem even more clearly.

Step 2: Reframe the problem - "How did I get here?"

Let's step back from the code for a moment and think about what's actually making this problem hard.

The brute-force recursion fails because it keeps recomputing the same suffixes. We memoize to fix that. But memoization is still tied to the same framing: "Can I break the string starting from position i?" We're thinking in terms of suffixes, i.e. what's left to consume.

What if we flipped the perspective?

Instead of looking forward from a position and asking "can I finish from here?", let's try looking backward at a position and asking ourselves: "How did I get here?"

Pick any position i in the string and ask: What must have been true for us to have found a valid word boundary at position i? There must have been some earlier position j where:

We already had a valid segmentation of everything before j, and
The chunk from j to i is a single dictionary word.

That's it. We don't need to know how we got to j. We don't need to know which words were used. We just need to know: is position j a valid place to be?

This is the reframe. Instead of asking "Can I break the string?", ask: "Which positions in the string are reachable?"

A position is reachable if there's some valid way to segment everything before it.

Position 0 is always reachable, since we haven't consumed anything yet and haven't made any mistakes.
Position n (past the end of the string) is what we're trying to reach.

A valid segmentation is just a path of reachable positions from 0 to n.

And finding paths between reachable positions is a problem we've seen before - for example, Unique Paths and Unique Paths II. It's a bottom-up approach that DP is well suited for.

Step 3: Define the DP State

Let dp[i] = true if the substring s[0...i] can be segmented using the dictionary.

Base case: dp[0] = true. An empty prefix is trivially valid, since we haven't consumed any characters and haven't violated any rules.

Transition: To determine dp[i], consider every possible last word. If we claim the last word in the segmentation ends at position i and starts at position j, then:

dp[j] must be true (the first j characters are valid)
s[j...i] must be in the dictionary (the last chunk is a valid word)

If both conditions hold, dp[i] = true.

dp[i] = true  if there exists some j < i such that:
              dp[j] == true  AND  s[j:i] ∈ wordDict

Note that we want to ensure our wordDict is stored as a HashSet so we get O(1) average lookup times.

Now, let's trace this on "leetcode" with ["leet", "code"]:

dp[0] = true  (base case)

dp[1]: j=0 → dp[0]=T, s[0:1]="l" ∉ dict → false
dp[2]: j=0 → dp[0]=T, s[0:2]="le" ∉ dict → false
       j=1 → dp[1]=F → skip
dp[3]: similar → false
dp[4]: j=0 → dp[0]=T, s[0:4]="leet" ∈ dict → dp[4] = TRUE ✓
dp[5]: j=0..4 checked
       j=4 → dp[4]=T, s[4:5]="c" ∉ dict → false
dp[6]: j=4 → dp[4]=T, s[4:6]="co" ∉ dict → false
dp[7]: j=4 → dp[4]=T, s[4:7]="cod" ∉ dict → false
dp[8]: j=4 → dp[4]=T, s[4:8]="code" ∈ dict → dp[8] = TRUE ✓

dp[8] = true → answer is true.

Now let's trace the failing case: "catsandog" with ["cats", "dog", "sand", "and", "cat"]:

dp[0] = true
dp[3]: s[0:3]="cat" ∈ dict → dp[3] = true
dp[4]: s[0:4]="cats" ∈ dict → dp[4] = true
dp[7]: j=3 → dp[3]=T, s[3:7]="sand" ∈ dict → dp[7] = true
       j=4 → dp[4]=T, s[4:7]="and" ∈ dict → dp[7] = true (already set)
dp[9] (the end):
       j=7 → dp[7]=T, s[7:9]="og" ∉ dict → false
       All other j values either have dp[j]=false or the substring isn't in dict
       → dp[9] = false

Answer: false.

Is this better in terms of time and space complexity? Not really. Time Complexity is still O(n²) because there are still O(n) states and each is ultimately doing O(n) work checking substrings. Space complexity is still O(n) for the dp array.

It's still a good idea to walk through both solutions though, as it shows off your ability to reason through a complex problem from different perspectives. After, you'd likely be free to choose whichever solution you prefer.

5. Trade-Offs at a Glance

Approach	Time	Space	Notes
Brute-force recursion	O(2^n)	O(n)	Too slow; builds intuition
Memoized recursion	O(n²)	O(n)	Correct; small recursion overhead
Bottom-up DP	O(n²)	O(n)	Clean, iterative, preferred

Both memoization and bottom-up DP have the same asymptotic complexity here. The bottom-up approach avoids recursion overhead and is often easier to reason about during an interview. The state table is concrete and traceable.

6. Pseudocode

wordSet = convert wordDict to a HashSet

dp = boolean array of size n+1, initialized to false
dp[0] = true

for i from 1 to n:
    for j from 0 to i-1:
        if dp[j] is true AND s[j:i] is in wordSet:
            dp[i] = true
            break   // no need to check other split points

return dp[n]

7. Edge Cases

Think through these before writing your final solution:

Single character: s = "a", wordDict = ["a"] - dp[1] checks s[0:1] = "a" which is in the dict. Returns true.
Word reuse: s = "aaaaaaa", wordDict = ["aaa", "aaaa"] - "aaa"+"aaaa" works. The DP handles reuse naturally because we don't track which words were used, only whether a position is reachable.
Almost matches: s = "catsandog" - traced above. The DP correctly returns false even though most of the string is matchable.
Overlapping words: wordDict = ["cat", "cats", "sand", "dog", "and"] - both "cat" and "cats" are valid prefixes. The DP explores both by checking all possible j values, so it doesn't commit to one greedily.

8. Full Solution

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordBreak {

    public static boolean wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        int n = s.length();
        boolean[] dp = new boolean[n + 1];

        dp[0] = true;  // empty prefix is always valid

        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                // If first j chars are valid 
                // AND s[j:i] is a word, 
                // then first i chars are valid
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;  // found a valid split
                }
            }
        }

        return dp[n];
    }
}

9. Test Your Solution

// Basic case — answer: true ("leet" + "code")
System.out.println(wordBreak("leetcode", List.of("leet", "code")));

// Word reuse — answer: true ("apple" + "pen" + "apple")
System.out.println(wordBreak("applepenapple", List.of("apple", "pen")));

// Almost works but fails at end — answer: false
System.out.println(wordBreak("catsandog", List.of("cats", "dog", "sand", "and", "cat")));

// Single character — answer: true
System.out.println(wordBreak("a", List.of("a")));

// Reuse with overlap — answer: true ("aaa" + "aaaa" or "aaaa" + "aaa")
System.out.println(wordBreak("aaaaaaa", List.of("aaaa", "aaa")));

// Nothing matches — answer: false
System.out.println(wordBreak("abcd", List.of("ab", "cd", "a", "bc")));
// Wait — "ab"+"cd" → answer: true. Good test to trace manually.
System.out.println(wordBreak("abcde", List.of("ab", "cd", "a", "bc")));
// "ab"+"cd" leaves "e" unmatched → false

10. Key Lessons for Future Problems

Reframe string problems as index problems. The moment you stop asking "can I split this string?" and start asking "which positions are reachable?", the DP structure becomes obvious. This shift in perspective applies to many string DP problems: Palindrome Partitioning, Longest Valid Parentheses, and others.
DP often has top-down and bottom-up solutions. Thinking through the problem in both directions shows off your ability to think through complex problems from multiple perspectives.
DP states often represent reachability, not values. Not all DP is about maximizing or counting. Boolean DP, where you track whether something is possible, is its own important pattern. The state is binary, the transitions propagate true forward, and you stop as soon as a position is confirmed reachable.
Define your base case carefully. dp[0] = true might feel like a trick, but it's logically grounded: an empty prefix is trivially segmentable. This kind of "empty case is valid" base case appears constantly in string and subsequence DP.
Don't commit greedily. A greedy approach would pick the longest (or shortest) matching prefix and move on. That fails on inputs like "catsanddog" where picking "cats" works but picking "cat" also works. You need to explore all valid split points. DP explores all of them systematically.
Hash-based lookups matter inside nested loops. Converting wordDict to a HashSet before the DP is not a minor detail. It's the difference between O(n²) and O(n² × k) where k is the dictionary size.

Word Break rewards candidates who stop thinking about strings and start thinking about states. That mental shift - from "what does the string look like?" to "what index positions are reachable and why?" - is one that pays off across many interview problems.

Good Luck and Happy Coding!