Wildcard Matching: A Dynamic Programming Interview Walkthrough
- 6 hours ago
- 13 min read
Wildcard Matching is a dynamic programming problem interviewers reach for when they want to see whether you can interpret pattern semantics carefully and resist the urge to write greedy code. It looks deceptively similar to Regular Expression Matching — same * symbol, same setup — but the semantics are different in a critical way: here, * matches any sequence of characters, not "zero or more of the preceding character." That difference changes the DP transitions completely. The actual skill being tested is the discipline to model the operator's semantics precisely before reaching for a familiar template.
The same matching semantics power glob patterns in Unix shells (*.txt, image-???.png), file-search dialogs in operating systems, allow/deny rules in firewalls and CDN routing tables, configuration management tools that match resource names, and any system that needs flexible pattern matching without the complexity of full regex.
Problem Statement
Given an input string s and a pattern p, implement wildcard matching with support for:
? which matches any single character
* which matches any sequence of characters, including the empty sequence
The matching should cover the entire input string, not partial matches.
Return true if the string matches the pattern, false otherwise.
Example:
s = "adceb", p = "*a*b"
answer = true (the first * matches the empty string, a matches a, the second * matches dce, and b matches b).
1. Clarify Requirements Before Jumping Into Code
Before writing a single line, we should restate the problem in our own words and ask the kinds of questions that would change our approach.
Input: a string s and a pattern p.
Output: true if the pattern matches the entire string, false otherwise.
Details to clarify:
Does * apply to a preceding character (like in regex) or stand alone? Stand alone — it matches any sequence of characters by itself.
Can * match the empty sequence? Yes — this is what makes patterns like *a*b work when there's nothing before the a.
Can the pattern have multiple * in a row (like ** or ***)? Yes, but they collapse semantically to a single *. The algorithm should handle this correctly without choking.
Is partial matching allowed? No — the pattern must consume the entire string.
Any other special characters? Only ? and *.
The thing to flag is that * is a flexible absorber. It can match zero characters, or 1, or 100. The key insight, which we'll build up to in the brainstorm, is that this flexibility can be expressed with just two choices at each step rather than enumerating every possible length.
2. Identify the Category of the Question
A few signals jump out:
We're checking compatibility between two sequences.
The match decision at each position depends on a small local context.
The pattern has an operator (*) that can match a variable amount of input.
Subproblems overlap: the question "does prefix of s match prefix of p?" repeats across many recursive branches.
This combination of factors — two strings, local decisions, variable-length operator, overlapping subproblems — is the fingerprint of 2D string DP. This is the same family as Edit Distance, Longest Common Subsequence, and Regular Expression Matching. The wrinkle that distinguishes Wildcard Matching from Regex Matching is that * here is unary on the input, not bound to a preceding character — which actually makes the DP simpler than regex, once we see why.
3. Brute Force Solution
As always, we start with the brute force. The most direct approach is recursion with backtracking. At each pair of positions (i, j):
If the current pattern character is a regular letter or ?, try matching it against the current string character and advance both pointers.
If the current pattern character is *, try every possible length the * could consume — zero characters, one character, two characters, and so on — and recurse on each.
match(i, j):
if j == len(p): return i == len(s)
if i == len(s): return all remaining pattern chars are '*'
if p[j] == '?' or p[j] == s[i]: return match(i+1, j+1)
if p[j] == '*':
for k from 0 to len(s) - i:
if match(i + k, j + 1): return true
return false
return falseThis is correct but exponential in the worst case — every * branches in O(n) directions, and many of those branches re-explore the same (i, j) pairs. Without memoization, patterns with multiple *'s cause the runtime to explode.
The brute force does teach us one thing worth keeping: the matching decision at any point depends only on the indices i and j. There's no hidden state about "which characters the * has consumed so far" — once we've decided where this * ends, the rest of the matching only cares about the new (i, j). That's the property that makes DP work.
4. Brainstorm More Solutions
Step 1: How would we match "adceb" against "*a*b" by hand?
Let's slow down and walk through it.
Pattern is *a*b, string is adceb.
First pattern character is *. That can match any prefix of the string — empty, a, ad, adc, adce, or adceb. I don't know which yet, so let me push forward and see which choice eventually works.
Suppose * matches the empty string. Then I'm at s = "adceb", p = "a*b". First pattern character is a, which matches s[0]. Advance both. Now s = "dceb", p = "*b".
Now * again. We have the same problem: how much should it consume? It needs to consume just enough that the remaining b matches the last character of s. The trailing b in the string is at the end, so * should consume dce. After that, s = "b", p = "b", match succeeds.
There are two things to notice. First, every time we hit *, we had a length choice — how many characters to absorb. Second, that length choice could only really be made by looking ahead at the rest of the pattern. There's no greedy rule that works.
So the algorithm somehow has to encode our choices. Either we backtrack (the brute force), or we encode all the possible choices into a DP table.
Step 2: Define the state
Let dp[i][j] be true if the first i characters of s match the first j characters of p, and false otherwise. The answer we want is dp[m][n], where m = s.length() and n = p.length().
Why does this state work as a self-contained subproblem? Because matching only depends on positions, not on the history of how we got there. Two different ways of arriving at (i, j) — say, two different choices of how many characters an earlier * consumed — give the same answer from there on. That's the property DP needs.
The base case is dp[0][0] = true — an empty string matches an empty pattern.
Step 3: Find the right anchor for the transition
The state is defined — now we need the transition. The brute force gave us a hint: every match decision is local. We look at the current character in s and the current character in p and decide what to do. That locality is good news. It means the answer to dp[i][j] should depend on what's happening at the end of the prefixes, not on some pattern spread across the whole string.
But "the end" has two sides — the end of s and the end of p. Which side do we anchor on?
If we anchor on s[i-1], we learn what character needs to be matched, but we don't know how the pattern is trying to match it. The pattern could be trying to use a literal character, a ?, or a *. The string character alone doesn't tell us which rule is active.
If we anchor on p[j-1], we learn which rule is being applied. The pattern character determines the type of transition.
This is the same asymmetry we saw in Regular Expression Matching — the pattern dictates the kind of transition, the string is the input being checked against it. So we anchor on p[j-1] and split into cases based on what it is.
What are the possible values of p[j-1]?
A regular letter.
A ? (matches any single character).
A * (matches any sequence, including empty).
Regular letters and ? behave the same way for our purposes — both want to consume exactly one character of s. So they collapse into one case. That leaves us with two cases: p[j-1] is *, or p[j-1] is not *. The non-* case is easier, so let's start there.
Step 4: Handle the non-* case
If p[j-1] is a regular letter or ?, it has to match exactly one character of s. So s[i-1] (the last character of the string prefix) needs to match it, and the rest of the string prefix has to match the rest of the pattern prefix.
if p[j-1] == s[i-1] or p[j-1] == '?':
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = falseClean and straightforward. This is the same shape of transition we've seen in Edit Distance and Regex Matching's literal-character case.
Step 5: Handle the * case
This is where the problem becomes interesting. The brute force tried every possible number of characters that * could consume — zero, one, two, etc. — and recursed on each. That gave exponential branching. The question now: can we capture all those choices in just two DP transitions?
Let's enumerate our options. At position (i, j) with p[j-1] == '*', there are exactly two "moves" we can make, and every possible length choice eventually reduces to repeating one of them:
The * matches the empty sequence (zero characters). We're saying "this * consumes nothing." We skip past it in the pattern and ask whether s[0..i) matches the rest of the pattern, p[0..j-1).
dp[i][j] = dp[i][j-1]
The * matches one more character. We're saying "this * is alive and consumes at least one more character of s." We advance the string pointer but stay at the same pattern position — the * is still active and might consume more.
dp[i][j] = dp[i-1][j]
If either is true, dp[i][j] is true.
By allowing the second option (consume one character, stay on ), we get all the lengths that can match for free. Matching three characters is just "consume one, stay on *" three times in a row. Matching ten characters is "consume one, stay on *" ten times. The DP table naturally accumulates these chains, so we never have to enumerate the lengths explicitly.
That's the move that collapses exponential branching into two simple transitions. The brute force's for k from 0 to ... loop is replaced by a single OR of two cells.
Step 6: Initialize the boundary
Now we handle the base cases. dp[0][0] = true is easy. What about dp[0][j] for j > 0 — can an empty string match a non-empty pattern?
Yes, but only if the pattern is made entirely of *'s. Patterns like *,**,*** can all match the empty string because each * is allowed to match zero characters. Any other character in the pattern requires at least one character from the string, which we don't have.
The recurrence handles this for us. If p[j-1] == '*', then dp[0][j] = dp[0][j-1] (the zero-characters branch — the one-more-character branch requires i >= 1). That naturally propagates: dp[0][0] = true, so dp[0][1] = true if p[0] == '*'; then dp[0][2] = true if p[1] == '*' as well, and so on.
For dp[i][0] with i > 0 — a non-empty string against an empty pattern — the answer is always false, which is the default value of a boolean array.
Step 7: Walk through "adceb" vs "*a*b" to verify
Let's verify. The DP table (rows are s, columns are p):
"" * a * b
"" T T F F F
a F T T T F
d F T F T F
c F T F T F
e F T F T F
b F T F T TA few cells worth checking:
dp[0][1] = T: empty vs . case, zero characters: dp[0][0] = T. ✓
dp[1][1] = T: "a" vs . case, one more character: dp[0][1] = T. ✓
dp[1][3] = T: "a" vs a. case, zero: dp[1][2] = T (because a matches a after ate nothing). ✓
dp[4][3] = T: "adce" vs a. The trailing * is consuming dce. By the recurrence: zero gives dp[4][2] = F, one-more gives dp[3][3] = T. ✓
dp[5][4] = T: "adceb" vs ab. Non-* case (b == b): dp[4][3] = T. ✓
The recurrence holds. Notice how the * column "absorbs" any number of characters as we go down it — that's the two-transition trick doing its work.
Step 8: Complexity
The state has two indices, i and j, ranging up to m and n. That gives O(m × n) distinct subproblems. Each subproblem does O(1) work — at most a couple of character comparisons and table lookups. By the "number of states × work per state" rule, the total runtime is O(m × n) time and O(m × n) space for the DP table.
This progression — brute-force recursion → memoization → bottom-up 2D DP — is the same pattern that shows up across most 2D string DP problems. What's specific to Wildcard Matching is the trick in Step 5: collapsing the brute force's "try every length for *" loop into two transitions by allowing the * to either end (zero) or continue (one more). Once you see that move, the problem is mechanical.
Step 9: A quick note on the constant-space greedy solution
Before we move on to the trade-offs, there's one more approach worth mentioning: a greedy algorithm that solves the problem in O(1) extra space. It comes up often enough in follow-up questions that you should at least be familiar with the idea, even if you wouldn't lead with it.
The intuition: scan s and p together with two pointers. When characters match (literal or ?), advance both. When you hit a * in p, remember two things — the position of that * in the pattern, and the position in s where we started attempting to match characters after it. Then optimistically assume the * matches zero characters and advance the pattern pointer. If you later hit a mismatch, you "backtrack" to the saved *: bump the saved string position forward by one (the * now absorbs one more character), reset the pattern pointer to just after the *, and continue scanning from there.
In effect, each * becomes a checkpoint we can return to and stretch. Mismatches don't kill the algorithm — they just tell the most recent * to consume more. When we run out of string, we make sure any remaining pattern characters are all *.
Why might you want this version? Two reasons: it uses O(1) extra space instead of O(m × n), and it can be slightly faster in practice because it short-circuits on simple inputs.
Why wouldn't you lead with it? Because the bookkeeping is genuinely tricky under interview pressure, the correctness argument is harder to explain, and a small mistake in pointer management gives subtly wrong answers that are painful to debug. The DP version is the safer choice. If the interviewer asks "can you do this in constant space?", that's when you reach for the greedy version — and you'd want to walk through why the most recent * is always the right place to backtrack to before writing any code.
5. Discuss Trade-Offs Between Solutions
Approach | Time | Space | When I'd use it |
Brute-force recursion | Exponential | O(m+n) stack | Never — but worth mentioning to frame the problem |
Recursion + memoization | O(m × n) | O(m × n) | Easiest to write if I'm short on time and the transitions are fresh |
Bottom-up 2D DP | O(m × n) | O(m × n) | My default interview answer — predictable iteration order and easy to debug |
Greedy with backtracking pointers | O(m × n) worst case | O(1) | If asked about constant space — a well-known clever solution for this specific problem |
6. Pseudocode
m = length of s, n = length of p
dp = (m+1) x (n+1) boolean table, all false
dp[0][0] = true
# Empty string can only match all-* patterns
for j from 1 to n:
if p[j-1] == '*':
dp[0][j] = dp[0][j-1]
for i from 1 to m:
for j from 1 to n:
if p[j-1] == s[i-1] or p[j-1] == '?':
dp[i][j] = dp[i-1][j-1]
else if p[j-1] == '*':
# zero characters OR one more character
dp[i][j] = dp[i][j-1] or dp[i-1][j]
return dp[m][n]7. Edge Cases
Things to verify before claiming we're done:
Empty string, empty pattern → dp[0][0] = true. ✓
Empty string, pattern of all *s → boundary initialization propagates true across row 0. ✓
Empty string, pattern with non-* characters → dp[0][j] = false at the first non-*. ✓
Non-empty string, empty pattern → dp[i][0] = false by default. ✓
Multiple consecutive s (e.g., "*", "***") → equivalent to a single *; the recurrence handles them without special-casing.
Pattern "*" alone → matches any string of any length. ✓
Pattern of ?s with length matching the string → checks element-wise that lengths match. ✓
Long string with many small * groups → algorithm is O(m × n), no special performance traps.
The boundary initialization on row 0 is the case most candidates get wrong. Walking through it explicitly earns trust with the interviewer.
8. Full Code
public class WildcardMatching {
public static boolean isMatch(String s, String p) {
int m = s.length();
int n = p.length();
boolean[][] dp = new boolean[m + 1][n + 1];
dp[0][0] = true;
// Boundary: empty string can match a pattern only if every pattern
// character so far is '*' (each '*' matches zero characters).
for (int j = 1; j <= n; j++) {
if (p.charAt(j - 1) == '*') {
dp[0][j] = dp[0][j - 1];
}
}
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
char pc = p.charAt(j - 1);
char sc = s.charAt(i - 1);
if (pc == sc || pc == '?') {
// Literal or '?': consume one character on both sides
dp[i][j] = dp[i - 1][j - 1];
} else if (pc == '*') {
// '*' matches zero characters: skip past it in the pattern
// OR '*' matches one more character: advance s, stay on '*'
dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
}
}
}
return dp[m][n];
}
}9. Test the Code
System.out.println(isMatch("aa", "a")); // false
System.out.println(isMatch("aa", "*")); // true
System.out.println(isMatch("cb", "?a")); // false
System.out.println(isMatch("adceb", "*a*b")); // true
System.out.println(isMatch("acdcb", "a*c?b")); // false
System.out.println(isMatch("", "")); // true
System.out.println(isMatch("", "***")); // true
System.out.println(isMatch("abc", "***")); // true
System.out.println(isMatch("abc", "a*c")); // trueThese hit the meaningful cases: failed direct match, the all-absorbing *, single-character ?, * absorbing variable spans, a tricky failure case, both-empty, empty-vs-stars, multiple-stars, and a normal * in the middle.
10. Key Lessons
When an operator can match a variable amount of input, the DP trick is to encode "consume more" as a transition that stays on the same pattern position. That single move covers every possible length the operator could match, replacing an O(n) brute-force loop with one cell lookup.
Two-string DP problems almost always have state dp[i][j] over prefixes, and the transitions almost always anchor on the last character of the pattern prefix. The pattern dictates the rule; the string is the data being consumed.
Boundary cases for prefix DP encode semantic content. Here, dp[0][j] captures the fact that all-* patterns can match the empty string. Skipping that initialization breaks inputs like ("", "***").
Wildcard Matching and Regular Expression Matching look similar but have different * semantics. Always re-read the problem when you see familiar symbols — the trap is assuming the rules are the same.
Greedy approaches to flexible-pattern matching usually fail because the right length for an early operator can only be determined by looking ahead. DP encodes all the lookahead simultaneously by trying every choice in the table.
The thing that makes Wildcard Matching click isn't the DP table — it's the realization that the variable-length * can be expressed with two fixed-shape transitions instead of an enumeration over lengths. Once you see that move, every other "variable-length operator" problem in DP starts to look familiar.
Good Luck and Happy Coding!
Comments