Longest Common Subsequence

Mar 5
8 min read

Reasoning Across Two Strings at Once

Longest Common Subsequence is a classic test of whether you can define a two-dimensional DP state and reason about choices across two inputs simultaneously. Get the state definition right and the solution writes itself. Get it wrong and you'll spin in circles.

Problem Statement

Given two strings text1 and text2, return the length of their longest common subsequence.

A subsequence is derived from a string by deleting some or no characters without changing the order of the remaining characters. Characters don't need to be contiguous.

Examples:

"abcde", "ace" → 3 (the subsequence "ace")
"abc", "abc" → 3
"abc", "def" → 0

1. Clarify Requirements Before Jumping Into Code

Before solving, restate the rules carefully.

Input: Two strings, text1 and text2.

Output: An integer - the length of the longest common subsequence.

Key clarifications:

Order matters, adjacency doesn't. "ace" is a subsequence of "abcde" even though the characters aren't consecutive.
Characters must appear in both strings in the same relative order.
Return the length only, not the subsequence itself.
Either string may be empty - the answer is 0.

This is about alignment across two sequences, not substring matching within one. That distinction matters for how we think about the state.

2. Identify the Category of the Problem

The moment we see "two sequences, find something optimal about their relationship," think 2D DP. One dimension for each string - each cell in the table represents a subproblem involving a prefix of each string.

This is one of the most foundational patterns in string DP. Solving it cleanly unlocks Edit Distance, Shortest Common Supersequence, and a family of sequence alignment problems.

3. Start With Brute Force to Build Intuition

The brute force solution generates all 2^m subsequences of text1, and for each one, checks whether it appears as a subsequence of text2.

For m = 20, that's over a million subsequences to generate and check. Completely infeasible.

But it reveals the core difficulty: choices in one string affect what's matchable in the other. If you decide to match text1[i] with text2[j], you've committed - everything after must come from later positions in both strings. You can't match text1[i] with something before j later on.

That entanglement is why a 1D state doesn't work here. We need to track our position in both strings simultaneously.

4. Brainstorm More Solutions

Step 1: Try to Write a Recursive Solution First

When a problem feels two-dimensional, recursion is a natural place to start. It often helps make the state and transitions obvious before you worry about memoization or tabulation.

So, given that we're currently comparing text1[i] and text2[j], there are only two outcomes -

Case 1 - The characters match: text1[i] == text2[j]. There's never a reason to skip a matching character when we can use it, since taking the match contributes 1 to the LCS length. So we take the match, then move both pointers forward: recurse on text1[i+1...] and text2[j+1...].

Case 2 - The characters don't match: We can't use both. There are two ways to handle this scenario:

Skip text1[i] and try to match text2[j] with something later in text1.
Skip text2[j] and try to match text1[i] with something later in text2.

We take the larger of the two.

That gives us a clean recursion:

lcs(i, j):
    if i == text1.length or j == text2.length: return 0
    if text1[i] == text2[j]:
        return 1 + lcs(i+1, j+1)
    else:
        return max(lcs(i+1, j), lcs(i, j+1))

This is correct. But like every recursion with overlapping subproblems, the time complexity is exponential without memoization - the same (i, j) pair gets recomputed from many different paths through the recursion tree.

Step 2: Add Memoization to Fix the Recursion

The recursive solution is correct, but it's doing redundant work. The same subproblem gets recomputed from multiple paths through the recursion tree. On long strings, this cascades into exponential time.

The fix is straightforward: cache results. The first time we compute lcs(i, j), store the answer. Every subsequent call with the same (i, j) returns the cached value immediately.

Integer[][] memo = new Integer[m + 1][n + 1];

int lcs(int i, int j) {
    if (i == text1.length() || j == text2.length()) return 0;
    if (memo[i][j] != null) return memo[i][j];

    if (text1.charAt(i) == text2.charAt(j)) {
        memo[i][j] = 1 + lcs(i + 1, j + 1);
    } else {
        memo[i][j] = Math.max(lcs(i + 1, j), lcs(i, j + 1));
    }
    return memo[i][j];
}

To identify the time complexity, we ask: how many distinct (i, j) pairs are there? i ranges from 0 to m and j from 0 to n. That's (m+1) × (n+1) possible states, and each is computed exactly once. So the runtime drops to O(m × n).

Also, note that our memo table is a grid. Every entry we compute corresponds to a cell in that grid. If you look at it that way, memoization and bottom-up DP are really the same solution from different directions - top-down fills cells on demand as the recursion needs them, while bottom-up fills them systematically from smallest to largest. The bottom-up approach has a nice advantage: it avoids recursion overhead entirely (and is usually what interviewers expect).

Step 3: Define the DP State

So instead of thinking about the problem top-down (from (0,0) to (m,n)), let's look at a bottom-up solution. We'll build answers for small prefixes and use them to answer larger ones.

Define:

dp[i][j] = length of the LCS of text1[0...i-1] and text2[0...j-1]

The 1-indexed offset is a deliberate choice. It gives us a clean base case. When i = 0 or j = 0, one of the strings is empty, so the LCS is 0. That means we initialize the entire first row and column to 0 for free, just by allocating a zero-filled array of size (m+1) × (n+1).

From there, transitions mirror the recursion exactly:

If text1[i-1] == text2[j-1]: dp[i][j] = dp[i-1][j-1] + 1
Otherwise: dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Step 4: Build Intuition With a Concrete Trace

Let's fill the table for text1 = "abcde", text2 = "ace":

     ""  a  c  e
""  [ 0  0  0  0 ]
a   [ 0  1  1  1 ]
b   [ 0  1  1  1 ]
c   [ 0  1  2  2 ]
d   [ 0  1  2  2 ]
e   [ 0  1  2  3 ]

Let's trace a few cells to build intuition:

dp[1][1]: text1[0]='a' vs text2[0]='a' - match! dp[0][0] + 1 = 1.
dp[2][1]: text1[1]='b' vs text2[0]='a' - no match. max(dp[1][1], dp[2][0]) = max(1, 0) = 1.
dp[3][2]: text1[2]='c' vs text2[1]='c' - match! dp[2][1] + 1 = 2.
dp[5][3]: text1[4]='e' vs text2[2]='e' - match! dp[4][2] + 1 = 3.

Answer: dp[5][3] = 3. The LCS is "ace".

Step 5: Space Optimization

When looking for optimizations in a bottom-up DP solution, look at the recurrence: dp[i][j] depends on dp[i-1][j-1], dp[i-1][j], and dp[i][j-1] - i.e., the cell diagonally above-left, directly above, and directly to the left. We are only ever looking one row back.

That means we can compress the 2D table into two 1D arrays - the current row and the previous row. Or with careful implementation, even a single array by processing in the right order and saving the diagonal value before overwriting.

int[] prev = new int[n + 1];
int[] curr = new int[n + 1];

for (int i = 1; i <= m; i++) {
    for (int j = 1; j <= n; j++) {
        if (text1.charAt(i-1) == text2.charAt(j-1)) {
            curr[j] = prev[j-1] + 1;       // diagonal
        } else {
            curr[j] = Math.max(prev[j], curr[j-1]);  // above or left
        }
    }
    int[] temp = prev;
    prev = curr;
    curr = temp;  // swap — reuse arrays
}
return prev[n];

This reduces space from O(m × n) to O(n) - just two rows. This is always worth mentioning in an interview, even if you implement the full table first.

5. Trade-Offs at a Glance

Approach	Time	Space	Notes
Brute force	O(2^m × n)	Large	Infeasible
Recursive with memoization	O(m × n)	O(m × n)	Intuitive; recursion overhead
2D bottom-up DP	O(m × n)	O(m × n)	Clean, standard, preferred
Space-optimized DP	O(m × n)	O(min(m, n))	Worth knowing; slightly harder to read

6. Pseudocode

m = text1.length
n = text2.length

dp = (m+1) × (n+1) grid, filled with 0
     (first row and column are base cases — already 0)

for i from 1 to m:
    for j from 1 to n:
        if text1[i-1] == text2[j-1]:
            dp[i][j] = dp[i-1][j-1] + 1    ← match: extend diagonal
        else:
            dp[i][j] = max(dp[i-1][j], dp[i][j-1])  ← skip one character

return dp[m][n]

7. Edge Cases

One or both strings empty: The entire first row and column are 0 by initialization. dp[m][n] will correctly return 0.
No common characters: "abc" and "def". Every cell takes the max(skip) branch - the table stays at 0 everywhere except it never grows above 0 from a match. Returns 0.
Identical strings: Every character matches along the diagonal. dp[i][i] = i for all i. Returns m = n.
Repeated characters: "aab" and "aba". The DP correctly handles this - each cell considers exactly the prefix characters at that position, so duplicates don't cause double-counting.

8. Full Solution

public class LongestCommonSubsequence {

    public static int longestCommonSubsequence(String text1, String text2) {
        int m = text1.length();
        int n = text2.length();

        // dp[i][j] = LCS length for text1[0..i-1] : text2[0..j-1]
        // first row and column are implicitly 0 
        int[][] dp = new int[m + 1][n + 1]; 

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                    // match found: extend LCS
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    // skip
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        return dp[m][n];
    }
}

9. Test Your Solution

System.out.println(longestCommonSubsequence("abcde", "ace"));   // 3
System.out.println(longestCommonSubsequence("abc", "abc"));     // 3
System.out.println(longestCommonSubsequence("abc", "def"));     // 0
System.out.println(longestCommonSubsequence("", "abc"));        // 0
System.out.println(longestCommonSubsequence("abc", ""));        // 0
System.out.println(longestCommonSubsequence("bsbininm", "jmjkbkjkv")); // 1

10. Key Lessons for Future Problems

Two sequences mean two dimensions. Whenever a problem asks you to find something optimal about the relationship between two sequences - alignment, edit distance, common structure - the state almost always needs to track a position in each sequence simultaneously. That's a 2D table.
Prefixes are the right unit of thought. Don't think about individual characters in isolation. Think about text1[0...i-1] and text2[0...j-1] as complete objects with known LCS lengths. Each cell in the table answers the question "what's the best I can do with exactly these two prefixes?" and builds on smaller answers already computed.
Matching and skipping are the only two cases. Every 2D sequence DP reduces to this: if the current characters match, extend the diagonal; if they don't, take the best of skipping one or the other. This exact pattern appears in Edit Distance, Longest Common Substring, and many other problems. Internalize it here and you'll recognize it immediately elsewhere.
The recursion and the table are the same solution. The recursive formulation is often easier to derive; the table is often easier to implement and analyze. Practice translating fluently between them. Start with the recursion to get the transitions right, then build the table from there.
Know the space optimization, even if you implement the full table. Reducing from O(m × n) to O(n) by keeping only two rows is a standard follow-up. Mentioning it proactively - "I'd use the full table first for clarity, but this can be space-optimized to O(n) since each row only depends on the previous one" - signals that you understand the algorithm deeply, not just the implementation.

Longest Common Subsequence is foundational. The state definition, the two-case transition, and the prefix-based reasoning appear in dozens of problems. Solve this one cleanly and a whole family of harder problems becomes much more approachable.

Good Luck and Happy Coding!