Graph Valid Tree: A Step-by-Step Interview Walkthrough
- 10 hours ago
- 12 min read
Graph Valid Tree looks like a quick traversal problem and turns out to be a test of whether you actually know what a tree is. Most candidates can run a DFS, but this problem rewards understanding the structural definition of a tree — connected and acyclic — and recognizing that those two conditions, combined with the right invariant, collapse into a surprisingly clean check. The candidates who memorized "trees use DFS" tend to write a traversal and forget one of the two conditions, passing some test cases and failing others. The candidates who reason about what must be true of a tree spot the powerful shortcut: a graph on n nodes is a tree if and only if it has exactly n - 1 edges and no cycles (or equivalently, n - 1 edges and is connected). The signal here is whether you can reason about graph properties rather than just running an algorithm and hoping.
Validating tree structure is a real operation across many systems. Network designers verify that a spanning topology has no redundant loops (which waste bandwidth or cause broadcast storms). Distributed systems check that a set of nodes forms a connected, loop-free overlay. Dependency analyzers confirm that a module hierarchy is acyclic and fully linked. Filesystem and data-structure validators check that pointers form a tree rather than a tangled graph. Any time you need to confirm that a set of connections forms exactly a tree — fully connected, no redundancy — you're running this check.
Problem Statement
You are given n nodes labeled from 0 to n - 1 and a list of undirected edges.
Return true if these edges form a valid tree, and false otherwise.
A valid tree must be connected (every node reachable from every other) and acyclic (contain no cycles).
Example:
n = 5, edges = [[0,1],[0,2],[0,3],[1,4]]
result → true.
n = 3, edges = [[0,1],[1,2],[2,0]]
result → false (the cycle 0–1–2–0).
1. Clarify Requirements Before Jumping Into Code
Before writing a single line, we should restate the problem in our own words and ask the kinds of questions that would change our approach.
Input: an integer n (number of nodes) and a list of undirected edges.
Output: a boolean — is this a valid tree?
Details to clarify:
What two conditions define a valid tree? It must be connected (one single component) and acyclic (no cycles). Both must hold.
Is the graph undirected? Yes — edge [u, v] connects u and v in both directions.
Are self-loops or duplicate edges possible? Assume the input is clean (no self-loops, no duplicate edges) unless the interviewer says otherwise — though it's worth asking, because duplicates would form trivial cycles.
What about n = 1 with no edges? A single node with nothing else is a valid (trivial) tree.
The thing to flag is that both conditions matter, and failing either one disqualifies the graph. Keeping both conditions in view is what prevents the common bug of checking only one. As we'll see, there's an invariant that lets us enforce both at once.
2. Identify the Category of the Question
A few signals jump out:
We have an undirected graph.
We need to check two structural properties: connectivity and acyclicity.
We're not finding a path or a distance — we're validating structure.
That combination — undirected graph, check connectivity and cycles — points to two natural toolsets: graph traversal (DFS/BFS to check both properties) or Union-Find (Disjoint Set Union, which tracks connectivity and detects cycles as you merge nodes). Same family as Number of Islands (connected components) and any "are these things all linked without redundancy?" problem. The key realization, which we'll build in the brainstorm, is an edge-count invariant that simplifies everything.
3. Brute Force Solution
Let's think about the most natural first approach to understand the problem. We have two conditions to verify — connected and acyclic — so the direct idea is to check each one with a standard traversal.
Start with connectivity: run a DFS or BFS from node 0, marking every node we can reach. When the traversal finishes, if we've visited all n nodes, the graph is connected; if some node was never reached, it's disconnected and we can reject it.
Now acyclicity: while we're traversing, we also watch for cycles. In an undirected graph, that means checking whether we ever reach an already-visited node that isn't the one we just came from — if we do, there's a redundant path, which is a cycle.
This is a perfectly reasonable solution, and in fact it's a legitimate way to solve the problem in O(n + e) time. It's worth keeping in our back pocket. But notice it requires juggling two things at once — a full-coverage check and careful undirected cycle detection with parent tracking — and the cycle logic has a fiddly edge case (don't mistake the edge back to your parent for a cycle). Before committing to it, it's worth asking: is there a single property that captures "tree-ness" more directly, so we don't have to verify connectivity and acyclicity as two separate concerns? That question leads us to a powerful invariant.
4. Brainstorm More Solutions
Step 1: What must be true of any tree?
Let's reason from the definition rather than jumping to an algorithm. A tree is connected and acyclic. Is there anything else we can derive that must always hold?
Here's a classic fact worth re-deriving. Consider building a tree by adding nodes one at a time. Start with a single node and no edges. To connect a second node without creating a cycle, you add exactly one edge. To connect a third node, one more edge. Every new node requires exactly one new edge to attach it to the existing structure — any more than one edge to a node would create a redundant path (a cycle), and any fewer (zero) would leave it disconnected.
So a tree with n nodes has exactly n - 1 edges. Not n - 2 (something would be disconnected), not n (something would form a cycle). Exactly n - 1. This is a hard invariant: every tree satisfies it, and it's a necessary condition.
Step 2: Is n - 1 edges sufficient?
A natural question: if a graph has exactly n - 1 edges, is it automatically a tree? Let's test that.
Consider the example of [[0,1],[1,2],[2,0]] plus an isolated node 3. That's n = 4 with 3 edges (n - 1), but nodes 0–1–2 form a cycle and node 3 is disconnected. So n - 1 edges alone does not guarantee a tree.
But notice something interesting about that counterexample: it failed both ways at once — it had a cycle and a disconnection. That's not a coincidence. But let's not take that on faith — let's reason out why pinning the edge count at n - 1 ties the two conditions together.
First, picture a graph with n - 1 edges that has no cycle. Could it be disconnected? If it were, it would split into two or more separate pieces, each itself cycle-free. As we derived earlier, a connected cycle-free piece with n nodes needs exactly n - 1 edges to hold it together. If there are two or more pieces, the total comes to n - 2 or fewer, not n - 1. So having all n - 1 edges with no cycle leaves no room to be disconnected — the graph must be connected.
Now run it the other way: a graph with n - 1 edges that is connected. Could it have a cycle? A connected graph on n nodes needs at least n - 1 edges just to link everyone. If it also contained a cycle, that cycle would include a redundant edge — one you could delete while staying connected — meaning you'd need more than n - 1 edges to both connect everyone and afford the redundancy. With exactly n - 1 and not one to spare, there's no room for a redundant edge, so there can be no cycle — the graph must be acyclic.
So once the edge count is fixed at n - 1, connectivity and acyclicity stop being independent: each one forces the other. And that means we only have to verify one of them.
Step 3: The two-part check
This gives us a clean strategy:
Check the edge count. If the number of edges isn't exactly n - 1, it can't be a tree — return false immediately. This single check eliminates a huge class of invalid inputs. Too many edges means there's a guaranteed cycle; too few means we are guaranteed disconnection.
With the count confirmed, check just one remaining condition. We'll check for cycles. If there's no cycle and we have n - 1 edges, connectivity is guaranteed, so it's a valid tree.
Now we need an efficient way to detect cycles in an undirected graph. We have two good options.
Step 4: Option A — traversal with parent tracking
We can DFS or BFS from node 0, marking nodes visited as pass them. To detect a cycle in an undirected graph, there's a subtlety: every edge is bidirectional, so when we're at node v having come from node u, we'll see the edge back to u — but that's not a cycle, just the edge we arrived on. A real cycle is finding an already-visited node that isn't the parent we came from.
So we track the parent: when exploring v's neighbors, we skip the neighbor that is v's parent. If we ever reach an already-visited node that isn't the parent, we've found a genuine cycle.
After the traversal, if every node was visited (and no cycle was found), it's a tree.
This works, but the parent-tracking detail is easy to get wrong, and we'd still have to confirm all nodes were visited afterwards.
Step 5: Option B — Union-Find
There's a cleaner approach that maps directly onto the structure. Think about what each edge does: it connects two nodes, merging whatever groups they belonged to. If we process edges one at a time and track which nodes are in the same connected group, a cycle reveals itself in a specific way.
Reason it out: when we process an edge (u, v), there are two cases. Either u and v are in different groups — in which case this edge usefully merges them — or u and v are already in the same group, meaning there was already a path between them. Adding another edge between two already-connected nodes creates a second path between them, which is exactly a cycle.
So the cycle test is: for each edge, if its two endpoints are already in the same group, it's a cycle — return false. Otherwise, merge their groups.
This is Union-Find (Disjoint Set Union): a structure that efficiently tracks group membership (find) and merging (union). Each edge either merges two groups or reveals a cycle, and the merging naturally tracks connectivity as a side effect.
This is the same Union-Find technique that powers dynamic connected-components problems (like the follow-up to Number of Islands). Here it's a perfect fit because the problem is fundamentally about which nodes are connected to which — exactly what Union-Find tracks.
Step 6: Putting it together
With the edge-count check up front, Union-Find gives us a tight solution:
If edges.length != n - 1, return false.
Initialize Union-Find with each node in its own group.
For each edge, if its endpoints are already in the same group, return false (cycle). Otherwise, union them.
If we process all edges without finding a cycle, return true.
Because we've already guaranteed n - 1 edges, "no cycle found" implies "connected," so we don't need a separate connectivity pass. The edge count plus the cycle check together certify both tree conditions.
Step 7: Complexity
Let n be the number of nodes and e the number of edges (which we've pinned at n - 1, so e = n - 1).
The edge-count check is O(1). The Union-Find solution processes each edge once, performing a find and possibly a union. With path compression and union by rank (both in the code below), each of these operations runs in O(α(n)) amortized time, where α is the inverse Ackermann function — effectively a small constant for any realistic input. So processing all e edges is O(e × α(n)) = O(n α(n)), which is essentially O(n).
Space is O(n) for the Union-Find parent and rank arrays.
The traversal approach (Option A) is O(n + e) = O(n) time and O(n) space, comparable. Both are linear; the choice is about which is cleaner to reason about, and for a structural problem like this, Union-Find maps most directly onto the question being asked.
5. Discuss Trade-Offs Between Solutions
Approach | Time | Space | When I'd use it |
DFS/BFS with parent tracking | O(n + e) | O(n) | Works well; needs careful undirected cycle logic and a visited-all check |
Union-Find | O(n α(n)) | O(n) | My default — maps directly onto the connectivity/cycle structure |
Both real approaches are linear and correct. I lean toward Union-Find because the problem is fundamentally about connectivity and redundancy, which is exactly what Union-Find models — the cycle check and connectivity tracking fall out of the same union operation.
6. Pseudocode
if number of edges != n - 1:
return false # wrong edge count → can't be a tree
initialize union-find: each node is its own group
for each edge (u, v):
if find(u) == find(v):
return false # endpoints already connected → cycle
union(u, v)
return true # n-1 edges + no cycle ⇒ connected and acyclic7. Edge Cases
Things to verify before claiming we're done:
n == 1, no edges → edge count is 0 = n - 1. Loop doesn't run. Return true (a single node is a trivial tree). ✓
Disconnected graph with correct edge count → impossible without a cycle elsewhere, so the cycle check catches it. (E.g., n=4, edges [[0,1],[1,2],[2,0]] has 3 edges but a cycle in 0–1–2 and an isolated node 3 — the cycle check returns false.) ✓
Cycle formed early → the union that would close the loop finds both endpoints already connected, returns false immediately.
Too many edges (> n - 1) → caught by the edge-count check before any Union-Find work.
Too few edges (< n - 1) → also caught by the edge-count check; the graph can't be connected.
The edge-count check at the top is what makes everything else clean. It instantly rejects the "wrong number of edges" cases, leaving only the n - 1-edge graphs where a single cycle check settles the question.
8. Write Full Code
public class GraphValidTree {
static class UnionFind {
int[] parent;
int[] rank;
UnionFind(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) {
parent[i] = i; // each node starts as its own group
}
}
// Find with path compression:
// flattens the tree for fast future lookups
int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
// Union by rank. Returns false if x and y were already
// connected (which, for this problem, means we just found
// a cycle).
boolean union(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return false; // already connected → cycle
if (rank[rootX] < rank[rootY]) {
parent[rootX] = rootY;
} else if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else {
parent[rootY] = rootX;
rank[rootX]++;
}
return true;
}
}
public static boolean validTree(int n, int[][] edges) {
// A tree on n nodes must have exactly n - 1 edges
if (edges.length != n - 1) return false;
UnionFind uf = new UnionFind(n);
for (int[] edge : edges) {
// If union returns false, the endpoints were already
// connected: this edge closes a cycle, so it's not a
// tree.
if (!uf.union(edge[0], edge[1])) {
return false;
}
}
// n - 1 edges and no cycle guarantees connectivity,
// so it's a tree.
return true;
}
}9. Test the Code
// Valid tree
int[][] e1 = {{0,1},{0,2},{0,3},{1,4}};
System.out.println(validTree(5, e1)); // true
// Cycle: 0-1-2-0
int[][] e2 = {{0,1},{1,2},{2,0}};
System.out.println(validTree(3, e2)); // false
// Disconnected: two separate edges, only 2 edges for 4 nodes (n-1 = 3)
int[][] e3 = {{0,1},{2,3}};
System.out.println(validTree(4, e3)); // false (wrong edge count)
// Single node, no edges — trivial tree
System.out.println(validTree(1, new int[][]{})); // true
// Correct edge count but cycle + disconnection
int[][] e5 = {{0,1},{1,2},{2,0}};
System.out.println(validTree(4, e5)); // false (3 edges = n-1, but cycle + node 3 isolated)
// A simple two-node tree
int[][] e6 = {{0,1}};
System.out.println(validTree(2, e6)); // trueThese hit the meaningful cases: a valid tree, a pure cycle, a disconnected graph caught by edge count, the single-node trivial tree, the tricky "right edge count but cycle and disconnection" case (which proves why the cycle check is still needed even with the count correct), and a minimal two-node tree.
10. Key Lessons
A tree on n nodes has exactly n - 1 edges — no more, no less. This invariant is one of the most useful facts in graph problems. Check it first; it instantly rejects a huge class of invalid inputs.
With the edge count pinned at n - 1, "acyclic" and "connected" become equivalent — proving one gives you the other for free. Recognizing this lets you check a single condition instead of two, which simplifies the whole solution.
Union-Find is the natural tool when a problem is about connectivity and redundancy. Each edge either merges two groups or, if its endpoints are already connected, reveals a cycle. Connectivity tracking and cycle detection fall out of the same union operation.
Cycle detection in an undirected graph (if you go the DFS route) requires parent tracking — an edge back to the node you just came from isn't a cycle. This is a classic off-by-one-style trap; Union-Find sidesteps it entirely.
Reason about what must be true of a structure before reaching for a traversal. The fastest solution here came not from exploring the graph but from a counting invariant plus a single targeted check. "What properties must hold?" often beats "let me traverse and see."
The thing that makes Graph Valid Tree click is the structural reasoning that a tree is precisely "n - 1 edges plus no cycle," and that the edge count makes connectivity and acyclicity two sides of the same coin. Once you stop thinking "how do I explore this graph?" and start thinking "what must be true for this to be a tree?", the problem collapses into a one-line invariant check followed by a clean cycle test. Training yourself to reason about properties before reaching for algorithms is the habit that turns structural graph problems from fiddly to obvious.
Good Luck and Happy Coding!
Comments