Clone Graph: A Step-by-Step Interview Walkthrough

18 hours ago
11 min read

Clone Graph is a problem that looks like a routine traversal until you remember one thing: graphs can loop back on themselves. The interviewer is using this problem to test whether you can handle the three things that make graph copying genuinely tricky — cycles that would cause infinite recursion, shared neighbors that must not be duplicated, and the distinction between "same value" and "same object." Candidates who recognize that they need to track which originals they've already cloned — and key that tracking on object identity, not value — write a clean solution that handles every case uniformly. The signal here is whether you understand that copying a graph is fundamentally about maintaining a correspondence between originals and clones, not about traversal mechanics.

Deep-copying object graphs with shared references and cycles is something real systems do constantly. Serialization frameworks deep-copy object graphs while preserving shared references. Garbage collectors trace reference graphs that contain cycles. Compilers clone control-flow graphs during optimization passes. Version control systems copy commit DAGs. Any time you need to duplicate a linked structure where nodes can reference each other arbitrarily — including back-references and shared children — the same original-to-clone mapping technique applies.

Problem Statement

Given a reference to a node in a connected undirected graph, return a deep copy (clone) of the graph.

Each node contains an integer value and a list of its neighbors. The graph may contain cycles, and node values are unique.

Example:

a graph with four nodes arranged in a square — node 1 connects to 2 and 4, node 2 connects to 1 and 3, node 3 connects to 2 and 4, node 4 connects to 1 and 3.

Cloning from node 1 should reproduce the entire square with brand-new node objects.

1. Clarify Requirements Before Jumping Into Code

Before writing a single line, we should restate the problem in our own words and ask the kinds of questions that would change our approach.

Input: a reference to one node in a connected undirected graph.

Output: a deep copy of the entire graph — every node duplicated, every edge preserved.

Details to clarify:

Does "deep copy" mean no shared references with the original? Yes — every node in the clone must be a brand-new object. Touching the clone must never affect the original.
Can the graph contain cycles? Yes — this is the crux. Undirected edges mean if A points to B, B points back to A, so cycles are everywhere.
Are node values unique? Yes — but we shouldn't rely on values for identity tracking, as we'll discuss.
Is the graph connected? Yes — every node is reachable from the given starting node.
Can the input be null? Yes — handle the empty-graph case by returning null.

The thing to flag are the cycles. In a tree, you can recurse into children without worrying about coming back — there are no back-edges. In an undirected graph, every edge is effectively a back-edge, so naive recursion will bounce between neighbors forever. The entire difficulty of this problem is handling that safely.

2. Identify the Category of the Question

A few signals jump out:

We're traversing a graph (visiting every node, following edges).
The graph has cycles, so we need cycle-safe traversal.
We're building a parallel structure — for each original node, a corresponding clone.
Shared neighbors must map to the same clone, not duplicate clones.

That combination — graph traversal, cycle handling, building a parallel structure with a correspondence between old and new — places this in the graph traversal with memoization family. The core technique is a map from original nodes to their clones, which does double duty: it tells us what we've already cloned (preventing infinite loops) and it lets us reuse clones (preventing duplicates). Same family as Copy List with Random Pointer and various "deep copy this linked structure" problems.

3. Brute Force Solution

Let's try the most natural-looking approach and watch it fail. The obvious idea is to traverse the graph, create a new node for each node we encounter, and recursively clone the neighbors.

clone(node):
    newNode = new Node(node.val)
    for neighbor in node.neighbors:
        newNode.neighbors.add(clone(neighbor))
    return newNode

This works fine on a tree. But let's run it on a graph with a cycle — say, node 1 connects to node 2, and node 2 connects back to node 1. We call clone(1), which creates a new node and recurses into its neighbor clone(2). That creates another new node and recurses into its neighbor clone(1) — but this is the original node 1 again, so we create yet another clone of node 1 and recurse into clone(2) again. There's no signal for us to stop, giving us an infinite loop.

The brute force teaches us the central insight: we have to remember which nodes we've already cloned. When we encounter node 1 the second time, we shouldn't create a fresh clone and recurse — we should recognize "I've already made a clone of this node" and reuse it. That recognition is what breaks the cycle and prevents duplicate clones at the same time.

4. Brainstorm More Solutions

Step 1: What exactly does "cloning a graph" require?

Before designing the algorithm, let's pin down precisely what a correct clone has to satisfy. There are really just two requirements:

Every original node corresponds to exactly one clone. If we clone node 1 twice for example, we've broken the structure.
Every neighbor reference in the clone points to another clone, not to an original node. If clone-of-1's neighbor list contains an original node, we haven't deep-copied — we've created a clone that's still tangled up with the original graph.

Put those together and a requirement emerges: we need a correspondence between each original node and its unique clone. Whenever we're about to add a neighbor to a clone, we need to ask "what's the clone that corresponds to this original neighbor?" and use that.

Step 2: The correspondence is a map

That correspondence — a mapping of an original node to its clone — is exactly what a hash map gives us. Key: an original node. Value: the single clone that corresponds to it.

This map does two jobs at once, which is what makes it so effective:

It prevents infinite recursion. Before cloning a node, we check the map. If the node is already there, we've seen it before (we're in a cycle, or it's a shared neighbor), so we return the existing clone instead of recursing again.
It prevents duplicate clones. Because we always look up the map before creating a new clone, each original node gets cloned exactly once. Every subsequent encounter reuses the stored clone.

Notice that both problems — cycles and shared neighbors — have the same solution. They're really the same problem in disguise: "I've arrived at a node I've already processed." The map handles both uniformly.

Step 3: Why identity matters, not value

A subtle but critical point: the map's key has to be the node object itself (its identity), not its value.

It might be tempting to key the map on node.val, since values are unique in this problem. That would happen to work here. But it's the wrong mental model, and it would break the moment values weren't guaranteed unique. The thing we're tracking is "have I cloned this specific node object?" — that's a question about identity, not about value. Two nodes with the same value would be different nodes requiring different clones.

Keying on identity (the object reference) makes the algorithm correct regardless of whether values are unique. It also matches what we actually mean: we're tracking objects, not labels. In Java, a HashMap<Node, Node> keyed on the node reference does exactly this.

Step 4: Assemble the algorithm

Now the recursion writes itself:

clone(node):
    if node is null: return null
    if node is in map: return map[node] # already cloned — reuse it

    newClone = new Node(node.val)      # create the clone
    map[node] = newClone               # record it BEFORE recursing

    for neighbor in node.neighbors:
        newClone.neighbors.add(clone(neighbor))

    return newClone

The order of operations matters here, just as it did in Number of Islands. We put the new clone in the map before recursing into neighbors. Why? Because a neighbor might point back to the current node (that's the cycle). When the recursion comes back around to the current node, it needs to find the clone already in the map. If we recorded the clone only after cloning all neighbors, the recursion would reach the current node before it was recorded, create a duplicate, and we'd be back to infinite recursion.

So the sequence is: create the clone, register it immediately, then clone neighbors. Registering before recursing is what makes cycle handling work.

Step 5: Walk through a cycle to verify

Let's trace a two-node cycle: node 1 ↔ node 2 (each is the other's neighbor).

clone(1): not in map. Create clone-1, store {1 → clone-1}. Recurse into neighbor 2.
- clone(2): not in map. Create clone-2, store {1 → clone-1, 2 → clone-2}. Recurse into neighbor 1.
  - clone(1): already in map. Return clone-1 immediately — no new node, no further recursion.
- Back in clone(2): add clone-1 to clone-2's neighbors. Return clone-2.
Back in clone(1): add clone-2 to clone-1's neighbors. Return clone-1.

Final result: clone-1's neighbors = [clone-2], clone-2's neighbors = [clone-1]. The cycle is reproduced exactly, with two brand-new objects, and the recursion terminated because the second encounter with node 1 hit the map. The map is what turned a potential infinite loop into a clean two-node copy.

Step 6: DFS or BFS?

Both work, with the same complexity. The DFS version above is recursive and concise. A BFS version uses a queue and an explicit loop:

public Node cloneGraph(Node node) {
    if (node == null) return null;

    Map<Node, Node> map = new HashMap<>();
    map.put(node, new Node(node.val));
    Queue<Node> queue = new LinkedList<>();
    queue.offer(node);

    while (!queue.isEmpty()) {
        Node current = queue.poll();
        for (Node neighbor : current.neighbors) {
            if (!map.containsKey(neighbor)) {
                map.put(neighbor, new Node(neighbor.val));
                queue.offer(neighbor);
            }
            map.get(current).neighbors.add(map.get(neighbor));
        }
    }

    return map.get(node);
}

The structure is the same — a map from original to clone — but the traversal is iterative. The DFS version risks stack overflow on very deep graphs; the BFS version avoids that by using heap memory for the queue. In an interview, I'd lead with DFS for readability and mention BFS as the safer choice for large graphs.

Step 7: Complexity

Each node is created exactly once and each edge is traversed exactly once (or twice in an undirected graph, since each edge appears in two neighbor lists). So total time is O(n + e), where n is the number of nodes and e is the number of edges.

Space is O(n) for the map (one entry per node) plus O(n) for the recursion stack or BFS queue in the worst case. So O(n) auxiliary space overall.

This is optimal — we have to create every node and visit every edge at least once, so O(n + e) time is unavoidable.

5. Discuss Trade-Offs Between Solutions

Approach	Time	Space	When I'd use it
Naive recursion, no map	Infinite loop	—	Never — breaks on the first cycle
DFS with HashMap	O(n + e)	O(n)	My default interview answer — clean and intuitive
BFS with HashMap	O(n + e)	O(n)	When the graph might be deep enough to risk stack overflow

Both working approaches use the same map; the only difference is the traversal order. The map is the real solution — the choice between DFS and BFS is secondary.

6. Pseudocode

clone(node, map):
    if node is null:
        return null
    if node in map:
        return map[node]

    newClone = new Node(node.val)
    map[node] = newClone          # register BEFORE recursing

    for neighbor in node.neighbors:
        newClone.neighbors.add(clone(neighbor, map))

    return newClone

7. Edge Cases

Things to verify before claiming we're done:

Null input → return null immediately. ✓
Single-node graph with no neighbors → create one clone, no recursion, return it. ✓
Single node with a self-loop (node is its own neighbor) → the map registration before recursing handles it; when we recurse into the self-neighbor, the node is already mapped, so we reuse the clone. ✓
Graph with cycles → the map breaks the cycle on the second encounter, as traced above.
Shared neighbors (two nodes pointing to the same third node) → the third node is cloned once; both clones reference the same clone-of-third. The map ensures this.
Dense graph (many edges) → still O(n + e); no special handling needed.

The self-loop case is a nice one to mention in an interview, because it's where "register before recursing" earns its keep most visibly — without it, a self-loop would recurse infinitely on the very first node.

8. Full Code

import java.util.*;

public class CloneGraph {

    static class Node {
        public int val;
        public List<Node> neighbors;

        public Node(int val) {
            this.val = val;
            this.neighbors = new ArrayList<>();
        }
    }

    // DFS with a map from original node -> clone.
    // The map does double duty: it breaks cycles (by returning 
    // an existing clone instead of recursing) and prevents 
    // duplicate clones.
    public static Node cloneGraph(Node node) {
        return clone(node, new HashMap<>());
    }

    private static Node clone(Node node, Map<Node, Node> map) {
        if (node == null) {
            return null;
        }

        // Already cloned this node — reuse it. 
        // This is what breaks cycles.
        if (map.containsKey(node)) {
            return map.get(node);
        }

        // Create the clone and register it BEFORE recursing, so 
        // that a neighbor pointing back to this node finds the 
        // clone already there.
        Node cloneNode = new Node(node.val);
        map.put(node, cloneNode);

        for (Node neighbor : node.neighbors) {
            cloneNode.neighbors.add(clone(neighbor, map));
        }

        return cloneNode;
    }

    // BFS alternative — same O(n + e) complexity, 
    // avoids recursion depth.
    public static Node cloneGraphBFS(Node node) {
        if (node == null) return null;

        Map<Node, Node> map = new HashMap<>();
        map.put(node, new Node(node.val));
        Queue<Node> queue = new LinkedList<>();
        queue.offer(node);

        while (!queue.isEmpty()) {
            Node current = queue.poll();
            for (Node neighbor : current.neighbors) {
                if (!map.containsKey(neighbor)) {
                    map.put(neighbor, new Node(neighbor.val));
                    queue.offer(neighbor);
                }
                // Link the current clone to the neighbor's clone
                map.get(current).neighbors.add(map.get(neighbor));
            }
        }

        return map.get(node);
    }
}

9. Test the Code

Node node1 = new Node(1);
Node node2 = new Node(2);
Node node3 = new Node(3);
Node node4 = new Node(4);

node1.neighbors.add(node2);
node1.neighbors.add(node4);
node2.neighbors.add(node1);
node2.neighbors.add(node3);
node3.neighbors.add(node2);
node3.neighbors.add(node4);
node4.neighbors.add(node1);
node4.neighbors.add(node3);

Node cloned = cloneGraph(node1);

System.out.println(cloned.val);                  // 1
System.out.println(cloned.neighbors.size());     // 2

// Verify the clone is a DIFFERENT object than the original
System.out.println(cloned == node1);             // false

// Verify the cycle is preserved: clone-1's neighbor's neighbor includes clone-1
Node clonedNeighbor = cloned.neighbors.get(0);
System.out.println(clonedNeighbor.neighbors.contains(cloned));  // true

// Single node, no neighbors
Node solo = new Node(99);
Node clonedSolo = cloneGraph(solo);
System.out.println(clonedSolo.val);              // 99
System.out.println(clonedSolo == solo);          // false

// Null input
System.out.println(cloneGraph(null));            // null

These hit the meaningful cases: a graph with cycles (verifying both structure and object distinctness), a single isolated node, and the null input. The most important checks are the identity assertions (cloned == node1 should be false) — they verify we actually deep-copied rather than returning the original.

10. Key Lessons

When deep-copying a linked structure (graph, list with cross-references, object graph), maintain a map from each original to its clone. That single map solves two problems at once — it breaks cycles and it prevents duplicate clones — because both are really the same problem: "I've arrived at a node I've already processed."
Track identity, not value. The map should be keyed on the node object itself, not its value. Keying on value happens to work when values are unique, but it's the wrong mental model and breaks the moment values can repeat. You're tracking "have I seen this object," which is a question about identity.
When cloning with recursion, register the clone in the map before recursing into neighbors. A neighbor may point back to the current node; if the clone isn't registered yet, that back-reference triggers a duplicate clone and infinite recursion. Register first, recurse second.
Avoid persistent shared state (like a static map field) when the function might be called more than once. Pass the state in as a parameter or reset it per call, so a second invocation doesn't inherit stale data from the first.
DFS and BFS both work for graph traversal; they differ only in order and in recursion-depth risk. The real engine of this solution is the original-to-clone map, not the traversal strategy.

The thing that makes Clone Graph click is recognizing that cloning a structure with cycles and shared references is fundamentally about maintaining a correspondence between originals and clones, and that one map keyed on identity handles cycles and duplicates in a single stroke. Once you internalize "deep copy means build a map from old to new," a whole family of structure-copying problems collapses into the same pattern.

Good Luck and Happy Coding!