Serialize and Deserialize Binary Tree: A Step-by-Step Interview Walkthrough
- May 23
- 11 min read
Serialize and Deserialize Binary Tree is one of the rare interview problems that's actually a design problem. There's no fixed answer — you get to choose your own encoding, and the interviewer is watching to see whether your choice is principled or arbitrary. Candidates who pick a format without thinking about reversibility get stuck halfway through deserialization when they realize the encoding doesn't carry enough information. But if you can reason about what's preserved and what's lost in each format choice — values, structure, null positions — then your pick of representation makes deserialization fall out naturally. The signal here is whether you can think about data as having structure that needs explicit encoding, not just values that get written down.
Tree serialization is something production systems do constantly. Databases serialize execution plans and B-tree pages for persistence. Compilers serialize ASTs for incremental builds and language server protocols. Distributed systems serialize Merkle trees for replication. Game engines serialize scene graphs for save states. UI frameworks serialize component trees for hot reloading and time-travel debugging. Whenever a hierarchical structure needs to cross a process boundary, hit a disk, or be transmitted over a network, the same design questions you're solving here come up — and the choices you make about null markers and traversal order have direct consequences for the format's compactness and reconstructability.
Problem Statement
Design an algorithm to serialize and deserialize a binary tree.
Serialization is the process of converting a binary tree into a string so it can be stored or transmitted. Deserialization is the process of converting that string back into the original binary tree.
You may assume:
Node values are integers.
The tree can be empty.
The serialized format only needs to be understood by your own deserializer (no interop required).
Example:
1
/ \
2 3
/ \
4 5A valid serialization (one of many) is "1,2,#,#,3,4,#,#,5,#,#".
1. Clarify Requirements Before Jumping Into Code
Before writing a single line, we should restate the problem in our own words and ask the kinds of questions that would change our approach.
Input for serialize: the root of a binary tree (may be null).
Output for serialize: a string.
Input for deserialize: a string produced by serialize.
Output for deserialize: the root of the reconstructed tree.
Details to clarify:
Is this a BST or a general binary tree? General binary tree — no value-based ordering to exploit.
Are node values bounded? Usually integers fitting in int, but worth confirming. Negative values are allowed.
Can the tree be empty? Yes.
Does the format need to be human-readable or compact? Either is fine; the interviewer probably cares most about correctness and reversibility, not bytes saved.
Can values contain characters that conflict with our chosen delimiter? Worth thinking about. We'll pick a delimiter and assume integer values, so commas and # don't collide.
The thing to flag is the contract: Serialize and deserialize must be perfect inverses. deserialize(serialize(T)) has to reproduce the exact same tree structure as T. That's not just "same values" — it's "same shape." Every choice we make about encoding has to be evaluated against that contract.
2. Identify the Category of the Question
A few signals jump out:
The problem has two functions that must be inverses of each other.
We get to choose the format, so the design itself is part of the answer.
Trees are hierarchical, so the format needs to preserve hierarchy.
Reconstruction is a parsing problem — given the format, we need to rebuild structure from a linear string.
That combination — encoding + decoding, structural data, format design — places this in the tree traversal + encoding family. The same techniques apply to encoding any hierarchical structure (file system paths, expression trees, JSON-like documents).
3. Brute Force Solution
Let's try the most obvious approach and see why it fails. The simplest idea: traverse the tree, write down each node's value separated by commas, and call it done.
serialize: "1,2,3,4,5"Now try to deserialize that. We get back the values 1, 2, 3, 4, 5, but... in what shape? Is 1 the root with 2, 3 as children and 4, 5 as grandchildren? Is 1 the root of a left-leaning chain? Is 1 the root with 2 as a left child, 3 as 2's right child, and so on?
Without more information, we have no idea. Many different tree structures produce the same in-order, pre-order, or post-order traversal of values. The naive encoding loses structure.
The brute force teaches us the key insight: values alone don't preserve structure. If we want deserialization to recover the original tree, the serialization has to encode not just what the values are, but also where they sit in the structure. The most direct way to do that is to record nulls — when a child slot is empty, we have to say so.
4. Brainstorm More Solutions
Step 1: Why nulls matter
Let's see concretely how recording nulls fixes the ambiguity. Consider these two trees:
1 1
/ \
2 2Both have the same values in the same preorder visit order: [1, 2]. Without null markers, they serialize identically. With null markers, the left-skewed tree becomes [1, 2, null, null, null] (1 has left child 2, 2 has no left, 2 has no right, 1 has no right), and the right-skewed tree becomes [1, null, 2, null, null] (1 has no left, 1 has right child 2, 2 has no children).
Different strings, unambiguous reconstruction. The null markers act as structural padding — they tell the deserializer where children are missing. Without them, the deserializer can't tell where one node's subtree ends and the next node's begins.
Step 2: Which traversal order should we use?
Once we've committed to recording nulls, we need to pick a traversal order. It would be helpful to think through all the standard approaches for traversing a tree and think the pros/cons of each option.
Inorder (left, node, right) is tempting because BSTs have a special relationship with inorder traversals. But for a general binary tree, inorder traversal doesn't work cleanly with null markers. When we encounter a node during deserialization, we don't know whether the next token is its left child or its right child without more context. Inorder is the wrong shape for this problem.
Postorder (left, right, node) has a similar issue but in reverse — we'd encounter children before their parents, which means we'd build leaves first and assemble them upward. That's workable but awkward, because we don't know how children should be combined into parents without an explicit "this is a parent" marker.
Preorder (node, left, right) is the natural fit. Why? Because every time we read a token, we know what it represents immediately: either a node (in which case its children follow) or a null marker (in which case nothing follows). The deserializer can recursively rebuild the tree by reading one token, deciding what it is, and recursing into the children if it's a node.
Level-order (BFS) also works, and it's arguably the most "human-readable" since you can visualize the levels. The trade-off is implementation complexity — you need a queue on both ends, and the format is more verbose for sparse trees because every internal level has to be fully padded with nulls.
I'd reach for preorder. It maps cleanly onto recursion, makes the serializer and deserializer mirror each other exactly, and produces a compact encoding even for sparse trees.
Step 3: Design the serializer
Preorder traversal with null markers. For each node:
If the node is null, write a sentinel like #.
Otherwise, write the node's value, then recursively serialize the left subtree, then the right subtree.
Separate tokens with a delimiter (comma works well — integer values don't contain commas).
serialize(node):
if node is null:
append "#"
return
append node.val
serialize(node.left)
serialize(node.right)For our example tree above, this produces "1,2,#,#,3,4,#,#,5,#,#". Each # marks a null position; each number marks a real node.
Step 4: Design the deserializer to mirror the serializer
Here's the critical move: the deserializer should mirror the serializer's structure exactly. Whatever order the serializer wrote things in, the deserializer reads in the same order.
For preorder serialization, that means:
Read the next token.
If it's #, return null.
Otherwise, create a node with the parsed value, then recursively build the left subtree, then the right subtree, then return the node.
deserialize():
token = next from stream
if token == "#": return null
node = new TreeNode(parseInt(token))
node.left = deserialize()
node.right = deserialize()
return nodeNotice how the structure of deserialize is identical to serialize, just with reads instead of writes and node creation instead of node traversal. That's not a coincidence — it's the whole point. When two operations are inverses, their code structures should mirror each other. If your deserialize looks structurally different from your serialize, that's a sign something is off.
The recursive structure looks clean, but let me try to actually implement it and see if anything breaks. The deserializer reads "the next token" — but where does that next token come from?
It can't be that each recursive call starts over from the beginning of the string because the string itself never changes between calls. There's also no shared "where am I now" variable — every call sees the same input and reads the same first token.
So the question becomes: how does each call know which token is next relative to the previous call? Whatever data structure we pass in, the recursive calls need to advance through it. Reading a token has to consume that token, so the next call sees a different first token than this one did.
A few options for fixing this. Each one is essentially a way to maintain a "current position" that all recursive calls share:
Pass a mutable position alongside an immutable string. Each call reads the token at the current position and increments the position. This works, but it requires either a mutable wrapper or returning the new position from each call.
Use a queue. Pre-split the string into a queue of tokens; each recursive call calls poll() to remove the front token. The queue is shared by reference across all calls, so consuming a token in one call leaves it consumed for all subsequent calls. The next call's poll() returns the token after the one this call just consumed.
Use an iterator. Similar to a queue — each next() call advances the iterator, and the state lives in the iterator object that all recursive calls share.
All three solve the same problem: making token consumption stateful and shared. I'd reach for the queue because it's the most readable and the poll() semantics match what we want intuitively — "give me the next token and consume it."
The fix in code:
deserialize(tokens): # tokens is a shared queue
token = tokens.poll()
if token == "#": return null
node = new TreeNode(parseInt(token))
node.left = deserialize(tokens)
node.right = deserialize(tokens)
return nodeThe recursion structure is unchanged. The only difference is that tokens is now a queue passed by reference, and poll() mutates it. After the left subtree's deserialization finishes, every token belonging to that subtree has been consumed from the queue, so the right subtree's call naturally picks up at the right place.
This is the kind of detail that's easy to miss when writing the algorithm in your head. Whenever a recursive algorithm reads from an input that needs to be consumed positionally — tokens, characters from a stream, indices into an array — the recursive calls have to share state about where they are in that input.
Step 5: Complexity
Serialization visits each node exactly once and writes O(1) tokens per node (one value token, plus null markers for empty child slots). Total characters written is O(n), so time and output size are both O(n).
Deserialization reads each token exactly once and does O(1) work per token. Time is O(n). Space is O(h) for the recursion stack, where h is the tree's height — O(log n) for balanced trees, O(n) for skewed ones. The token queue itself is O(n).
Both directions are linear, which is essentially optimal — we have to look at every node at least once.
5. Discuss Trade-Offs Between Solutions
Approach | Time | Space | When I'd use it |
Values only, no nulls | O(n) | O(n) | Never — loses structure, fails the contract |
Level-order (BFS) with nulls | O(n) | O(n) | Reasonable choice; output is verbose but readable |
Preorder DFS with nulls | O(n) | O(n) | My default interview answer — clean recursion, mirror-image serializer and deserializer |
Postorder DFS with nulls | O(n) | O(n) | Works but reconstruction is less intuitive |
Preorder is the right answer for this problem. The serializer and deserializer have identical structures, which is both easier to write and easier to convince an interviewer is correct.
6. Pseudocode
serialize(node, output):
if node is null:
append "#" to output
return
append node.val to output
serialize(node.left, output)
serialize(node.right, output)
deserialize(tokens):
token = consume next token
if token == "#":
return null
node = new TreeNode(parseInt(token))
node.left = deserialize(tokens)
node.right = deserialize(tokens)
return node7. Edge Cases
Things to verify before claiming we're done:
Empty tree → serialize produces "#"; deserialize reads "#" and returns null. ✓
Single-node tree → serialize produces "42,#,#"; deserialize reads three tokens and produces a leaf. ✓
Completely skewed tree (all left or all right) → algorithm works without any special handling; just deeper recursion.
Tree with negative values → values serialize as "-5", which parses fine with Integer.parseInt.
Tree with duplicate values → values are written as-is; the structure tokens (# for null) are what makes the encoding unambiguous, not the values themselves.
Very deep trees → recursion depth equals tree height; for adversarial inputs, an iterative version with an explicit stack would be safer.
The null marker handles every shape uniformly. That's the sign the encoding is well-designed.
8. Full Code
import java.util.*;
public class SerializeDeserializeBinaryTree {
static class TreeNode {
int val;
TreeNode left;
TreeNode right;
TreeNode(int val) {
this.val = val;
}
}
// Serialize: preorder DFS with '#' for null children.
// Each node's value is followed by its left subtree's encoding,
// then its right subtree's encoding.
public static String serialize(TreeNode root) {
StringBuilder sb = new StringBuilder();
serializeHelper(root, sb);
return sb.toString();
}
private static void serializeHelper(TreeNode node, StringBuilder sb) {
if (node == null) {
sb.append("#,");
return;
}
sb.append(node.val).append(",");
serializeHelper(node.left, sb);
serializeHelper(node.right, sb);
}
// Deserialize: mirror of serialize. Consume one token at a time
// from a shared queue. Each recursive call reads exactly one
// subtree's encoding, leaving the queue ready for the next call.
public static TreeNode deserialize(String data) {
Queue<String> tokens = new LinkedList<>(Arrays.asList(data.split(",")));
return deserializeHelper(tokens);
}
private static TreeNode deserializeHelper(Queue<String> tokens) {
String token = tokens.poll();
if (token.equals("#")) {
return null;
}
TreeNode node = new TreeNode(Integer.parseInt(token));
// Left subtree consumes tokens before right subtree —
// exact mirror of the order serialize wrote them in.
node.left = deserializeHelper(tokens);
node.right = deserializeHelper(tokens);
return node;
}
}9. Test the Code
// Standard tree
TreeNode root = new TreeNode(1);
root.left = new TreeNode(2);
root.right = new TreeNode(3);
root.right.left = new TreeNode(4);
root.right.right = new TreeNode(5);
String data = serialize(root);
TreeNode restored = deserialize(data);
System.out.println(serialize(restored).equals(data)); // true — round-trip identity
// Empty tree
String empty = serialize(null);
TreeNode emptyRestored = deserialize(empty);
System.out.println(emptyRestored == null); // true
// Single node
TreeNode single = new TreeNode(42);
String singleData = serialize(single);
System.out.println(serialize(deserialize(singleData)).equals(singleData)); // true
// Skewed left
TreeNode skewed = new TreeNode(1);
skewed.left = new TreeNode(2);
skewed.left.left = new TreeNode(3);
String skewedData = serialize(skewed);
System.out.println(serialize(deserialize(skewedData)).equals(skewedData)); // true
// Negative values
TreeNode negative = new TreeNode(-1);
negative.left = new TreeNode(-2);
negative.right = new TreeNode(-3);
String negData = serialize(negative);
System.out.println(serialize(deserialize(negData)).equals(negData)); // true10. Key Lessons
When designing an encoding for hierarchical data, structure has to be encoded explicitly. Values alone aren't enough — you need null markers, length prefixes, or some other mechanism to disambiguate shape. This applies to trees, JSON, protocol buffers, and any other serializable format.
The serializer and deserializer should mirror each other. If your serializer is preorder, your deserializer reads in preorder. If your serializer is BFS, your deserializer reads level by level. When the two halves have parallel structure, correctness is much easier to argue.
Preorder is the natural traversal for encode-decode pairs on trees because the parent's value is written before its children, so the decoder always knows what to do with the next token before it sees the children. Inorder and postorder don't have this property and require more bookkeeping.
Design problems reward symmetry. When the two halves of an algorithm are inverses, the code structures should also be inverses. If your encoder is 10 lines and your decoder is 50, that's a sign one of them is doing too much work — or that the format itself isn't well-designed.
The thing that makes Serialize and Deserialize Binary Tree click is the realization that structure is data, and any encoding that doesn't explicitly record structure will lose information.
Good Luck and Happy Coding!
Comments