Alien Dictionary: A Step-by-Step Interview Walkthrough

17 hours ago
13 min read

Alien Dictionary is one of the hardest "medium" problems in the interview rotation, and it earns that reputation by hiding a clean graph problem behind a deceptive premise. You're handed a list of words "sorted" in some alien language and asked to recover the alphabet. The instinct is to think about sorting or string comparison — but those are dead ends. The real insight is that the sorted order of the words is a set of ordering constraints between letters, and recovering the alphabet means assembling those constraints into a consistent sequence: a topological sort. What makes this problem genuinely tricky is that the constraints are partial (the words don't tell you the relationship between every pair of letters), and the input can be invalid (the constraints might contradict each other, or the words might be impossibly ordered). The signal interviewers want is whether you can extract constraints from indirect data, model them as a graph, and handle the validity edge cases that trip up most candidates.

Inferring a total order from partial constraints is a real, recurring task. Build systems infer compilation order from declared dependencies. Package managers resolve install order from version requirements. Spreadsheet engines order cell recalculation from formula references. Schedulers sequence tasks from precedence rules. Data-migration tools order table creation from foreign-key relationships. Any time you must reconstruct a consistent global ordering from scattered local "this before that" rules, you're solving exactly the problem Alien Dictionary models.

Problem Statement

You are given a list of words sorted lexicographically according to the rules of an unknown alien language. Each word consists of lowercase English letters.

Return a string representing the characters in the alien language in lexicographical order. If the order is invalid, return an empty string. If multiple valid orders exist, return any one of them.

Example:

words = ["wrt","wrf","er","ett","rftt"]

result → "wertf" (one valid alphabet ordering consistent with the word list).

1. Clarify Requirements Before Jumping Into Code

Before writing a single line, we should restate the problem in our own words and ask the kinds of questions that would change our approach.

Input: an array of words, already sorted according to the alien language's rules.

Output: a string of all unique characters in a valid alphabetical order, or an empty string if no valid order exists.

Details to clarify:

Where does the ordering information come from? From the fact that the words are sorted — specifically, from comparing adjacent words.
Are all characters guaranteed to be related to each other? No — the constraints are partial. Some characters' relative order may be undetermined.
If multiple valid orderings exist, which do we return? Any one of them.
What makes an input invalid? Two things: contradictory constraints (a cycle), and an impossible word ordering (a longer word appearing before its own prefix).
Are the characters lowercase English letters? Yes — at most 26 distinct characters.

The thing to flag is that we are not sorting the words — they're already sorted. We're doing the reverse: reverse-engineering the alphabet that would produce this sorted order. The words are evidence; the alphabet is what we infer from that evidence. Reframing it this way ("the input is a set of clues, not a thing to sort") is the first step toward seeing the graph underneath.

2. Identify the Category of the Question

A few signals jump out:

We need to produce a total ordering of items (characters).
The ordering must respect a set of pairwise "this comes before that" constraints.
The constraints are partial and could be contradictory.

That combination — produce a linear ordering that respects pairwise precedence constraints — is the fingerprint of topological sorting on a directed graph. Characters are nodes, "X comes before Y" is a directed edge X → Y, and a valid alphabet is a topological ordering of that graph. If the constraints contain a cycle, no valid ordering exists. Same family as Course Schedule II and any dependency-ordering problem. The wrinkle here is that we have to build the graph ourselves by extracting constraints from the word list — the edges aren't handed to us.

3. Brute Force Solution

Let's think about the naive approach to understand the problem. We could try every possible ordering of the characters and check which one is consistent with the sorted word list.

for each permutation of the unique characters:
    if the word list would be sorted under this ordering:
        return this permutation
return ""

With up to 26 characters, that's up to 26! permutations — astronomically infeasible. And checking each permutation against the full word list adds more cost on top.

The brute force does clarify what we're looking for, though. A valid ordering is one where, for every adjacent pair of words in the list, the first point of difference respects the ordering. That observation — that adjacent words are where the ordering information lives — is the seed of the efficient approach. We don't need to test orderings; we can extract the constraints directly and assemble them.

4. Brainstorm More Solutions

Step 1: Where does the ordering information actually come from?

Let's reason about what the sorted word list actually tells us. The words are sorted by the alien alphabet — but the alphabet itself is hidden. So the question is: what can we deduce about the alphabet purely from the fact that these words are in order?

Think about how dictionary ordering works in a language we do know. To decide which of two words comes first, you scan them left to right until you hit the first position where they differ. Whichever word has the smaller letter at that position comes first — and the letters before that point, being identical, tell you nothing. The first differing letter is the whole story.

That suggests where the information is hiding. Take two adjacent words in the list — say wrt and wrf. Scan together: w == w, r == r, then t vs f — they differ. We know wrt comes before wrf, and dictionary ordering says the word with the smaller first-differing letter goes first. So in this alien alphabet, t must come before f. We've extracted one concrete fact: t precedes f. Everything after that first difference is irrelevant — the moment we know t < f, this pair has told us everything it can.

So each adjacent pair of words yields exactly one ordering fact: find the first position where they differ, and the two letters there give us a "this before that" rule. Repeat across every adjacent pair, and we've harvested every ordering fact the input contains. We now have a pile of pairwise constraints like t before f, w before e, and so on.

Step 2: What about invalid input?

Before we do anything with those constraints, let's stress-test the extraction process, because it can quietly run into an impossible situation.

Consider two adjacent words abc and ab. Scan together: a == a, b == b, and then the second word simply ends — there's no third character to compare. We never find a differing letter. What do we make of that?

Reason it out from how ordering works. A prefix always comes before the word that extends it — ab comes before abc in any sensible dictionary, because running out of letters means you sort earlier. So abc appearing before ab can't happen under any alphabet. The input contradicts itself; there's no valid answer, and we should bail out with an empty string.

This gives us an explicit check we can't skip: if two adjacent words match all the way through the shorter one, but the first word is longer, the input is invalid. It's worth flagging because the extraction loop won't catch it on its own — it just finds no differing character and moves on, happily producing no edge. Miss this check and you'll return a confident, wrong, non-empty answer on contradictory input.

What about if the second word is longer — like ab followed by abc? That's perfectly valid: the shorter word is a prefix of the longer one, and a prefix correctly comes first. We simply find no differing character and move on without recording any edge, because this pair, while consistent, doesn't reveal an ordering between any two letters. Only the reverse situation — the longer word appearing first — signals a contradiction.

Step 3: What do we do with a pile of "before" constraints?

Now step back and look at what we've gathered: a collection of facts of the form "this letter comes before that letter." Our goal is a single sequence of all the letters that respects every one of those facts. How do we assemble local "X before Y" rules into one consistent global order?

It helps to picture the constraints visually. Draw each letter as a point, and for every rule "X before Y," draw an arrow from X to Y. What we've built is a network of letters connected by directed "comes-before" arrows — and what we want is a way to lay all the letters in a line so that every arrow points forward.

A letter with no arrows coming into it has nothing required before it, so it's safe to place first. Once we've placed it, any arrows out of it are satisfied and can be set aside — which may free up the next letters to place.

That gives a natural procedure. Find the letters with nothing required before them, place one, "satisfy" its outgoing arrows by removing them, and see which letters that frees up next. Keep going until everything is placed. To track "how many letters must come before me," we just count, for each letter, how many arrows point into it — and a letter becomes placeable the moment that count hits zero.

And here's a built-in bonus: what if the constraints are contradictory in a circular way — a before b, b before c, c before a? Then none of those three ever has its incoming count reach zero; each is forever waiting on another. They simply never get placed. So if, at the end, we've placed fewer letters than we started with, we know a circular contradiction exists and there's no valid alphabet — return an empty string. The same procedure that builds the order also detects when no order is possible.

If the procedure feels familiar, it should — it's exactly the topological sort we used for Course Schedule II, and the "count incoming arrows, place the zeros, free up the rest" mechanism is Kahn's algorithm. The same tool that ordered courses by their prerequisites orders letters by their precedence rules.

Step 4: Don't forget the letters with no constraints

One setup detail is easy to overlook. Some letters might appear in the words but never show up as the first difference between any adjacent pair — so they pick up no arrows at all. They have no ordering constraints, but they're still part of the alphabet and must appear in the output.

So before extracting any arrows, we should do a pass over all the words and register every letter we see as a point in our network, with an incoming-count of zero. If instead we only created letters when we found an arrow, we'd silently drop any unconstrained letter and return an alphabet that's missing characters. Register every letter first, then add the arrows — that ordering guarantees completeness.

Step 5: Walk through the example

Let's trace words = ["wrt","wrf","er","ett","rftt"].

First, register all characters as nodes: w, r, t, f, e. All start at indegree 0.

Now extract edges from adjacent pairs:

wrt vs wrf: first difference at position 2, t vs f → edge t → f. indegree[f] = 1.
wrf vs er: first difference at position 0, w vs e → edge w → e. indegree[e] = 1.
er vs ett: first difference at position 1, r vs t → edge r → t. indegree[t] = 1.
ett vs rftt: first difference at position 0, e vs r → edge e → r. indegree[r] = 1.

Indegrees now: w=0, e=1, r=1, t=1, f=1.

Kahn's algorithm:

Queue starts with indegree-0 nodes: [w]. Result: "".
Pop w. Result: "w". Decrement e → 0, enqueue e. Queue: [e].
Pop e. Result: "we". Decrement r → 0, enqueue r. Queue: [r].
Pop r. Result: "wer". Decrement t → 0, enqueue t. Queue: [t].
Pop t. Result: "wert". Decrement f → 0, enqueue f. Queue: [f].
Pop f. Result: "wertf". No neighbors. Queue empty.

Result length 5 = number of unique characters. Valid. Answer: "wertf". Correct.

Step 6: Complexity

Let C be the total number of characters across all words, and U the number of unique characters (U ≤ 26).

Registering nodes scans every character once: O(C). Extracting edges compares adjacent word pairs; each comparison scans at most the length of the shorter word, so across all pairs this is O(C) total. Building the graph is therefore O(C).

The topological sort visits each node once and each edge once. The number of nodes is U, and the number of edges is at most one per adjacent word pair, so at most O(number of words). The sort is O(U + E). Since U ≤ 26, the edge count and node count are both small and bounded.

Total time: O(C) — dominated by reading all the input characters.

Space is O(U + E) for the graph, which is O(1) in practice given the 26-letter cap, plus O(C) for the input. The transferable note: the cost is dominated by reading the input to extract constraints; the topological sort itself is cheap because the alphabet is small.

5. Discuss Trade-Offs Between Solutions

Approach	Time	Space	When I'd use it
Brute-force permutations	O(U! × C)	High	Never — factorial in the alphabet size
DFS-based topological sort	O(C + E)	O(U + E)	Works; compact, but cycle detection needs the three-state machine
BFS topological sort (Kahn's)	O(C + E)	O(U + E)	My default — builds the order and detects cycles via result length

Like in Course Schedule II, both DFS and BFS-based topological-sort approaches are optimal. I prefer Kahn's here for the same reason as Course Schedule II: the cycle check falls out of comparing the result length to the node count, with no separate state machine to manage.

6. Pseudocode

register every character in every word as a node, indegree 0

for each adjacent pair (w1, w2) of words:
    if w1 is longer than w2 and w2 is a prefix of w1:
        return ""                       # impossible ordering
    find first position where w1 and w2 differ:
        add edge w1[pos] → w2[pos] (if not already present)
        increment indegree of w2[pos]
        stop after the first difference

queue = all nodes with indegree 0
result = ""

while queue not empty:
    c = queue.pop()
    result += c
    for each neighbor of c:
        indegree[neighbor]--
        if indegree[neighbor] == 0:
            queue.add(neighbor)

if result.length != number of unique characters:
    return ""                            # a cycle blocked some characters
return result

7. Edge Cases

Things to verify before claiming we're done:

Single word input → no adjacent pairs, so no edges; every character is indegree-0 and gets output in some order. Valid. ✓
Disconnected character groups → characters with no constraints are still nodes; they get appended whenever their (zero) indegree lets them. ✓
Prefix-violation invalid case (["abc","ab"]) → caught by the explicit longer-word-before-its-prefix check; return empty string. ✓
Cycle in constraints → cyclic characters never reach indegree 0; result is shorter than the node count; return empty string. ✓
Multiple valid orderings → the queue's pop order determines which valid order we produce; any is acceptable.
Duplicate adjacent words (["abc","abc"]) → no differing character, but the first word isn't longer, so it's valid and produces no edge.

The prefix-violation check is the edge case most candidates miss. It's invalid input that the constraint-extraction loop won't catch on its own — the loop simply finds no differing character and moves on, producing a wrong non-empty answer unless you check explicitly.

8. Write Full Code

import java.util.*;

public class AlienDictionary {

    public static String alienOrder(String[] words) {
        Map<Character, Set<Character>> graph = new HashMap<>();
        Map<Character, Integer> indegree = new HashMap<>();

        // Phase 1: register EVERY character as a node, even those 
        // with no constraints — they're still part of the alphabet 
        // and must appear.
        for (String word : words) {
            for (char c : word.toCharArray()) {
                graph.putIfAbsent(c, new HashSet<>());
                indegree.putIfAbsent(c, 0);
            }
        }

        // Phase 2: extract one ordering edge from each adjacent 
        // word pair
        for (int i = 0; i < words.length - 1; i++) {
            String w1 = words[i];
            String w2 = words[i + 1];

            // Invalid: a longer word can't precede its own prefix
            if (w1.length() > w2.length() && w1.startsWith(w2)) {
                return "";
            }

            int len = Math.min(w1.length(), w2.length());
            for (int j = 0; j < len; j++) {
                char c1 = w1.charAt(j);
                char c2 = w2.charAt(j);

                if (c1 != c2) {
                    // First difference gives one edge: c1 → c2
                    if (!graph.get(c1).contains(c2)) {
                        graph.get(c1).add(c2);
                        indegree.put(c2, indegree.get(c2) + 1);
                    }
                    break;  // only the first difference matters
                }
            }
        }

        // Kahn's algorithm: BFS topological sort
        Queue<Character> queue = new LinkedList<>();
        for (char c : indegree.keySet()) {
            if (indegree.get(c) == 0) {
                queue.add(c);
            }
        }

        StringBuilder result = new StringBuilder();
        while (!queue.isEmpty()) {
            char c = queue.poll();
            result.append(c);
            for (char next : graph.get(c)) {
                indegree.put(next, indegree.get(next) - 1);
                if (indegree.get(next) == 0) {
                    queue.add(next);
                }
            }
        }

        // If not every character made it into the result, 
        // a cycle blocked some
        return result.length() == indegree.size() ? result.toString() : "";
    }
}

9. Test the Code

// Standard ordering
System.out.println(alienOrder(new String[]{"wrt","wrf","er","ett","rftt"}));
// "wertf"

// Simple two-letter constraint
System.out.println(alienOrder(new String[]{"z","x"}));
// "zx"

// Invalid: prefix appears after the longer word
System.out.println(alienOrder(new String[]{"abc","ab"}));
// ""

// Cycle: z→x implied, then x→z implied — contradiction
System.out.println(alienOrder(new String[]{"z","x","z"}));
// ""

// Single word — all characters valid, no constraints
System.out.println(alienOrder(new String[]{"abc"}));
// some permutation containing a, b, c (e.g. "abc")

// Duplicate adjacent words — valid, no edge produced
System.out.println(alienOrder(new String[]{"abc","abc"}));
// some permutation of a, b, c

These hit the meaningful cases: a standard multi-constraint ordering, a minimal two-letter case, the prefix-violation invalid case, a cycle (which must return empty), a single word with no constraints, and duplicate adjacent words. The cycle and prefix-violation cases are the ones that separate correct solutions from those that pass only the happy path.

10. Key Lessons

When ordering information is embedded in sorted data, the signal lives in adjacent comparisons. Two adjacent sorted items reveal exactly one ordering fact: their first point of difference. Extracting that fact across all adjacent pairs gathers every constraint the input contains.
"Reconstruct a global order from pairwise constraints" is a topological sort. Model the items as nodes and each "X before Y" as a directed edge, then sort. If the constraints contradict (a cycle), no valid order exists.
Validity often has more than one failure mode. Here, two things make the input invalid: a cycle in the constraints, and an impossible word ordering (a prefix after its extension). The cycle is caught by the result-length check; the prefix violation needs an explicit check the main loop won't catch on its own.
Initialize all nodes before adding edges. Characters that appear but never participate in a comparison still belong in the output. A two-phase setup — register every node, then add edges — guarantees you don't silently drop them.
Kahn's algorithm builds the order and detects cycles in one pass; comparing the result length to the node count is the entire cycle test. Reuse this whenever a problem reduces to topological sorting.

The thing that makes Alien Dictionary click is the recognition that a sorted list of words is really a set of pairwise ordering constraints, hidden one-per-adjacent-pair at the first differing character. Once you train yourself to ask "where is the ordering information actually encoded?" and to extract constraints from indirect evidence, an entire class of "infer the order" problems becomes the same two-step recipe: gather the constraints, then topologically sort them.

Good Luck and Happy Coding!