Top K Frequent Elements

mosesg1123
Apr 28
4 min read

Finding the most frequent items in a dataset is a common real-world task—think trending topics in social media, most visited pages on a website, or top-selling products. In interviews, Top K Frequent Elements tests your ability to count frequencies and then select the top performers efficiently.

Problem Statement

Given an integer array nums and an integer k, return the k most frequent elements in any order. You may assume k is always valid (1 ≤ k ≤ number of distinct elements).

public int[] topKFrequent(int[] nums, int k);

Aim for better than O(n log n) when possible.

1. Clarify Requirements Before Jumping Into Code

Let me confirm a few details:

k is between 1 and the number of unique elements.
If two elements have the same frequency, any relative order in the output is fine.
Input array can be large; we want something close to O(n).
We need to return exactly k elements, not all frequencies.

2. Identify the Category of the Question

This is a frequency counting plus selection problem. We need to count occurrences (hash map) and then find the top k by frequency. Patterns include sorting by frequency, heap (priority queue), bucket sort, or even quickselect on frequency pairs.

3. Consider a Brute-Force Approach to Understand the Problem

The simplest idea is:

Count all frequencies in a map.
Convert map entries to a list.
Sort the list descending by frequency.
Take the first k keys.
Time Complexity: O(n + d log d), where d is number of distinct elements
Space Complexity: O(n + d)
Drawback: Sorting all d entries is more work than needed if k ≪ d.

4. Brainstorm More Solutions

I need to avoid full sort when d is large. What are my options?

Min-Heap of size k: Push each distinct element with its frequency into a min-heap keyed by frequency. Keep heap size ≤ k by popping the smallest. After processing d entries, heap contains top k. This is O(d log k) time, O(d + k) space.
Bucket Sort on Frequencies: Frequencies range from 1 to n. What if I create an array of buckets where bucket[f] holds all keys with frequency f? Then, I can scan the buckets starting from the highest frequency, collecting elements until I've reached k elements. This is O(n + d) time, O(n + d) space.
Quickselect on Entries: Use a selection algorithm on the distinct entries array to partition around the kth most frequent pivot. Average O(d) time, O(d) space for recursion.

5. Discuss Trade-Offs Between Your Solutions

Approach	Time	Space	Pros	Cons
Sort by frequency	O(n + d log d)	O(d)	Very simple	Sort cost when d large
Min-Heap (size k)	O(n + d log k)	O(d + k)	Good when k ≪ d, incremental selection	Log factor and heap overhead
Bucket Sort	O(n + d)	O(n + d)	True linear time, no heap, direct grouping by frequency	Extra array of size n
Quickselect	O(n + d) avg	O(d)	Average linear selection without full sort	Complex pivot logic, worst O(d²)

Bucket sort is fastest and simplest to implement in one pass after counting.

6. Write Pseudocode to Structure Your Thoughts

function topKFrequent(nums, k):
  // 1. Count frequencies
  freqMap = map from int to int
  for num in nums:
    freqMap[num]++

  // 2. Build buckets
  maxFreq = length(nums)
  buckets = array of lists of size maxFreq + 1
  for (num, freq) in freqMap:
    buckets[freq].add(num)

  // 3. Gather top k
  result = empty list
  for f from maxFreq down to 1:
    for num in buckets[f]:
      result.add(num)
      if result.size == k:
        return result as array

7. Consider Edge Cases

All elements identical (e.g. [2,2,2], k=1) → returns [2].
k equals number of distinct → returns all distinct elements.
n small, k=1 → returns the single most frequent.
Negative and positive values → map handles any integers.
nums empty → invalid per constraints (k at least 1), but we’d return an empty result.

8. Write Full Code Syntax

import java.util.*;

public class Solution {
    public int[] topKFrequent(int[] nums, int k) {
        // 1. Frequency count
        Map<Integer,Integer> freqMap = new HashMap<>();
        for (int num : nums) {
            freqMap.put(num, freqMap.getOrDefault(num,0) + 1);
        }

        // 2. Bucket by frequency
        int n = nums.length;
        List<Integer>[] buckets = new List[n + 1];
        for (int i = 0; i <= n; i++) {
            buckets[i] = new ArrayList<>();
        }
        for (Map.Entry<Integer,Integer> entry : freqMap.entrySet()) {
            int num = entry.getKey(), freq = entry.getValue();
            buckets[freq].add(num);
        }

        // 3. Collect top k
        List<Integer> resultList = new ArrayList<>();
        for (int f = n; f >= 1 && resultList.size() < k; f--) {
            for (int num : buckets[f]) {
                resultList.add(num);
                if (resultList.size() == k) {
                    break;
                }
            }
        }

        // Convert to array
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = resultList.get(i);
        }
        return result;
    }
}

9. Test Your Code

public static void main(String[] args) {
    Solution sol = new Solution();

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{1,1,1,2,2,3}, 2)
    ));  // [1,2]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{1}, 1)
    ));  // [1]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{4,1,-1,2,-1,2,3}, 2)
    ));  // [-1,2] or [2,-1]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{5,5,4,4,3,3,2,2,1,1}, 3)
    ));  // [5,4,3]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{}, 0)
    ));  // [] (if k=0 allowed)
}

10. Key Lessons to Remember for Future Questions

Count then select is often a two-step pattern: map frequencies, then choose top elements.
Bucket sort on frequency is a linear-time trick when frequencies range from 0 to n.
When k ≪ d, a min-heap of size k can be simpler and memory-efficient.
Always iterate buckets or heap from highest frequency to collect top results.
Clear pseudocode that separates counting, bucketing, and selection clarifies complex logic.
Edge cases—single element, all identical, empty—ensure robust solutions under constraints.