top of page

Top K Frequent Elements

  • mosesg1123
  • 7 hours ago
  • 4 min read

Finding the most frequent items in a dataset is a common real-world task—think trending topics in social media, most visited pages on a website, or top-selling products. In interviews, Top K Frequent Elements tests your ability to count frequencies and then select the top performers efficiently.


Problem Statement

Given an integer array nums and an integer k, return the k most frequent elements in any order. You may assume k is always valid (1 ≤ k ≤ number of distinct elements).

public int[] topKFrequent(int[] nums, int k);

Aim for better than O(n log n) when possible.


1. Clarify Requirements Before Jumping Into Code

Let me confirm a few details:

  • k is between 1 and the number of unique elements.

  • If two elements have the same frequency, any relative order in the output is fine.

  • Input array can be large; we want something close to O(n).

  • We need to return exactly k elements, not all frequencies.


2. Identify the Category of the Question

This is a frequency counting plus selection problem. We need to count occurrences (hash map) and then find the top k by frequency. Patterns include sorting by frequency, heap (priority queue), bucket sort, or even quickselect on frequency pairs.


3. Consider a Brute-Force Approach to Understand the Problem

The simplest idea is:

  1. Count all frequencies in a map.

  2. Convert map entries to a list.

  3. Sort the list descending by frequency.

  4. Take the first k keys.

  5. Time Complexity: O(n + d log d), where d is number of distinct elements

  6. Space Complexity: O(n + d)

  7. Drawback: Sorting all d entries is more work than needed if k ≪ d.


4. Brainstorm More Solutions

I need to avoid full sort when d is large. What are my options?

  • Min-Heap of size k: Push each distinct element with its frequency into a min-heap keyed by frequency. Keep heap size ≤ k by popping the smallest. After processing d entries, heap contains top k. This is O(d log k) time, O(d + k) space.

  • Bucket Sort on Frequencies: Frequencies range from 1 to n. What if I create an array of buckets where bucket[f] holds all keys with frequency f? Then, I can scan the buckets starting from the highest frequency, collecting elements until I've reached k elements. This is O(n + d) time, O(n + d) space.

  • Quickselect on Entries: Use a selection algorithm on the distinct entries array to partition around the kth most frequent pivot. Average O(d) time, O(d) space for recursion.


5. Discuss Trade-Offs Between Your Solutions

Approach

Time

Space

Pros

Cons

Sort by frequency

O(n + d log d)

O(d)

Very simple

Sort cost when d large

Min-Heap (size k)

O(n + d log k)

O(d + k)

Good when k ≪ d, incremental selection

Log factor and heap overhead

Bucket Sort

O(n + d)

O(n + d)

True linear time, no heap, direct grouping by frequency

Extra array of size n

Quickselect

O(n + d) avg

O(d)

Average linear selection without full sort

Complex pivot logic, worst O(d²)

Bucket sort is fastest and simplest to implement in one pass after counting.


6. Write Pseudocode to Structure Your Thoughts

function topKFrequent(nums, k):
  // 1. Count frequencies
  freqMap = map from int to int
  for num in nums:
    freqMap[num]++

  // 2. Build buckets
  maxFreq = length(nums)
  buckets = array of lists of size maxFreq + 1
  for (num, freq) in freqMap:
    buckets[freq].add(num)

  // 3. Gather top k
  result = empty list
  for f from maxFreq down to 1:
    for num in buckets[f]:
      result.add(num)
      if result.size == k:
        return result as array

7. Consider Edge Cases

  • All elements identical (e.g. [2,2,2], k=1) → returns [2].

  • k equals number of distinct → returns all distinct elements.

  • n small, k=1 → returns the single most frequent.

  • Negative and positive values → map handles any integers.

  • nums empty → invalid per constraints (k at least 1), but we’d return an empty result.


8. Write Full Code Syntax

import java.util.*;

public class Solution {
    public int[] topKFrequent(int[] nums, int k) {
        // 1. Frequency count
        Map<Integer,Integer> freqMap = new HashMap<>();
        for (int num : nums) {
            freqMap.put(num, freqMap.getOrDefault(num,0) + 1);
        }

        // 2. Bucket by frequency
        int n = nums.length;
        List<Integer>[] buckets = new List[n + 1];
        for (int i = 0; i <= n; i++) {
            buckets[i] = new ArrayList<>();
        }
        for (Map.Entry<Integer,Integer> entry : freqMap.entrySet()) {
            int num = entry.getKey(), freq = entry.getValue();
            buckets[freq].add(num);
        }

        // 3. Collect top k
        List<Integer> resultList = new ArrayList<>();
        for (int f = n; f >= 1 && resultList.size() < k; f--) {
            for (int num : buckets[f]) {
                resultList.add(num);
                if (resultList.size() == k) {
                    break;
                }
            }
        }

        // Convert to array
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = resultList.get(i);
        }
        return result;
    }
}

9. Test Your Code

public static void main(String[] args) {
    Solution sol = new Solution();

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{1,1,1,2,2,3}, 2)
    ));  // [1,2]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{1}, 1)
    ));  // [1]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{4,1,-1,2,-1,2,3}, 2)
    ));  // [-1,2] or [2,-1]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{5,5,4,4,3,3,2,2,1,1}, 3)
    ));  // [5,4,3]

    System.out.println(Arrays.toString(
        sol.topKFrequent(new int[]{}, 0)
    ));  // [] (if k=0 allowed)
}

10. Key Lessons to Remember for Future Questions

  • Count then select is often a two-step pattern: map frequencies, then choose top elements.

  • Bucket sort on frequency is a linear-time trick when frequencies range from 0 to n.

  • When k ≪ d, a min-heap of size k can be simpler and memory-efficient.

  • Always iterate buckets or heap from highest frequency to collect top results.

  • Clear pseudocode that separates counting, bucketing, and selection clarifies complex logic.

  • Edge cases—single element, all identical, empty—ensure robust solutions under constraints.

Recent Posts

See All
Missing Number

When you need to find the one missing entry in a sequence—say missing log IDs, seats, or data packets— Missing Number  is the classic...

 
 
 
Intersection of Two Arrays

Finding the intersection of two arrays - i.e. the common elements between two lists - is a real-world task you’ll do when reconciling...

 
 
 
Kth Largest Element in an Array

Finding the Kth largest element in an array is a practical problem you’ll face when you need to compute statistics on large...

 
 
 

Comentarios


Drop Me a Line, Let Me Know What You Think

Thanks for submitting!

© 2023 by Train of Thoughts. Proudly created with Wix.com

bottom of page