K Closest Points to Origin
- mosesg1123
Finding the nearest neighbors to a reference point is fundamental in recommendation engines, spatial queries, and clustering. The “K Closest Points to Origin” problem is a classic warm-up for Top-K selection using heaps or sorting optimizations.
Problem Statement
Given an array of points points[i] = [xi, yi] and an integer k, return the k points closest to the origin (0, 0). Distance is measured by Euclidean distance, but you may compare squared distances to avoid expensive square roots.
Example:
points = [[1,3],[-2,2],[2,-2]], k = 2
Closest 2: [-2,2] (dist²=8), [2,-2] (dist²=8)
Output can be in any order: [[-2,2],[2,-2]]
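Because the square root is monotonically increasing, comparing squared distances gives the same ordering as comparing true Euclidean distances. A quick sketch of the idea (class and method names here are my own, for illustration):

```java
public class SquaredDistanceDemo {
    // Squared Euclidean distance from the origin. Skipping Math.sqrt is safe
    // because sqrt is monotonic: d1 < d2 exactly when d1² < d2².
    // Computed in long to rule out any overflow concerns.
    static long squaredDist(int[] p) {
        return (long) p[0] * p[0] + (long) p[1] * p[1];
    }

    public static void main(String[] args) {
        System.out.println(squaredDist(new int[]{1, 3}));   // 10
        System.out.println(squaredDist(new int[]{-2, 2}));  // 8
        System.out.println(squaredDist(new int[]{2, -2}));  // 8
    }
}
```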
1. Clarify Requirements Before Jumping Into Code
I want to confirm:
Return exactly k points.
Order does not matter.
Points are integer pairs; coordinates can be large, so squared distances should be computed in 64-bit to be safe.
k is between 1 and n.
Need better than O(n log n) if possible, but O(n log n) may pass.
2. Identify the Category of the Question
This is a Top-K selection problem. Common approaches for these kinds of problems include:
Full sort of all points by distance: O(n log n).
Max-heap of size k: maintain k smallest seen so far → O(n log k).
Quickselect (partition-based): average O(n), worst O(n²).
Heaps and Quickselect are common solutions for Top-K.
3. Consider a Brute-Force Approach to Understand the Problem
The simplest solution is to compute the squared distance from the origin for every point, sort the points by that distance in increasing order, and return the first k points.
Time: O(n) to compute distances, then O(n log n) to sort
Space: O(n) for the sort
This works, but let's see if we can do better.
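A sketch of this brute force in Java (the class and method names are mine, for illustration):

```java
import java.util.Arrays;

public class BruteForceKClosest {
    // Sort all points by squared distance to the origin, then take the
    // first k. O(n log n) time, O(n) space for the copy and the sort.
    static int[][] kClosest(int[][] points, int k) {
        int[][] copy = points.clone();
        Arrays.sort(copy, (a, b) -> Long.compare(
                (long) a[0] * a[0] + (long) a[1] * a[1],
                (long) b[0] * b[0] + (long) b[1] * b[1]));
        return Arrays.copyOfRange(copy, 0, k);
    }
}
```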
4. Brainstorm More Solutions
Let's see if I can build on the brute-force solution.
Option 1: Quickselect
If I have an array of distances, is there a faster way for me to sort than O(n log n) time? I know the Quickselect algorithm has an average of O(n) time, but in the worst-case it has a runtime of O(n²), so probably not a great fit.
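Even though I'll pass on it here, for reference a quickselect sketch might look like the following - this assumes a randomized Lomuto partition, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.Random;

public class QuickselectKClosest {
    static final Random RNG = new Random(42);

    static long dist(int[] p) {
        return (long) p[0] * p[0] + (long) p[1] * p[1];
    }

    // Partially partition points so the k closest occupy indices [0, k).
    // Average O(n); the random pivot makes the O(n²) worst case unlikely.
    static int[][] kClosest(int[][] points, int k) {
        int lo = 0, hi = points.length - 1;
        while (lo < hi) {
            int p = partition(points, lo, hi);
            if (p == k) break;
            else if (p < k) lo = p + 1;
            else hi = p - 1;
        }
        return Arrays.copyOfRange(points, 0, k);
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    static int partition(int[][] a, int lo, int hi) {
        swap(a, lo + RNG.nextInt(hi - lo + 1), hi);
        long pivot = dist(a[hi]);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (dist(a[j]) < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[][] a, int i, int j) {
        int[] t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Note that quickselect rearranges the input in place, which is part of how it achieves O(1) extra space.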
Option 2: Min heap
Let's think through the data structures that I know are useful for maintaining an ordering: min and max heaps.
A min heap seems promising: inserting an element into a min heap takes only O(log n) time. And if I insert all of the distances into the heap, I can just pop off the top k elements for my result.
But there's a hitch - if I'm inserting all n distances into the min heap, that's O(log n) per insertion, or O(n log n) total. So this option doesn't actually improve on the brute-force runtime.
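For completeness, that all-elements min-heap version would look something like this sketch (illustrative names, not from any library):

```java
import java.util.PriorityQueue;

public class MinHeapAllPoints {
    // Push every point into a min heap keyed by squared distance - n inserts
    // at O(log n) each, so O(n log n) total - then pop the k closest.
    // Correct, but no faster than simply sorting.
    static int[][] kClosest(int[][] points, int k) {
        PriorityQueue<int[]> minHeap = new PriorityQueue<>(
                (a, b) -> Long.compare(
                        (long) a[0] * a[0] + (long) a[1] * a[1],
                        (long) b[0] * b[0] + (long) b[1] * b[1]));
        for (int[] p : points) minHeap.offer(p);
        int[][] res = new int[k][];
        for (int i = 0; i < k; i++) res[i] = minHeap.poll();
        return res;
    }
}
```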
Option 3: Max heap of size k
Can I even do better than O(n log n)? Well, inserting elements to a heap takes O(log n) time when I am inserting all distances into the heap. So, what if I didn't have to sort all distances? Can I find a solution that only requires me to track k elements, so that inserting into a heap only takes O(log k) time? That would give me a better runtime of O(n log k).
If I were using a min heap holding only k elements, the problem is eviction: when a new point is closer than something already in the heap, the element that should leave is the largest one, but a min heap only gives me cheap access to the smallest. I'd have to find and remove the largest element, then rebalance.
But wait, if I need to remove the largest distance in the heap whenever I see a smaller distance, there is a structure that makes that easy - a max heap!
So, if I push the first k distances into a max heap, then for each remaining point, if the distance is smaller than my top distance, I'll pop the top distance and push the new one! Then at the end, the max heap holds the k closest elements, my runtime is O(n log k), and my space complexity is O(k). That's it!
To summarize the solution:
Push the first k distances into a max-heap
For each remaining point, if the distance < heap's top distance, pop the old max distance and push the new smaller distance
At the end, the heap holds the k closest distances to the origin
Time: O(n log k), Space: O(k).
5. Discuss Trade-Offs Between Your Solutions
| Approach | Time | Space | Pros | Cons |
| --- | --- | --- | --- | --- |
| Full sort | O(n log n) | O(n) | Very simple | Slower when k ≪ n |
| Max-heap of size k | O(n log k) | O(k) | Optimal for k ≪ n, predictable | Needs heap code |
| Quickselect | O(n) avg | O(1) | Linear average time | Complex code, worst-case O(n²) |
6. Write Pseudocode to Structure Your Thoughts
function kClosest(points, k):
    heap = new max-heap keyed by dist²
    for i in 0 to points.length-1:
        dist = points[i].x² + points[i].y²
        if heap.size < k:
            heap.push((dist, points[i]))
        else if dist < heap.peek().dist:
            heap.pop()
            heap.push((dist, points[i]))
    result = []
    while heap not empty:
        result.add(heap.pop().point)
    return result
7. Consider Edge Cases
k = points.length → return all points.
All points identical or same distance → any k subset is fine.
Very large coordinates → squared distances fit in 64-bit.
k = 1 → single closest.
8. Write Full Code Syntax
import java.util.*;

public class Solution {
    public int[][] kClosest(int[][] points, int k) {
        // Max heap ordered by squared distance from the origin,
        // computed in long to avoid any risk of integer overflow.
        PriorityQueue<int[]> maxHeap = new PriorityQueue<>(
            (a, b) -> Long.compare(
                (long) b[0] * b[0] + (long) b[1] * b[1],
                (long) a[0] * a[0] + (long) a[1] * a[1]
            )
        );
        for (int[] p : points) {
            maxHeap.offer(p);
            // Keep only the k closest points seen so far.
            if (maxHeap.size() > k) {
                maxHeap.poll();
            }
        }
        // Drain the heap; polling the current farthest first fills
        // the result in order of increasing distance.
        int[][] res = new int[k][2];
        for (int i = k - 1; i >= 0; i--) {
            res[i] = maxHeap.poll();
        }
        return res;
    }
}
9. Test Your Code
public static void main(String[] args) {
    Solution sol = new Solution();
    int[][] pts1 = {{1,3},{-2,2}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts1, 1)));
    // expected [[-2,2]]
    int[][] pts2 = {{3,3},{5,-1},{-2,4}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts2, 2)));
    // expected [[3,3],[-2,4]] (order may vary)
    int[][] pts3 = {{0,1},{1,0}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts3, 2)));
    // expected both points
    int[][] pts4 = {{2,2},{2,2},{2,2}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts4, 1)));
    // expected one [2,2]
    int[][] pts5 = {{10000,10000},{-10000,-10000},{1,1}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts5, 1)));
    // expected [[1,1]]
}
10. Key Lessons to Remember for Future Questions
Use a max-heap of size k for efficient Top-K minimal elements in O(n log k).
This pattern generalizes to any Top-K or Bottom-K selection task.
When brainstorming optimizations, start by asking what an optimal runtime for the problem might look like - it gives you a concrete target to aim for.
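As one illustration of how the pattern generalizes: the same size-k heap trick, flipped around, finds the k largest values with a min heap. A sketch with illustrative names:

```java
import java.util.PriorityQueue;

public class TopK {
    // The size-k heap pattern, mirrored: keep the k LARGEST values using a
    // MIN heap of size k, evicting the current smallest whenever the heap
    // grows past k. O(n log k) time, O(k) space.
    static int[] kLargest(int[] nums, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int x : nums) {
            minHeap.offer(x);
            if (minHeap.size() > k) minHeap.poll(); // evict current smallest
        }
        // Drain smallest-first to fill the result in descending order.
        int[] res = new int[k];
        for (int i = k - 1; i >= 0; i--) res[i] = minHeap.poll();
        return res;
    }
}
```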