K Closest Points to Origin
- mosesg1123
Finding the nearest neighbors to a reference point is fundamental in recommendation engines, spatial queries, and clustering. The “K Closest Points to Origin” problem is a classic warm-up for Top-K selection using heaps or sorting optimizations.
Problem Statement
Given an array of points points[i] = [xi, yi] and an integer k, return the k points closest to the origin (0, 0). Distance is measured by Euclidean distance, but you may compare squared distances to avoid expensive square roots.
Example:
points = [[1,3],[-2,2],[2,-2]], k = 2
Closest 2: [-2,2] (dist²=8), [2,-2] (dist²=8)
Output can be in any order: [[-2,2],[2,-2]]
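Because the square root is monotonically increasing, comparing squared distances gives the same ordering as comparing true Euclidean distances. A quick sketch of the idea (class and method names here are my own, for illustration):

```java
public class SquaredDistanceDemo {
    // Squared Euclidean distance from the origin. Skipping Math.sqrt is safe
    // because sqrt is monotonic: d1 < d2 exactly when d1² < d2².
    // Computed in long to rule out any overflow concerns.
    static long squaredDist(int[] p) {
        return (long) p[0] * p[0] + (long) p[1] * p[1];
    }

    public static void main(String[] args) {
        System.out.println(squaredDist(new int[]{1, 3}));   // 10
        System.out.println(squaredDist(new int[]{-2, 2}));  // 8
        System.out.println(squaredDist(new int[]{2, -2}));  // 8
    }
}
```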
1. Clarify Requirements Before Jumping Into Code
I want to confirm:
Return exactly k points.
Order does not matter.
Points are integer pairs; coordinates can be large, so squared distances should be computed in 64-bit to be safe.
k is between 1 and n.
Need better than O(n log n) if possible, but O(n log n) may pass.
2. Identify the Category of the Question
This is a Top-K selection problem. Common approaches for these kinds of problems include:
Full sort of all points by distance: O(n log n).
Max-heap of size k: maintain k smallest seen so far → O(n log k).
Quickselect (partition-based): average O(n), worst O(n²).
Heaps and Quickselect are common solutions for Top-K.
3. Consider a Brute-Force Approach to Understand the Problem
The simplest solution is to compute the squared distance from the origin for every point, sort the points by that distance in increasing order, and return the first k points.
Time: O(n) to compute distances, then O(n log n) to sort
Space: O(n) for the sort
This works, but let's see if we can do better.
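A sketch of this brute force in Java (the class and method names are mine, for illustration):

```java
import java.util.Arrays;

public class BruteForceKClosest {
    // Sort all points by squared distance to the origin, then take the
    // first k. O(n log n) time, O(n) space for the copy and the sort.
    static int[][] kClosest(int[][] points, int k) {
        int[][] copy = points.clone();
        Arrays.sort(copy, (a, b) -> Long.compare(
                (long) a[0] * a[0] + (long) a[1] * a[1],
                (long) b[0] * b[0] + (long) b[1] * b[1]));
        return Arrays.copyOfRange(copy, 0, k);
    }
}
```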
4. Brainstorm More Solutions
Let's see if I can build on the brute-force solution.
Option 1: Quickselect
If I have an array of distances, is there a faster way for me to sort than O(n log n) time? I know the Quickselect algorithm has an average of O(n) time, but in the worst-case it has a runtime of O(n²), so probably not a great fit.
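Even though I'll pass on it here, for reference a quickselect sketch might look like the following - this assumes a randomized Lomuto partition, and all names are illustrative:

```java
import java.util.Arrays;
import java.util.Random;

public class QuickselectKClosest {
    static final Random RNG = new Random(42);

    static long dist(int[] p) {
        return (long) p[0] * p[0] + (long) p[1] * p[1];
    }

    // Partially partition points so the k closest occupy indices [0, k).
    // Average O(n); the random pivot makes the O(n²) worst case unlikely.
    static int[][] kClosest(int[][] points, int k) {
        int lo = 0, hi = points.length - 1;
        while (lo < hi) {
            int p = partition(points, lo, hi);
            if (p == k) break;
            else if (p < k) lo = p + 1;
            else hi = p - 1;
        }
        return Arrays.copyOfRange(points, 0, k);
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    static int partition(int[][] a, int lo, int hi) {
        swap(a, lo + RNG.nextInt(hi - lo + 1), hi);
        long pivot = dist(a[hi]);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (dist(a[j]) < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[][] a, int i, int j) {
        int[] t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Note that quickselect rearranges the input in place, which is part of how it achieves O(1) extra space.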
Option 2: Min heap
Let's think through the data structures that I know are useful for maintaining an ordering: min and max heaps.
A min heap seems promising: inserting an element into a min heap takes only O(log n) time. And if I insert all of the distances into the heap, I can just pop off the top k elements for my result.
But there's a hitch - if I'm inserting all n distances into the min heap, that's O(log n) per insertion, or O(n log n) total. So this option doesn't actually improve on the brute-force runtime.
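For completeness, that all-elements min-heap version would look something like this sketch (illustrative names, not from any library):

```java
import java.util.PriorityQueue;

public class MinHeapAllPoints {
    // Push every point into a min heap keyed by squared distance - n inserts
    // at O(log n) each, so O(n log n) total - then pop the k closest.
    // Correct, but no faster than simply sorting.
    static int[][] kClosest(int[][] points, int k) {
        PriorityQueue<int[]> minHeap = new PriorityQueue<>(
                (a, b) -> Long.compare(
                        (long) a[0] * a[0] + (long) a[1] * a[1],
                        (long) b[0] * b[0] + (long) b[1] * b[1]));
        for (int[] p : points) minHeap.offer(p);
        int[][] res = new int[k][];
        for (int i = 0; i < k; i++) res[i] = minHeap.poll();
        return res;
    }
}
```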
Option 3: Max heap of size k
Can I even do better than O(n log n)? Well, inserting elements to a heap takes O(log n) time when I am inserting all distances into the heap. So, what if I didn't have to sort all distances? Can I find a solution that only requires me to track k elements, so that inserting into a heap only takes O(log k) time? That would give me a better runtime of O(n log k).
If I were using a min heap holding only k elements, the problem is eviction: when a new point is closer than something already in the heap, the element that should leave is the largest one, but a min heap only gives me cheap access to the smallest. I'd have to find and remove the largest element, then rebalance.
But wait, if I need to remove the largest distance in the heap whenever I see a smaller distance, there is a structure that makes that easy - a max heap!
So, if I push the first k distances into a max heap, then for each remaining point, if the distance is smaller than my top distance, I'll pop the top distance and push the new one! Then at the end, the max heap holds the k closest elements, my runtime is O(n log k), and my space complexity is O(k). That's it!
To summarize the solution:
Push the first k distances into a max-heap
For each remaining point, if the distance < heap's top distance, pop the old max distance and push the new smaller distance
At the end, the heap holds the k closest distances to the origin
Time: O(n log k), Space: O(k).
5. Discuss Trade-Offs Between Your Solutions
| Approach | Time | Space | Pros | Cons |
| --- | --- | --- | --- | --- |
| Full sort | O(n log n) | O(n) | Very simple | Slower when k ≪ n |
| Max-heap of size k | O(n log k) | O(k) | Optimal for k ≪ n, predictable | Needs heap code |
| Quickselect | O(n) avg | O(1) | Linear average time | Complex code, worst-case O(n²) |
6. Write Pseudocode to Structure Your Thoughts
function kClosest(points, k):
    heap = new max-heap keyed by dist²
    for i in 0 to points.length-1:
        dist = points[i].x² + points[i].y²
        if heap.size < k:
            heap.push((dist, points[i]))
        else if dist < heap.peek().dist:
            heap.pop()
            heap.push((dist, points[i]))
    result = []
    while heap not empty:
        result.add(heap.pop().point)
    return result
7. Consider Edge Cases
k = points.length → return all points.
All points identical or same distance → any k subset is fine.
Very large coordinates → squared distances fit in 64-bit.
k = 1 → single closest.
8. Write Full Code Syntax
import java.util.*;

public class Solution {
    public int[][] kClosest(int[][] points, int k) {
        // Max heap ordered by squared distance from the origin,
        // computed in long to avoid any risk of integer overflow.
        PriorityQueue<int[]> maxHeap = new PriorityQueue<>(
            (a, b) -> Long.compare(
                (long) b[0] * b[0] + (long) b[1] * b[1],
                (long) a[0] * a[0] + (long) a[1] * a[1]
            )
        );
        for (int[] p : points) {
            maxHeap.offer(p);
            // Keep only the k closest points seen so far.
            if (maxHeap.size() > k) {
                maxHeap.poll();
            }
        }
        // Drain the heap; polling the current farthest first fills
        // the result in order of increasing distance.
        int[][] res = new int[k][2];
        for (int i = k - 1; i >= 0; i--) {
            res[i] = maxHeap.poll();
        }
        return res;
    }
}
9. Test Your Code
public static void main(String[] args) {
    Solution sol = new Solution();
    int[][] pts1 = {{1,3},{-2,2}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts1, 1)));
    // expected [[-2,2]]
    int[][] pts2 = {{3,3},{5,-1},{-2,4}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts2, 2)));
    // expected [[3,3],[-2,4]] (order may vary)
    int[][] pts3 = {{0,1},{1,0}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts3, 2)));
    // expected both points
    int[][] pts4 = {{2,2},{2,2},{2,2}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts4, 1)));
    // expected one [2,2]
    int[][] pts5 = {{10000,10000},{-10000,-10000},{1,1}};
    System.out.println(Arrays.deepToString(sol.kClosest(pts5, 1)));
    // expected [[1,1]]
}
10. Key Lessons to Remember for Future Questions
Use a max-heap of size k for efficient Top-K minimal elements in O(n log k).
This pattern generalizes to any Top-K or Bottom-K selection task.
When brainstorming optimizations, start by asking what an optimal runtime for the problem might look like - it gives you a concrete target to aim for.
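As one illustration of how the pattern generalizes: the same size-k heap trick, flipped around, finds the k largest values with a min heap. A sketch with illustrative names:

```java
import java.util.PriorityQueue;

public class TopK {
    // The size-k heap pattern, mirrored: keep the k LARGEST values using a
    // MIN heap of size k, evicting the current smallest whenever the heap
    // grows past k. O(n log k) time, O(k) space.
    static int[] kLargest(int[] nums, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int x : nums) {
            minHeap.offer(x);
            if (minHeap.size() > k) minHeap.poll(); // evict current smallest
        }
        // Drain smallest-first to fill the result in descending order.
        int[] res = new int[k];
        for (int i = k - 1; i >= 0; i--) res[i] = minHeap.poll();
        return res;
    }
}
```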