Find closest pair of point with Plane Sweep Algorithm in O(n ln n)

Finding the closest pair of Point in a given collection of points is a standard problem in computational geometry. In this article I’ll explain an efficient algorithm using plane sweep, compare it to the naive implementation and discuss its complexity.

This problem is standard, but not really easy to solve in an efficient way. The first implementation we think of is the naive one, comparing each point to each other point.

In my examples, I’ll use the java.awt.Point class to represent a point. This naive implementation is really easy to implement :

public static Point[] naiveClosestPair(Point[] points) {
    double min = Double.MAX_VALUE;
 
    Point[] closestPair = new Point[2];
 
    for (Point p1 : points) {
        for (Point p2 : points) {
            if (p1 != p2) {
                double dist = p1.distance(p2);
 
                if (dist < min) {
                    min = dist;
 
                    closestPair[0] = p1;
                    closestPair[1] = p2;
                }
            }
        }
    }
 
    return closestPair;
}

As you can directly see, this naive implementation has a complexity of O(n^2). But we can do a lot better using a plane sweep algorithm.

With that algorithm, we’ll sweep the plane from left to right (right to left is also a possibility) and when we reach a point we’ll compute all the interesting candidates (the candidates that can be in the closest pair).

For that we’ll make the following operations :

  1. We sort the list of points from left to right in the x axis
  2. And then for each point :
    1. We remove from the candidates all the point that are further in x axis that the current min distance
    2. We take all the candidates that are located more or less current min distance from the current point in y axis
    3. We test for the min distance all the founded candidates with the current point
    4. And finally we add the current point to the list of candidates

So when we found a new min distance, we can make the rectangle of candidates smaller in the x axis and smaller in the y axis. So we make a lot less comparisons between the points.

Here is a picture illustrating that :

Plane Sweep Algorithm

Plane Sweep Algorithm

The red points are the closest pair at this time of the algorithm. The red rectangle is the rectangle of the candidates delimited in right by the current point. And the yellow rectangle contains only the candidates interesting for the current point.

There is always a maximum of 6 points in the yellow rectangle, the 4 vertices, the point with the same coordinates as the current point and finally the point in the same y coordinate and in the limit of the x axis. Even if the maximum is 6, you’ll almost never have more than 2 points in that list (the maximum is see in my test was 3 with a collection of 1’000’000 random points). You can see this 6 points here :

Maximum points to compare

Maximum points to compare

If all that stuff is not really clear for you, you can watch it in action here in a Java applet.

The candidates must also be always sorted. For that, we’ll use a Binary Search Tree for good performances.

If we look at the complexity :

  • Sorting all the points in right axis : Cost O(n ln n) with Quick Sort by example
  • Shrinking the list of candidates take O(n) from start to end of the algorithm because we add n points to the candidates and we can remove only n points. So this is constant for each point : O(1).
  • Searching all the candidates between two values in y axis cost O(ln n) with binary search
  • The comparisons with at maximum 6 points are made in O(1)
  • Add the candidates and keep the list of candidates sorted cost O(ln n).

So the total complexity is O(n ln(n) + n * ( 1 + ln(n) + 1 + ln(n) ) ) = O(n ln n).

So it’s really better than O(n^2) for the naive implementation.

So now, we can go to the implementation in Java.

public static Point[] closestPair(Point[] points) {
    Point[] closestPair = new Point[2];
 
    //When we start the min distance is the infinity
    double crtMinDist = Double.POSITIVE_INFINITY;
 
    //Get the points and sort them
    Point[] sorted = Arrays.copyOf(points, points.length);
    Arrays.sort(sorted, HORIZONTAL_COMPARATOR);
 
    //When we start the left most candidate is the first one
    int leftMostCandidateIndex = 0;
 
    //Vertically sorted set of candidates
    SortedSet<Point> candidates = new TreeSet<Point>(VERTICAL_COMPARATOR);
 
    //For each point from left to right
    for (Point current : sorted) {
        //Shrink the candidates
        while (current.x - sorted[leftMostCandidateIndex].x &gt; crtMinDist) {
            candidates.remove(sorted[leftMostCandidateIndex]);
            leftMostCandidateIndex++;
        }
 
        //Compute the y head and the y tail of the candidates set
        Point head = new Point(current.x, (int) (current.y - crtMinDist));
        Point tail = new Point(current.x, (int) (current.y + crtMinDist));
 
        //We take only the interesting candidates in the y axis
        for (Point point : candidates.subSet(head, tail)) {
            double distance = current.distance(point);
 
            //Simple min computation
            if (distance < crtMinDist) {
                crtMinDist = distance;
 
                closestPair[0] = current;
                closestPair[1] = point;
            }
        }
 
        //The current point is now a candidate
        candidates.add(current);
    }
 
    return closestPair;
}

The code isn’t overcomplicated. We see all the steps explained in the article and that works well. The Horizontal and Vertical comparators are really simple :

private static final Comparator<Point> VERTICAL_COMPARATOR = new Comparator<Point>() {
    @Override
    public int compare(Point a, Point b) {
        if (a.y < b.y) {
            return -1;
        }
        if (a.y > b.y) {
            return 1;
        }
        if (a.x < b.x) {
            return -1;
        }
        if (a.x > b.x) {
            return 1;
        }
        return 0;
    }
};
 
private static final Comparator<Point> HORIZONTAL_COMPARATOR = new Comparator<Point>() {
    @Override
    public int compare(Point a, Point b) {
        if (a.x < b.x) {
            return -1;
        }
        if (a.x > b.x) {
            return 1;
        }
        if (a.y < b.y) {
            return -1;
        }
        if (a.y > b.y) {
            return 1;
        }
        return 0;
    }
};

Here is a performance comparison for some sizes with a Benchmark Framework I described here for some sizes of collection points.

NaiveSweeping
100189.923 us53.685 us
5004.448 ms279.042 us
100017.790 ms556.731 us
5000458.728 ms3.320 ms

Like you can see, this is really better than the naive implementation. So we make a good job. But if we make one more test with 10 elements :

NaiveSweeping
101.932 us4.746 us

Now our good sweeping algorithm is slower than the naive !

When we think about that, we realize that it’s logical. In fact we made a lot of computations before starting the algorithm like sorting the points, creating a list of candidantes, … All that stuff is heavier than make n^2 comparisons in little number. So what can we do to have good performances with small number of points ?

It’s really easy to solve. We just have to found the number before which the naive algorithm is quicker than the sweeping and when the size of the points collection is smaller than this pivot number we use the naive implementation. On my computer I found that before 75 elements, the naive implementation was faster than the sweeping algorithm, so we can refactor our method :

public static Point[] closestPair(Point[] points) {
    if(points.length < 75){
        return naiveClosestPair(points);
    }
 
    //No changes
}

And we’ve good performances for little set of points :)

So our method is now complete. I hope you found that post interesting and that will be useful to someone.

Related posts:

  1. Develop a modular application – The modules
  2. My Java Benchmarks on GitHub
  3. Develop a modular application – Bases
  • Vulongbienbk

    good

  • Vaibhav

    Thanks for the implementation part. I was having trouble with it. Can you also help in the implementation in java using divide and conquer algorithm? Thanks in advance :-)

    • Anonymous

      Hi. 

      What kind of trouble do you have with the implementation ? I think the implementation in divide in conquer should be quite straightforward. Once you sorted the array (or before, doesn’t change a lot), you separate the arrays in n portions (n is the number of threads you want to use) and each thread run the algorithm on its portions of the array. Finally, when each thread have finished their search, you have n solutions and you just have to found the closest pair between these n solutions. 

      I hope this helps

      • Vaibhav

        Hey, firstly I am not using parallel architecture :-P but thanks anyway :-)
        Just one thing, I implemented your code and ran it on random test cases. Its giving me  java.lang.IllegalArgumentException: fromKey > toKey. Here is my code https://ideone.com/RKgk4. Thanks in advance :-)

        • Vaibhav

          Hey never mind, there was a problem with my IO. Thanks :-)

          • Anonymous

            Ok :)

            For the parralel part, I thought that’s was what you wanted because of your question about divide-and-conquer, I made my conclusion a bit quickly :D

  • Hassan

    how come the naive faster in 75 elements ?? 75*ln(75) < 75^2 
    this is contradicting with the complexity calculations 

    • Anonymous

      We cannot compare directly using the big O notation here. Before starting the search itself, we have to do a lot of  computation like sorting the points, creating a copy of the array. That’s not free. For example, the copy need to allocate some memory and that’s a high cost operation. When we have a lot of points theses costs are amortized, but not with a little number.

      If you really want to use the big O notation like you try, you have to put numbers in these costs. Take into account that the big O is not precise. In this example, the real complexity will perhaps be O(n * ln(n) + 5000) (just an example, not a real value) so that this 5000 will change the result for little numbers. 

  • Shay

    I don’t understand why the solution is in complexity O(n*ln n).
    Consider the following input (sorted by x):
    The 1st point is at (x0,0) = p0.
    The 2nd point is at (x1,y) = p1.
    Now denote d = distance(p0,p1).
    Let the 3rd point be (x1 + d/2, y+d) = p2.
    Let the 4th point be (p2.x + d/4, p2.y + 2*d) = p3,
    The 5th point be (p3.x + d/8, p2.y + 3*d) = p4 and so on.

    Using this input, I don’t see why any point should ever leave the candidate list,
    since the shortest distance is d, and the distance between the left most point and the right most point is < d.
    However, the cumulative distance on y is always increasing, so d is always kept as the minimal distance between two points.

    Consider the "n/2"th  point and onward – each point needs to test at least n/2 candidates, thus the total complexity should be O(n^2).

    Where am I wrong?

    • Anonymous

      In the worst case, you’ll have only 6 points to compare to your current point. Perhaps your problem is here ?