Binary Insert : How to keep an array sorted as you insert data in it ?

With javascript and recursion

Recently I played solving the overlapping rectangles problem. This problem is simple, it is about finding out among a set of rectangles laid out on a surface which are the ones that overlap, with code of course. I will certainly detail my solution later but since it is very big and requires a bit of work and graphics, I prefer starting by sharing small bricks of the solution now.

A brick of the solution involves scanning the x coordinates of each rectangles like you’re moving from left to right reading every x coordinate of a rectangle you encounter, both left and right side. In computing terms, this means you need a structure to put the x coordinates in, and the scanning means the structure needs to be sorted : first x coordinate is the first coordinate you encounter and therefore the smallest one, last x is the largest.

I picked an array as the data structure so I needed this array to be sorted at all times to track the x coordinates. In real life conditions, the rectangles are thrown at you in chaos order and you have to add the x coordinates to your array and keep it sorted. How would you do that? This is the very problem I intend to expose and solve in this article :

How to keep an array sorted as you insert data in it?

My code is in javascript but the reasoning I will describe is totally language agnostic, you can always apply it to your own language.

***

I know there is a pretty straightforward solution that makes you insert the data at the beginning or end of the already sorted array, and resort the array using a fast sorting algorithm. But I am so lazy, I don’t like the idea that the array is already sorted and just because I added a new value I have to sort it all over again. Even if it can be fast, best sorting would be O(Nlog(N)), the design disturbs me, at night I know the array is sorted and I resort it all over again, it makes me feel really guilty in regards to the CPU.

The next best solution is to consider a linear search to search for the correct position to insert the new value in. You would compare in sequence the new value to the current value and see if you should insert it before or after. This would cost O(N), but it also disturbs me because if I have 15000 values, and the value I want to insert should be inserted at index 14775, I will have to check 14775 values before I know it should be placed there. It makes me feel even more lazy.

So how can I do? Drugs and pills won’t work. Hmmm.

Actually here is a good remedy. I concocted some home compiled algorithm that I called Binary Insert. It fixes my worries. This script works like Binary Search, except it inserts data in a sorted array, pretty straightforward too, right?

Here’s the reasoning :

Binary Insert algorithm

Indeed it is a tweaked version of Binary Search. Instead of finding a value, you search for the correct position to insert the new value by dividing the array in two halves every time. So instead of checking if the value to insert is equal to the upper or lower bound, you check if it is greater or lower, and you keep looking from there.

Here is an example :

Binary insert example

Here are some facts about this method :

- It does not insert duplicates
- It is recursive
- It does not return anything
- It performs the operations in place
- It is in best case O(1) and in worst case O(N). Even if the Binary Insert in itself is only O(log(N)), the O(N) comes from the insert which will always be O(N) in the worst case if you insert a value at the beginning – you would have to push the rest of the values (N) to the end. Here is to visualize it :

Binary Insert analysis

So you see it is better than just sorting the array all over again ! Remember that solution was O(Nlog(N)) so with this it will only be O(N) -> log(N) better. As for linear search, the cost was O(N) for search and O(N) for insert, so in total O(2N) -> O(N) is still better. It is already pretty healing for me. In fact, it makes me happy.

Finally, here is the code :

Here are some notes about the code :

- My implementation is in javascript which offers goodies like the splice function to perform the insert. However don’t forget the insert has still a cost. It you use a different language, just code a separate function to perform the insert.
- This coding style is particular to recursive algorithms. The order of your stopping conditions is very important. I am not used to this style where there is a bunch of return but so far I find it perfectly fine, it’s actually much clearer than having nested if/else statements holding almost hidden returns. If you’re more comfortable with an algorithm that returns something you can always return some exit code in each condition.
- Recursive reasoning can twist your mind, but I really prefer it to a while loop. It displays some kind of…elegance.

Posted in Algorithms, Code

Tagged ,

  • SK

    How about the following:

    function binaryInsert(x, array) {
    var l = 0,
    r = array.length – 1,
    m;
    while (l x) {
    r = m – 1;
    continue;
    }
    l = m + 1;
    if (array[m] === x) {
    break; // replace with return if no duplicates are desired
    }
    }
    array.splice(l, 0, x);
    }

  • IND

    if i understood correctly you algorithm take log(n) to search the index to insert and N to insert at that position.
    So for every insertion will cost O(n) any ways.
    IF you array is already sorted you can insert new element into sorted array in O(n)…by moving comparing last nth element with the new value and moving it to n+1th position.

    • http://machinesaredigging.com/ Elodie

      This algorithm requires the array to be already sorted. It’s indeed O(N) in the worst case. For your last sentence, not sure I understand what you mean, but as you search for the insertion position, you cut your array in half at each iteration. So new position is not necessarily n + 1, it can also be the start position.