What is big O of this pseudocode?

What would be the big O of this pseudocode:

BinarySum(A, i, n) {
    if n=1 then
        return A[i] // base case 
    return BinarySum(A, i, n/2) + BinarySum(A, i+[n/2], [n/2])
}

This is really confusing because I think it is O(log n), since n is divided by 2 each time you run the function. But some people I spoke to think it is O(n) and not O(log n), because the algorithm doesn't halve the problem by picking and choosing one half of the array. Which is it?

data-structures big-o

Solution 1:^[1]

TL;DR

The runtime is Θ(n).

How to determine runtime

The recurrence relation for the algorithm is

T(n) = 2T(n/2) + O(1)

This holds because every call makes two recursive calls, each on half of the array, and needs only constant time itself. BTW: this is the same recurrence relation that describes a binary tree traversal.

You can use the Master theorem to determine the runtime here, since a >= 1 and b > 1 and the recurrence relation has the form

T(1) = 1; T(n) = aT(n/b) + f(n)

This is case one of the theorem, which requires f(n) = O(n ^ (log_b(a) - ε)) for some ε > 0. Here a = 2, b = 2 and f(n) = O(1), so log₂(2) = 1 and n ^ log_b(a) = n¹ = n. Since f(n) = O(1) = O(n ^ (1 - ε)) with ε = 1, the condition holds.

When case one is applicable, the Master theorem says that the runtime of the algorithm is Θ(n ^ log_b(a)), which here is Θ(n).

Solution 2:^[2]

First, let me say that I like the other answer that links the recurrence with the traversal of binary trees. For a balanced binary tree, this is indeed the recurrence, and so the complexity must necessarily be the same as a depth-first traversal, which we all know is O(n). I like this analogy because it clearly says that the result doesn't just apply to the recurrence T(n) = 2T(n/2) + O(1) but to anything where you split the input into non-empty chunks of sizes m[0], m[1], ... that sum to n and do T(n) = T(m[0]) + T(m[1]) + T(m[2]) + ... + O(1). You don't have to split the input into two equally sized parts to get O(n); you just have to spend constant time and then recurse on disjoint parts of the input.
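
To make the "any disjoint split" claim concrete, here is a minimal sketch that counts calls when each chunk is split at a random point instead of in the middle. The name `count_calls` and the random splitting rule are assumptions for illustration only; they are not part of the question's algorithm.

```python
import random


def count_calls(n: int, rng: random.Random) -> int:
    """Count calls when a chunk of size n is split at a random point."""
    if n == 1:
        return 1  # base case: one call, constant work
    k = rng.randint(1, n - 1)  # size of the left part; both parts stay non-empty
    return 1 + count_calls(k, rng) + count_calls(n - k, rng)


rng = random.Random(0)
for n in (1, 2, 10, 200):
    print(n, count_calls(n, rng))  # always 2n - 1, whatever the split points
```

The count does not depend on where the splits land: n leaves plus n - 1 inner calls gives 2n - 1 every time. Only the recursion depth changes, which is why n is kept small here.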

Using the Master theorem, I feel, is a bit overkill for this one. It is a big gun, and it gives us the correct answer, but if you are like me, it doesn't give you much intuition about why the answer is what it is. With this particular recurrence, we can get the correct answer and an intuitive understanding of it with a few drawings.

We can break down what happens at each level of the recursion and maybe draw it like this:

We have the work that we are handling on the left, i.e., the size of the input and the actual time we spend at each function call on the right. We have input size n at the first level, but we only spend one "computation unit" of time on it. We have two chunks of size n/2 that we spend two units of time on at the next level. At the next level, we have four chunks, each of size n/4, and we spend four units on them. This continues until our chunks have size one, and we have n of those, that we spend n units of time on.
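
The levels described above can also be tabulated with a short sketch. It assumes that n is a power of two and that every call costs exactly one unit; `level_table` is a hypothetical helper name, not something from the question.

```python
def level_table(n: int) -> list[tuple[int, int, int]]:
    """Return (level, chunk_size, calls_at_level) for a halving recursion.

    Assumes n is a power of two and that each call costs one unit,
    so the time spent at a level equals the number of calls there.
    """
    rows = []
    level, size, calls = 0, n, 1
    while size >= 1:
        rows.append((level, size, calls))
        level, size, calls = level + 1, size // 2, calls * 2
    return rows


for level, size, calls in level_table(8):
    print(f"level {level}: {calls} chunk(s) of size {size}, {calls} unit(s) of time")

total = sum(calls for _, _, calls in level_table(8))
print("total units:", total)  # 1 + 2 + 4 + 8 = 15
```

For n = 8 there are four levels, and the time column grows as 1, 2, 4, 8 while the chunk size shrinks as 8, 4, 2, 1.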

The total time we spend is the area of the red blocks on the right. The depth of the recursion is O(log n) but we don't need to worry about that to analyse the time. We will just flip the "time" bit and look at it this way:

The total time we spend must be n for the original bottom layer (now the top layer), n/2 for the next layer, n/4 for the next, and so on. Factor out n, and all we have to worry about is the sum 1+1/2+1/4+1/8+.... The sum ends at some point, of course, since we only have O(log n) terms. But we don't have to worry about that at all, because even if it continued forever it wouldn't sum to more than two.

You might recognise this as a geometric series. They have the form sum x^i for i=0 to infinity, and when |x|<1 they converge to 1/(1-x). Proving this takes a little calculus, but when x = 1/2 as we have, it is easy to draw the series and get the result from there.
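
For the finite sum we actually have, the same formula can be written out directly. This is a worked sketch under the assumptions that n = 2^k and that each call costs one time unit:

```latex
\[
  \sum_{i=0}^{k} \frac{n}{2^{i}}
  = n \sum_{i=0}^{k} \left(\tfrac{1}{2}\right)^{i}
  = n \cdot \frac{1 - (1/2)^{k+1}}{1 - 1/2}
  = 2n - 1
  < 2n .
\]
```

The last step uses (1/2)^(k+1) = 1/(2n), which holds because n = 2^k.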

Take the size n layer and then start putting the remaining layers next to each other under it. First, you put down the n/2 layer. It takes half of the space. Then you put the n/4 layer next to it, and it takes half of the remaining space. The n/8 layer will take half of the remaining space, the n/16 layer will take half of the remaining space, and it will continue like this as if it were a reenactment of Zeno's paradox.

If you keep taking half of what is left forever, you can never get more than you started with, so adding up all the layers except the first cannot give you more time spent than you spent on the very first layer. The total time you would spend if you kept recursing forever (and time worked like real numbers) would be linear. Since you stop before forever, it is still going to be linear. Infinite recursion gives us O(n), so recursion depth O(log n) will as well.

Of course, observing that T(n) < T'(n) = O(n), where T'(n) continues subdividing forever, only tells us that T(n) = O(n) and not that T(n) = Omega(n). For the lower bound you have to show that you don't spend substantially less time than n. But considering that the largest layer alone is n, it should be obvious that the recursion also runs in Omega(n).

If you don't cut the data size in half every recursion, but cut the data into some other chunks that add up to n, you still get O(n) of course (think of traversing a tree), but it gets a hell of a lot harder to draw, and I've never managed to come up with a good illustration of that. For splitting the data in half, though, the drawing is simple and the conclusions we draw from it give us the correct running time.

The same drawing also tells us that the recurrence T(n) = T(n/2) + O(n) is in O(n). Here, we don't have to flip the first drawing, because we start out with the largest layer on top. We spend time n then n/2 then n/4 and so on, and we end up spending 2n time units. Because 2 isn't special here, we have T(n) = T(f·n) + O(n) = O(n) for any fraction 0 ≤ f < 1, it is just a lot harder to draw when f ≠ 1/2.
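
A small numeric sketch of that last claim: add up the layer sizes n, f·n, f²·n, ... for a few fractions f and compare with the geometric-series bound n/(1-f). The function name `shrinking_work` and the rule of stopping once a layer drops below size 1 are assumptions made for this illustration.

```python
def shrinking_work(n: int, f: float) -> float:
    """Total work n + f*n + f^2*n + ... until the layer drops below size 1."""
    total = 0.0
    size = float(n)
    while size >= 1:
        total += size  # linear work on the current layer
        size *= f      # recurse on a fraction f of the input
    return total


n = 1_000_000
for f in (0.5, 0.75, 0.9):
    print(f, shrinking_work(n, f), "bound:", n / (1 - f))
```

The total always stays between n and n/(1-f), so it is linear in n for any fixed f below 1; a fraction closer to 1 only makes the constant larger.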

Solution 3:^[3]

First a bug:

BinarySum(A, i, n) {
    if n=1 then
        return A[i] // base case 
    return BinarySum(A, i, n/2) + BinarySum(A, i+(n/2), (n - n/2))
    //                                                   ^^^^
}

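The second half may have a different length than the first. With n/2 (integer division) used for both halves and n odd, the two halves cover only n - 1 elements, so the last value is dropped. Over the whole recursion this can drop several values.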

On the complexity: the function computes A[0] + A[1] + A[2] + ... + A[n-1]. The recursion goes down to single elements A[i], and every + belongs to one call that adds exactly one left and one right result. So there are n-1 inner calls plus n leaves, 2n-1 calls in total, which is O(n). The shape of the call tree is irrelevant for this count. BinarySum is not faster than a non-binary sum unless it uses multithreading.
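
A runnable sketch of the corrected pseudocode, assuming that n/2 means integer (floor) division and that the function is only called with n >= 1. The Python name `binary_sum` is just a transliteration of BinarySum.

```python
def binary_sum(a: list[int], i: int, n: int) -> int:
    """Sum a[i : i + n] by recursive halving (requires n >= 1)."""
    if n == 1:
        return a[i]  # base case
    half = n // 2
    # The left part gets `half` elements, the right part the remaining n - half.
    return binary_sum(a, i, half) + binary_sum(a, i + half, n - half)


data = [3, 1, 4, 1, 5, 9, 2]  # odd length exercises the uneven split
print(binary_sum(data, 0, len(data)))  # 25
```

With `n - half` for the second call, every element is visited exactly once, so the result matches the built-in `sum(data)` for odd lengths too.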

Solution 4:^[4]

There's a difference between making just one recursive function call, like binary search does, and two recursive function calls, like your code does. In both cases, the maximum depth of the recursion is O(log n), but the total costs of the algorithms are different. I think you are confusing the maximum depth of the recursion with the total running time of the algorithm.

The given function does a constant amount of work outside its two recursive calls; let c denote that work. You can draw a recursion tree where each node is the cost of a function call. The root node has a cost of c. The root node has two children because there are two recursive calls, each with a cost of c. Each of these children makes two further recursive calls; hence, the root node has 4 grandchildren, each with a cost of c. This continues until we hit the base case.

The total cost of the recursion tree is the cost of the root node (which is c), plus the cost of its children (which is 2c), plus the cost of the grandchildren (which is 4c), and so on, until we hit the n leaves (which have a total cost of nc, where for simplicity we'll assume n is a power of 2). The total cost of all levels of the recursion tree is c+2c+4c+8c+...+nc = O(nc) = O(n). Here, we used the fact that in an increasing geometric series, the total sum is dominated by the last term (the sum is essentially just the last term, up to constant factors, which are subsumed in asymptotic notation). This sum had O(log n) terms, but the sum is O(n).

Equivalently, the recurrence describing the running time of your algorithm is T(n) = 2T(n/2)+c, and by the Master theorem, the solution is T(n) = O(n). This is different from binary search, which has the recurrence T(n)=1T(n/2)+c, which has the solution T(n)=O(log n). For binary search, the total cost of all levels of the recursion tree would be c+c+...+c; here, the sum has O(log n) terms and the sum is O(log n).
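
The difference between the two recurrences can be seen by simply counting calls. This is a rough sketch, not real binary search or summation code: both functions only mimic the call pattern, and the names `binary_search_calls` and `binary_sum_calls` are made up for the illustration.

```python
def binary_search_calls(n: int) -> int:
    """Calls made when only one half is kept: T(n) = T(n/2) + c."""
    return 1 if n == 1 else 1 + binary_search_calls(n // 2)


def binary_sum_calls(n: int) -> int:
    """Calls made when both halves are visited: T(n) = 2T(n/2) + c."""
    if n == 1:
        return 1
    return 1 + binary_sum_calls(n // 2) + binary_sum_calls(n - n // 2)


for n in (8, 1024, 65536):
    print(n, binary_search_calls(n), binary_sum_calls(n))
```

For n = 1024 the one-branch pattern makes 11 calls (log₂ n + 1), while the two-branch pattern makes 2047 calls (2n - 1), even though both recursions have the same depth.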

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Mushroomator
Solution 2	Thomas Mailund
Solution 3
Solution 4	Ashwin Ganesan
