How to convert N-D numpy arrays element-wise to list

I have two numpy arrays:

A = [[1,2,3],
     [4,5,6]]


B = [[-1,-2,-3],
     [-4,-5,-6]]

I would like to combine the two into a normal python list, such that the (i,j) element from each array is put in the same list:

C= [[1,-1],[2,-2],[3,-3],[4,-4],[5,-5],[6,-6]]

Is there a way to do this better than the naive O(n^2)?
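For reference, the naive element-by-element version (my own sketch, not from the question) would look something like this:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[-1, -2, -3],
              [-4, -5, -6]])

# Naive approach: visit every (i, j) position explicitly and pair up
# the two values; int() converts the NumPy scalars to plain Python ints.
C = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C.append([int(A[i, j]), int(B[i, j])])

print(C)  # [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
```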



Solution 1:[1]

First, you show two lists, but call them arrays.

In [102]: A = [[1,2,3],
     ...:      [4,5,6]]
     ...: B = [[-1,-2,-3],
     ...:      [-4,-5,-6]]

A version of the pure array approach:

In [103]: [[i,j] for a,b in zip(A,B) for i,j in zip(a,b)]
Out[103]: [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]

And a numpy approach - this works with lists or arrays, since np.stack converts lists to arrays as needed.

In [104]: np.stack((A,B),axis=2).reshape(-1,2).tolist()
Out[104]: [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]

np.array((A,B)) can also be used, though it needs some transposing; that's pretty cheap.
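A sketch of that transposing route (the axis ordering here is my choice; other orders work too):

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]
B = [[-1, -2, -3], [-4, -5, -6]]

# np.array((A, B)) has shape (2, 2, 3): the pair axis comes first.
# Move the pair axis to the end, then flatten into (n, 2) pairs.
C = np.array((A, B)).transpose(1, 2, 0).reshape(-1, 2).tolist()
print(C)  # [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
```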

Big-O analysis isn't that useful when evaluating alternatives like this. The big divide is between iterating in Python, as the double list comprehension does, and using numpy array methods (which iterate in compiled code). But there are nuances. Iterating over array elements is slower than iterating over list elements. Creating arrays from lists takes time. And there's the matter of scaling: array approaches often have a high setup cost, but scale better.

Let's compare the times:

In [105]: timeit [[i,j] for a,b in zip(A,B) for i,j in zip(a,b)]
1.69 µs ± 29.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [106]: timeit np.stack((A,B),axis=2).reshape(-1,2).tolist()
22.2 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [107]: %%timeit A1=np.array(A); B1=np.array(B)
     ...: np.stack((A1,B1),axis=2).reshape(-1,2).tolist()
14.5 µs ± 65.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

For this small example, the pure list approach is fastest. Starting with arrays instead of lists improves the array approach, but it is still slower.

Try something larger:

In [109]: AA = np.ones((200,300)).tolist()
In [110]: timeit [[i,j] for a,b in zip(AA,AA) for i,j in zip(a,b)]
10.4 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [111]: %%timeit A1=np.array(AA); B1=np.array(AA)
     ...: np.stack((A1,B1),axis=2).reshape(-1,2).tolist()
13.4 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Still not enough to give the arrays the advantage.

But what if we drop the requirement that the result be a list?

In [119]: %%timeit A1=np.array(AA); B1=np.array(AA)
     ...: np.stack((A1,B1),axis=2).reshape(-1,2)
144 µs ± 4.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

tolist() is compiled and relatively fast, but for large enough arrays it still takes measurable time.

To sum up: if you start with lists and return a list, the pure list approach remains best. But starting and ending with arrays can be faster. Mixing lists and arrays slows down both.

With the mapping in the accepted answer:

In [120]: %%timeit A1=np.array(AA); B1=np.array(AA)
     ...: C = list(map(list, zip(A1.ravel(), B1.ravel())))
16.2 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
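That mapping in a small, runnable form (my own assembly, with arrays assumed as input). Note that the resulting inner lists contain NumPy scalars rather than plain Python ints, though they still compare equal:

```python
import numpy as np

A1 = np.array([[1, 2, 3], [4, 5, 6]])
B1 = np.array([[-1, -2, -3], [-4, -5, -6]])

# zip the raveled (flattened) arrays and turn each pair-tuple into a list.
C = list(map(list, zip(A1.ravel(), B1.ravel())))
```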

Solution 2:[2]

You can convert them to a single numpy array, then reshape and transpose:

out = np.array([A,B]).reshape(2,-1).T.tolist()

Or iterate over the sublists and use zip:

out = [list(tpl) for i in range(len(A)) for tpl in zip(A[i], B[i])]

Output:

[[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
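A self-contained check (my own assembly of the two snippets above) confirming that both routes agree:

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6]]
B = [[-1, -2, -3], [-4, -5, -6]]

# Route 1: stack into shape (2, 2, 3), reshape to (2, n), transpose into pairs.
out1 = np.array([A, B]).reshape(2, -1).T.tolist()

# Route 2: zip corresponding sublists row by row.
out2 = [list(tpl) for i in range(len(A)) for tpl in zip(A[i], B[i])]

assert out1 == out2
print(out1)  # [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
```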

Solution 3:[3]

An alternative, NumPy-only solution to the accepted answer:

A = np.array([[1,2,3],
              [4,5,6]])

B = np.array([[-1,-2,-3],
              [-4,-5,-6]])

C = np.column_stack((A.ravel(),B.ravel())).tolist()

Output:

[[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
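If the result feeds into further NumPy code, dropping .tolist() keeps it as an (n, 2) array and avoids the conversion cost noted in Solution 1 (a sketch under that assumption):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[-1, -2, -3], [-4, -5, -6]])

# column_stack pairs the flattened arrays column-wise into shape (6, 2).
C = np.column_stack((A.ravel(), B.ravel()))
print(C.shape)  # (6, 2)
print(C.tolist())  # [[1, -1], [2, -2], [3, -3], [4, -4], [5, -5], [6, -6]]
```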

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3: vbarbosavaz