shape vs len for numpy array

Is there a difference (in performance for example) when comparing shape and len? Consider the following example:

In [1]: import numpy as np

In [2]: a = np.array([1,2,3,4])

In [3]: a.shape
Out[3]: (4,)

In [4]: len(a)
Out[4]: 4

A quick runtime comparison suggests that there's no difference:

In [17]: a = np.random.randint(0,10000, size=1000000)

In [18]: %time a.shape
CPU times: user 6 µs, sys: 2 µs, total: 8 µs
Wall time: 13.1 µs
Out[18]: (1000000,)

In [19]: %time len(a)
CPU times: user 5 µs, sys: 1 µs, total: 6 µs
Wall time: 9.06 µs
Out[19]: 1000000

So, what is the difference and which one is more pythonic? (I guess using shape).

python numpy

Solution 1:^[1]

From the pandas source code, it looks like DataFrame.shape basically uses len(): https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py

@property
def shape(self) -> Tuple[int, int]:
    return len(self.index), len(self.columns)

def __len__(self) -> int:
    return len(self.index)

Calling shape computes both dimensions every time, so df.shape[0] + df.shape[1] may be slower than len(df.index) + len(df.columns). Still, performance-wise, the difference should be negligible except perhaps for a very large 2D dataframe.

So, in line with the previous answers, df.shape is good if you need both dimensions. For a single dimension, len() seems more appropriate conceptually.

Looking at answers on property vs. method, it all comes down to usability and readability of the code. So again, in your case, I would say: if you want information about the whole dataframe, just to check it or, for example, to pass the shape tuple to a function, use shape. For a single column, or for the index (i.e. the rows of a df), use len().
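As a rough illustration of the point above, here is a minimal sketch on a small DataFrame. The frame and its column names are made up for the example; only len(), .index, .columns and .shape come from pandas itself.

```python
import pandas as pd

# Hypothetical example frame: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows = len(df)          # number of rows, same as len(df.index)
n_cols = len(df.columns)  # number of columns
shape = df.shape          # (rows, columns) tuple built from both lengths

print(n_rows, n_cols, shape)
```

If only the row count is needed, len(df) says so directly. The tuple is handy when both numbers travel together, for instance when passing them to another function.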

Solution 2:^[2]

There really is a difference, although a very small one. If you work on time-series data and know that the data is a vector (1D), use len, as it is faster, and make it a habit, even if the gain is only marginal. Bish's answer already explained what happens behind the scenes.

A proper benchmark using %%timeit (I ran it several times) shows len as the winner:

# tested on pandas DataFrame

%%timeit
len(yhat.values)
# 576 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
yhat.values.shape[0]
# 607 ns ± 1.07 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Furthermore, in 1D, len (as in "length") is more informative when you read the code than .shape[0].
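Outside IPython, a similar comparison can be sketched with the standard timeit module. This is only an approximation of the benchmark above: the array a and the loop count n are arbitrary choices for the example, and the absolute numbers will vary by machine and NumPy version.

```python
import timeit

import numpy as np

# Arbitrary 1D test array (an assumption for this sketch).
a = np.arange(1_000_000)

n = 100_000  # number of repetitions per measurement

# Pass the statement as a string so no extra lambda call is timed.
t_len = timeit.timeit("len(a)", globals={"a": a}, number=n)
t_shape = timeit.timeit("a.shape[0]", globals={"a": a}, number=n)

print(f"len(a):     {t_len / n * 1e9:.0f} ns per call")
print(f"a.shape[0]: {t_shape / n * 1e9:.0f} ns per call")
```

Both calls take well under a microsecond and neither depends on the array size, so the gap is unlikely to matter outside a very tight loop.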

Solution 3:^[3]

For the 1D case, both len and shape will produce the same number. In other cases, shape will provide more information. Which one performs better can vary from program to program. I suggest you not worry much about performance.
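To make the "more information" part concrete, here is a small sketch on a 2D array and on a 0-dimensional array. The variable names are just for illustration.

```python
import numpy as np

# 2D case: len() reports only the first axis, shape reports all axes.
m = np.zeros((3, 4))
rows = len(m)       # 3, identical to m.shape[0]
m_shape = m.shape   # (3, 4)

# 0-d case: shape is an empty tuple, but len() is not defined.
scalar = np.array(5)
scalar_shape = scalar.shape  # ()
try:
    len(scalar)
    len_failed = False
except TypeError:
    # NumPy raises TypeError for len() of a 0-d array.
    len_failed = True

print(rows, m_shape, scalar_shape, len_failed)
```

So len(x) is always x.shape[0] when the array has at least one axis, while shape also works when it has none.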

Solution 4:^[4]

import numpy as np

x = np.linspace(1, 10, 10).reshape((5, 2))
print(x)
print(x.size)
print(len(x))

gives the following output:

[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]
 [ 9. 10.]]
10
5

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Bish
Solution 2	Muhammad Yasirroni
Solution 3	Ashiq Imran
Solution 4
