numpy float64, float32, and float16 scalars all use the same number of bytes
I am using numpy floats of different precisions, but no matter which precision I use (64, 32, or 16 bits), the object occupies the same number of bytes: 48 for a single float. Here's the code:
import numpy as np
from pympler.asizeof import asizeof
w = np.float32(2)
print(f"{asizeof(w)=}")
w = np.float64(2)
print(f"{asizeof(w)=}")
w = np.float16(2)
print(f"{asizeof(w)=}")
Any idea why this is the case?
Update:
I am using pympler to check the exact size of the object (w) here. I have also tested this with a huge number of single numpy floats, each stored as a dictionary value under a different key, and eyeballing the process's RAM usage shows that it does not change no matter which precision I use.
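For reference, a hedged sketch of what such a dict test might look like (the variable names and sizes below are mine, not from the original post): each dict value is a separate numpy scalar object whose fixed per-object overhead swamps the 2-to-8-byte data difference, whereas a single array of the same values does reflect the precision.

```python
import sys
import numpy as np

n = 10_000

# One numpy scalar per dict value: each is a full Python object,
# so the per-object header dominates regardless of precision.
d16 = {i: np.float16(i) for i in range(n)}
d64 = {i: np.float64(i) for i in range(n)}
print(sys.getsizeof(np.float16(0)))  # per-scalar size, header included
print(sys.getsizeof(np.float64(0)))

# The same values in one array store only raw data plus one header,
# so here precision changes memory use by a factor of four.
a16 = np.zeros(n, dtype=np.float16)
a64 = np.zeros(n, dtype=np.float64)
print(a16.nbytes, a64.nbytes)  # 20000 vs 80000
```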
Solution 1:[1]
I don't know exactly what pympler does or how accurately it measures the size of numpy objects; sys.getsizeof is the more common tool. I suspect pympler tries to work around known limitations of getsizeof for lists and dicts, but for numpy objects getsizeof is fairly good.
In [44]: import sys
In [45]: sys.getsizeof(np.float32(2))
Out[45]: 28
In [46]: sys.getsizeof(np.float64(2))
Out[46]: 32
In [47]: sys.getsizeof(np.float16(2))
Out[47]: 26
Those numbers indicate that these objects all carry a 24-byte overhead, plus 2 bytes of data for float16, 4 for float32, and 8 for float64.
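That arithmetic can be checked directly: subtracting each dtype's itemsize from getsizeof should leave the same fixed overhead for all three types. This is a sketch; the exact numbers are CPython- and platform-dependent.

```python
import sys
import numpy as np

for t in (np.float16, np.float32, np.float64):
    size = sys.getsizeof(t(2))    # total object size, header included
    data = np.dtype(t).itemsize   # 2, 4, or 8 bytes of actual payload
    print(f"{t.__name__}: {size} bytes total, {size - data} overhead")
```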
Normally we don't create np.float64 objects directly. Rather, we make an array with a particular dtype and get something like a np.float64 object when we index a particular element. Keep in mind that numpy does not store values by reference (unless it's object dtype, which is more list-like).
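To illustrate the by-value storage (a sketch with sizes of my choosing): each element occupies exactly dtype.itemsize bytes in the array's buffer, and only object dtype falls back to storing references.

```python
import numpy as np

a = np.ones(1000, dtype=np.float32)
print(a.itemsize, a.nbytes)   # 4 bytes per element, 4000 bytes of data

# object dtype is the exception: the buffer holds a pointer per
# element, each pointing at a full boxed Python object.
b = a.astype(object)
print(b.itemsize)             # pointer size (8 on a 64-bit build)
print(type(b[0]))             # each element is a separate Python object
```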
Look instead at an array:
In [48]: x = np.arange(24)
In [49]: x.dtype
Out[49]: dtype('int64')
In [50]: x.nbytes
Out[50]: 192 # 24 * 8
In [51]: sys.getsizeof(x)
Out[51]: 304
In [52]: 304-192
Out[52]: 112 # array 'overhead'
In [53]: y = np.array([0])
In [54]: y.nbytes
Out[54]: 8
In [55]: sys.getsizeof(y)
Out[55]: 120 # same 112 byte overhead
So an array has a size of 112 bytes plus its nbytes, which we get from dtype and shape. That's assuming the array 'owns' its data, that is, it isn't a view of some other array.
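That ownership caveat is easy to demonstrate (the 112-byte figure is specific to this build; treat it as illustrative): getsizeof of an owning array is roughly header plus nbytes, while a view reports only the header because it shares another array's buffer.

```python
import sys
import numpy as np

x = np.arange(24)                   # owns its 192-byte data buffer
print(sys.getsizeof(x) - x.nbytes)  # the fixed header, ~112 here

# A view shares x's buffer, so getsizeof sees no data at all.
v = x[::2]
print(v.base is x)                  # True: v does not own its data
print(sys.getsizeof(v))             # header only, whatever v's length
```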
In [57]: type(x)
Out[57]: numpy.ndarray
In [58]: type(x[0])
Out[58]: numpy.int64
An "extracted" element of x is of type np.int64. The data for x is stored as 24 * 8 raw bytes, not as 24 separate ~32-byte Python objects.
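A short check of that last point (a sketch; the exact getsizeof figure is platform-dependent): the array's buffer is 192 raw bytes, and indexing manufactures a fresh scalar object, header and all, on every access. The dtype is spelled out here so the example doesn't depend on the platform's default integer size.

```python
import sys
import numpy as np

x = np.arange(24, dtype=np.int64)
print(x.nbytes)             # 192: 24 elements * 8 bytes each
elem = x[0]
print(type(elem))           # a numpy.int64 scalar, built on access
print(elem.nbytes)          # 8: just the data portion
print(sys.getsizeof(elem))  # data plus the Python object header
```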
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | hpaulj |
