Split a large numpy array into multiple numpy arrays?
I have a large numpy array with 699720 elements and a shape of (4998, 140). Here is what it looks like:
[[-0.11252183 -2.8272038 -3.773897 ... 0.12343082 0.92528623
0.19313742]
[-1.1008778 -3.9968398 -4.2858424 ... 0.7738197 1.1196209
-1.4362499 ]
[-0.567088 -2.5934503 -3.8742297 ... 0.32109663 0.9042267
-0.4217966 ]
...
[-1.1229693 -2.252925 -2.867628 ... -2.874136 -2.0083694
-1.8083338 ]
[-0.54770464 -1.8895451 -2.8397787 ... 1.261335 1.1504486
0.80493224]
[-1.3517791 -2.2090058 -2.5202248 ... -2.2600229 -1.577823
-0.6845309 ]]
I would like to split the numpy array into 4 different numpy arrays. The first 3 should each hold 30% of the array, e.g. numpyarray1 should be 0-30%, numpyarray2 should be 31-60%, numpyarray3 should be 61-90%, and numpyarray4 should be 91-100% of the dataset.
Solution 1:[1]
You can achieve this with numpy.split(). Given a list of indices, it cuts the array at those positions along one axis (rows by default). Note, however, that the resulting sub-arrays are views on the original array: no data is copied, which saves memory, but writing to a sub-array also changes the original.
See this example:
import numpy as np
arr = np.random.random((100, 100))
nr_rows = arr.shape[0]
# Row indices where the first three sections end (30%, 60%, 90% of the rows).
# Integer division means any leftover rows end up in the last section.
section_borders = [(i+1) * 3 * (nr_rows // 10) for i in range(3)]
# Do the splitting
arr_splits = np.split(arr, section_borders)
print([sarr.shape for sarr in arr_splits])
# --> [(30, 100), (30, 100), (30, 100), (10, 100)]
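Applied to the shape from the question, the same idea can be sketched as follows. The zero-filled array is only a stand-in for the real data, and the names borders, parts and independent are illustrative, not part of the original answer. The last lines check the view behaviour mentioned above.

```python
import numpy as np

# Stand-in data with the shape from the question (values are placeholders)
arr = np.zeros((4998, 140), dtype=np.float32)
nr_rows = arr.shape[0]

# Cut points at 30%, 60% and 90% of the rows, using integer arithmetic
borders = [nr_rows * pct // 100 for pct in (30, 60, 90)]

parts = np.split(arr, borders)
print(borders)
print([p.shape for p in parts])

# The pieces are views: writing to one is visible in the original
parts[0][0, 0] = 1.0
print(arr[0, 0], np.shares_memory(parts[0], arr))

# Take an explicit copy if a piece must be independent of the original
independent = parts[3].copy()
```

Because 4998 is not divisible by 10, the pieces come out with 1499, 1499, 1500 and 500 rows, which is as close to 30/30/30/10 as whole rows allow.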
If you want to split along columns instead, pass axis=1 to numpy.split().
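A minimal sketch of a column-wise split, using a small made-up array so the result is easy to inspect. The names col_borders and col_parts are illustrative only.

```python
import numpy as np

# Small example array: 2 rows, 10 columns
arr = np.arange(20).reshape(2, 10)
nr_cols = arr.shape[1]

# Cut points at 30%, 60% and 90% of the columns
col_borders = [nr_cols * pct // 100 for pct in (30, 60, 90)]

# axis=1 splits along columns instead of rows
col_parts = np.split(arr, col_borders, axis=1)
print([p.shape for p in col_parts])
# --> [(2, 3), (2, 3), (2, 3), (2, 1)]
```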
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | FabianGD |
