Python: divide a DataFrame into chunks
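I have a DataFrame with one column and 37365 rows. I need to split it into chunks of 2500 rows (rows 0 to 2499, 2500 to 4999, and so on, with the last chunk holding the remaining rows). These are the slices, where the stop index is exclusive as usual in Python:

```python
df[0:2500]
df[2500:5000]
df[5000:7500]
...
df[32500:35000]
df[35000:37365]
```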
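The idea is to use this in a loop like the one below, because `process_operation` does not work for DataFrames with more than 2500 rows:

```python
# rough idea only: chunk, lower and upper still need to be defined and advanced
while chunk < len(df):
    process_operation(df[lower:upper])
```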
EDIT: I will have different DataFrames as input, and some of them will have fewer than 2500 rows. What is the best approach that also handles these?
For example, a DataFrame with 1234 rows should be processed as a single chunk, `df[0:1234]`, because 1234 < 2500.
Solution 1:[1]
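The built-in `range` function with a step of 2500 is enough here. Slicing past the end of a DataFrame is safe, so the last, shorter chunk and DataFrames with fewer than 2500 rows are handled without special cases:

```python
for start in range(0, len(df), 2500):
    # iloc slices by position; a stop index past len(df) is simply clipped
    process_operation(df.iloc[start:start + 2500])
```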
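To check that this loop covers both the full-size case and the small-DataFrame case from the edit, here is a minimal sketch that wraps the same `range` loop in a generator. `iter_chunks` is a hypothetical helper name (not a pandas API), and the two test DataFrames are made-up data with 37365 and 1234 rows.

```python
import pandas as pd


def iter_chunks(df: pd.DataFrame, chunk_size: int = 2500):
    """Yield consecutive row slices of df with at most chunk_size rows each.

    iter_chunks is a hypothetical helper for illustration, not part of pandas.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(df), chunk_size):
        # Positional slice; the stop is clipped at len(df), so the last chunk
        # (or a DataFrame shorter than chunk_size) is simply shorter.
        yield df.iloc[start:start + chunk_size]


big = pd.DataFrame({"value": range(37365)})
small = pd.DataFrame({"value": range(1234)})

sizes_big = [len(chunk) for chunk in iter_chunks(big)]
sizes_small = [len(chunk) for chunk in iter_chunks(small)]
print(len(sizes_big), sizes_big[-1])  # prints: 15 2365
print(sizes_small)                    # prints: [1234]
```

Because it is a generator, each chunk is only sliced when the loop asks for it, so `for chunk in iter_chunks(df): process_operation(chunk)` behaves like the plain `range` loop.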
Solution 2:[2]
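Do you mean something like this?

```python
chunk_size = 2500
lower = 0
while lower < len(df):  # also runs once for a DataFrame shorter than chunk_size
    upper = lower + chunk_size  # exclusive stop, so each full slice has chunk_size rows
    process_operation(df.iloc[lower:upper])  # the last slice is clipped at len(df)
    lower = upper
```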
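Off-by-one errors are easy to make in hand-written `while` loops like this one. A minimal, pure-Python way to check a loop is to compute its half-open (start, stop) pairs and confirm that they visit every row exactly once. `chunk_bounds` and `covers_exactly_once` are hypothetical helpers for illustration; `original` reproduces the bounds of a loop that starts with `upper = 2499` and runs while `upper <= len(df)`, assuming 37365 rows.

```python
def chunk_bounds(n_rows: int, chunk_size: int = 2500) -> list[tuple[int, int]]:
    """Return half-open (start, stop) pairs covering range(n_rows).

    chunk_bounds is a hypothetical helper for illustration only.
    """
    bounds = []
    lower = 0
    while lower < n_rows:
        upper = min(lower + chunk_size, n_rows)  # stop is exclusive, like a slice
        bounds.append((lower, upper))
        lower = upper
    return bounds


def covers_exactly_once(bounds, n_rows: int) -> bool:
    """True if the slices visit rows 0..n_rows-1 once each, in order."""
    visited = [i for start, stop in bounds for i in range(start, stop)]
    return visited == list(range(n_rows))


# Bounds of a loop that starts at upper = 2499 and runs while upper <= n_rows:
original = [(s, s + 2499) for s in range(0, 37365 - 2499 + 1, 2500)]

print(chunk_bounds(37365)[-1])                          # prints: (35000, 37365)
print(chunk_bounds(1234))                               # prints: [(0, 1234)]
print(covers_exactly_once(original, 37365))             # prints: False
print(covers_exactly_once(chunk_bounds(37365), 37365))  # prints: True
```

The `original` bounds skip one row per chunk (index 2499, 4999, ...) and never reach rows 35000 to 37364, which is why the stop must be `lower + 2500` and the loop condition must test `lower`.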
Solution 3:[3]
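I would use `numpy.array_split`, which splits the DataFrame into a given number of nearly equal parts. Choosing ceil(len(df) / 2500) parts guarantees that no part has more than 2500 rows:

```python
import math

import numpy as np

chunk_max_size = 2500
chunks = math.ceil(len(df) / chunk_max_size)  # math.ceil already returns an int
for df_chunk in np.array_split(df, chunks):
    # here len(df_chunk) <= chunk_max_size; 37365 rows give 15 chunks of 2491 rows
    process_operation(df_chunk)
```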
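Unlike the `range` approach, `np.array_split` does not produce fixed 2500-row chunks. According to the NumPy documentation, splitting an array of length l into n sections yields l % n sub-arrays of size l//n + 1 and the rest of size l//n. The sketch below reproduces that arithmetic in plain Python so the chunk sizes can be checked without building a DataFrame; `balanced_sizes` is a hypothetical helper, not a NumPy function.

```python
import math


def balanced_sizes(n_rows: int, max_size: int = 2500) -> list[int]:
    """Chunk sizes np.array_split gives with ceil(n_rows / max_size) sections.

    Follows numpy's documented rule: n_rows % k sections get one extra row.
    balanced_sizes is a hypothetical helper for illustration only.
    """
    if n_rows == 0:
        # ceil(0 / max_size) is 0 sections, which np.array_split does not accept
        return []
    k = math.ceil(n_rows / max_size)
    base, extra = divmod(n_rows, k)
    return [base + 1] * extra + [base] * (k - extra)


print(balanced_sizes(37365))  # fifteen chunks of 2491 rows
print(balanced_sizes(1234))   # prints: [1234]
print(balanced_sizes(5001))   # prints: [1667, 1667, 1667]
```

Since k >= n_rows / max_size, even the larger chunks never exceed `max_size`, so this approach also satisfies the 2500-row limit while keeping the chunks evenly sized.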
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Serge Ballesta |
| Solution 2 | Deniz Polat |
| Solution 3 | Vitaly Mirkis |
