How can I parameterize this batching function to control the batch size, ideally while keeping the same structure?
I have hard-coded the following algorithm for a specific task: returning the 5 most recent consecutive values at each time step of a column in a Pandas DataFrame. I would like to turn this into a function that takes a Pandas DataFrame and a batch size as input. The example below uses 5, but the size should be set by the parameter passed in.
The code currently follows this structure:
Last_5 = []
for i, j in enumerate(Pandas_Dataframe['Column']):
    sublist = [i-4 , list(Pandas_Dataframe['Column'])[i-4],
               i-3 , list(Pandas_Dataframe['Column'])[i-3],
               i-2 , list(Pandas_Dataframe['Column'])[i-2],
               i-1 , list(Pandas_Dataframe['Column'])[i-1],
               i-0 , list(Pandas_Dataframe['Column'])[i-0]]
    Last_5.append(sublist[1::2])
Parameterized, I would like it to follow this structure:
def Batcher(delta_t, n):
    ...
    ...
    ...
    return Last_n
Solution 1:[1]
I figured it out and have since used it to generate, store, and analyze more than 400,000,000 data points without encountering memory issues.
Solution:
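from tqdm import tqdm

def Batcher(vector, delta_t, gap):
    data = list(vector)  # convert once instead of on every lookup
    indexPlaceholderList = []
    valuePlaceholderList = []
    # For every time step t, collect the values at t, t-gap, t-2*gap, ...
    # Negative positions (t-i < 0) wrap around to the end of the data.
    for t, j in tqdm(enumerate(data), total=len(data)):
        for i in range(0, delta_t, gap):
            indexPlaceholderList.append(t-i)
            valuePlaceholderList.append(data[t-i])
    # Each time step contributes this many entries (delta_t when gap == 1).
    size = len(range(0, delta_t, gap))
    values = [valuePlaceholderList[z:z+size] for z in range(0, len(valuePlaceholderList), size)]
    indices = [indexPlaceholderList[z:z+size] for z in range(0, len(indexPlaceholderList), size)]
    # Reverse each batch so it runs from oldest to newest.
    for i in values:
        i.reverse()
    for i in indices:
        i.reverse()
    # Overwrite the first delta_t index batches with 0; indices is a local
    # helper and is not returned.
    indices[:delta_t] = [0] * delta_t
    return values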
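For comparison, the same batching can be sketched more compactly with a nested list comprehension. This is only an illustrative sketch, not the code used above: the name `batch_last_n` is hypothetical, it covers only the `gap = 1` case, and it assumes the batch size `n` is no larger than the number of values. As in the loops above, positions before the start wrap around to the end of the data through Python's negative indexing.

```python
def batch_last_n(values, n):
    """For each position t, return the n values ending at t, oldest first.

    Hypothetical helper for illustration. Assumes n <= len(values).
    Positions before the start wrap around to the end of the sequence,
    because Python treats negative indices as counting from the end.
    """
    data = list(values)  # accepts a plain list or a pandas Series
    return [
        [data[t - k] for k in range(n - 1, -1, -1)]
        for t in range(len(data))
    ]


batches = batch_last_n([10, 20, 30, 40, 50, 60], 3)
print(batches[0])   # first batch wraps around: [50, 60, 10]
print(batches[-1])  # last batch: [40, 50, 60]
```

With a DataFrame, the call would look like `batch_last_n(Pandas_Dataframe['Column'], 5)`. If the wrapped batches at the start are not wanted, they can be dropped by slicing the result from position `n - 1` onward.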
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |