Increase Loop Speed (Pandas dataframe)
Good afternoon.
I have created a function that assigns a value to each row depending on the previous row of a dataframe:
#Function to calculate cycles
def new_cycle(dfTickets_CC, cicle, id, prev_id, prev_status):
    global new_cicle
    if cicle is not None:
        new_cicle = cicle
    elif id != prev_id:
        if not dfTickets_CC.loc[dfTickets_CC['Ticket_ID'].isin([id])].empty:
            new_cicle = dfTickets_CC[dfTickets_CC['Ticket_ID'] == id]['Cicle_lastNr'].values[0] + 1
        else:
            new_cicle = 1
    elif id == prev_id:
        if prev_status == "Completed":
            new_cicle = int(new_cicle)
            new_cicle += 1
        else:
            new_cicle = new_cicle
    return str(new_cicle).split(".")[0]
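One likely cost inside this function is that every call with a new id scans the whole of dfTickets_CC twice (once for `isin().empty`, once for the boolean filter). As a minimal sketch (the small frame and the `start_cycle` helper below are stand-ins, assuming `Ticket_ID` is unique in dfTickets_CC), precomputing a `Ticket_ID -> Cicle_lastNr` dict once turns each lookup into a constant-time dict hit:

```python
import pandas as pd

# Hypothetical sample data mirroring dfTickets_CC from the question.
dfTickets_CC = pd.DataFrame({
    'Ticket_ID':    [101, 102, 103],
    'Cicle_lastNr': [3, 1, 7],
})

# Build the lookup once, outside any loop; each call afterwards is a
# dict access instead of a full-frame boolean scan.
last_cycle = dfTickets_CC.set_index('Ticket_ID')['Cicle_lastNr'].to_dict()

def start_cycle(ticket_id):
    # Same logic as the isin()/empty branch above:
    # known ticket -> last cycle + 1, unknown ticket -> 1.
    return last_cycle.get(ticket_id, 0) + 1

print(start_cycle(102))  # known ticket: 1 + 1 = 2
print(start_cycle(999))  # unknown ticket: starts at 1
```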
I call the function while iterating over the dataframe:
#Step 4, Calculating new cicle
ncicle = []
for i in range(len(dfCompilate.index)):
    if i == 0:
        ncicle.append(new_cycle(dfTickets_CC, dfCompilate['Cicle'].values[i], dfCompilate['Ticket_ID'].values[i], None, None))
    else:
        ncicle.append(new_cycle(dfTickets_CC, dfCompilate['Cicle'].values[i], dfCompilate['Ticket_ID'].values[i], dfCompilate['Ticket_ID'].values[i-1], dfCompilate['Status'].values[i-1]))
dfCompilate['New_cicle'] = ncicle
The problem is that, even though it works correctly, it takes a very long time; for instance, 2 hours to process a dataframe with 500,000 rows.
Does anybody know how to make it faster?
Thanks in advance.
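For context, one direction I have seen suggested is to avoid the repeated `.values[i]` indexing by shifting the columns once and iterating with `itertuples`. The sketch below is only an illustration under assumptions (a tiny made-up dfCompilate and a precomputed `last_cycle` mapping standing in for dfTickets_CC); the recurrence still needs a Python loop because each row depends on the running value:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames.
dfCompilate = pd.DataFrame({
    'Ticket_ID': [1, 1, 2, 2],
    'Cicle':     [None, None, None, None],
    'Status':    ['Completed', 'Open', 'Open', 'Open'],
})
last_cycle = {}  # assumed precomputed Ticket_ID -> Cicle_lastNr mapping

# Shift once so each row carries the previous row's values as columns;
# this replaces the per-iteration .values[i-1] lookups.
dfCompilate['Prev_ID'] = dfCompilate['Ticket_ID'].shift()
dfCompilate['Prev_Status'] = dfCompilate['Status'].shift()

ncicle = []
current = 0
for row in dfCompilate.itertuples(index=False):
    if row.Cicle is not None:
        current = int(row.Cicle)          # explicit cycle wins
    elif row.Ticket_ID != row.Prev_ID:
        current = last_cycle.get(row.Ticket_ID, 0) + 1  # new ticket
    elif row.Prev_Status == 'Completed':
        current += 1                      # same ticket, previous completed
    ncicle.append(current)

dfCompilate['New_cicle'] = ncicle
print(ncicle)  # -> [1, 2, 1, 1]
```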
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
