Increase Loop Speed (Pandas dataframe)

Good Afternoon

I have created a function that assigns a value based on the previous row of a dataframe:

#Function to calculate cycles

def new_cycle(dfTickets_CC, cicle, ticket_id, prev_id, prev_status):
    global new_cicle
    if cicle is not None:
        # An explicit cycle value always wins
        new_cicle = cicle
    elif ticket_id != prev_id:
        # First row of a new ticket: continue from the last recorded
        # cycle number if the ticket appears in the history table
        last = dfTickets_CC.loc[dfTickets_CC['Ticket_ID'] == ticket_id, 'Cicle_lastNr']
        if not last.empty:
            new_cicle = last.values[0] + 1
        else:
            new_cicle = 1
    elif prev_status == "Completed":
        # Same ticket as the previous row: a completed previous row
        # starts a new cycle; otherwise keep the current cycle number
        new_cicle = int(new_cicle) + 1
    return str(new_cicle).split(".")[0]
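
To show what the function does, here is a standalone run on small hypothetical data (the ticket IDs and cycle numbers are made up for illustration; the function body is repeated so the snippet runs on its own):

```python
import pandas as pd

# Hypothetical history table: ticket 100 last finished cycle 3
dfTickets_CC = pd.DataFrame({'Ticket_ID': [100], 'Cicle_lastNr': [3]})

# The function from above, repeated so this snippet is self-contained
def new_cycle(dfTickets_CC, cicle, ticket_id, prev_id, prev_status):
    global new_cicle
    if cicle is not None:
        new_cicle = cicle
    elif ticket_id != prev_id:
        last = dfTickets_CC.loc[dfTickets_CC['Ticket_ID'] == ticket_id, 'Cicle_lastNr']
        new_cicle = last.values[0] + 1 if not last.empty else 1
    elif prev_status == "Completed":
        new_cicle = int(new_cicle) + 1
    return str(new_cicle).split(".")[0]

print(new_cycle(dfTickets_CC, None, 100, None, None))        # ticket in history -> "4"
print(new_cycle(dfTickets_CC, None, 100, 100, "Completed"))  # completed row -> "5"
print(new_cycle(dfTickets_CC, None, 200, 100, "Open"))       # unseen ticket -> "1"
```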

I call the function while iterating over the dataframe:

#Step 4, Calculating new cicle

ncicle = []
for i in range(len(dfCompilate.index)):
    if i == 0:
        ncicle.append(new_cycle(dfTickets_CC, dfCompilate['Cicle'].values[i],
                                dfCompilate['Ticket_ID'].values[i], None, None))
    else:
        ncicle.append(new_cycle(dfTickets_CC, dfCompilate['Cicle'].values[i],
                                dfCompilate['Ticket_ID'].values[i],
                                dfCompilate['Ticket_ID'].values[i - 1],
                                dfCompilate['Status'].values[i - 1]))
dfCompilate['New_cicle'] = ncicle

The problem is that, even though it works correctly, it is very slow: a dataframe with 500,000 rows takes about 2 hours to process.
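
One direction I considered (not sure it is the right one): I suspect most of the time goes into the boolean filter over dfTickets_CC that runs once per row, so perhaps a dictionary built once up front could replace it. A sketch with hypothetical data, assuming Ticket_ID is unique in dfTickets_CC:

```python
import pandas as pd

# Hypothetical history table (in the real data this has many rows)
dfTickets_CC = pd.DataFrame({'Ticket_ID': [100, 200], 'Cicle_lastNr': [3, 7]})

# Build the Ticket_ID -> Cicle_lastNr mapping once, outside the loop
last_nr = dict(zip(dfTickets_CC['Ticket_ID'], dfTickets_CC['Cicle_lastNr']))

def starting_cycle(ticket_id):
    # O(1) dict lookup instead of scanning the whole dataframe per row
    return last_nr[ticket_id] + 1 if ticket_id in last_nr else 1

print(starting_cycle(100))  # -> 4
print(starting_cycle(999))  # -> 1
```

I have not measured whether this alone accounts for the 2 hours, though.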

Does anybody know how to make it faster?

Thanks in advance



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
