Python: Speeding Up If-Else Function When Iterating Over Pandas DataFrame

I have a very large dataframe (585k rows) where I need to manipulate the data in one of the columns, conditional on the values in other columns.

For example, the relevant columns are:

'time' - which occasionally resets back to zero
'order_executed' - Bool value or NaN
'order_id' - id
'cum_price' - cumulative total price for the period
'single_order_id' - id associated with the order of a single product
'single_order_executed' - Bool value or NaN, associated with single_order_id

plus one column per order_id (named after the id), which is meant to hold the cumulative cost for that id over time.

My problem: the 'total_price' column does not update when 'single_order_executed' is True, although the single order is taken into account the next time 'total_price' is updated by a larger order. For example, if order_id_A places an order that costs $5, 'cum_price' is updated to 5. If order_id_A then places a 'single_order', 'cum_price' is not updated to 6. When they place their next large order for $5, though, 'cum_price' is updated to 11.

My goal: for each unique id in the dataset, generate an accurate 'total_price'. That is, I am creating a new column that tracks the total price at any given time, so that I can track the cost associated with each customer for each period.

Currently, I have the following code:

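def price(order_id):
    # `orders` is the module-level DataFrame; a default 0..n-1 index is assumed
    col = str(order_id)
    cost = 0  # running total, carried from row to row
    for i in range(len(orders)):
        if orders.loc[i, 'time'] == 0:
            cost = 0  # the period restarts whenever 'time' resets to zero
        elif (orders.loc[i, 'order_executed'] == True) and (orders.loc[i, 'order_id'] == order_id):
            cost = orders.loc[i, 'total_price']  # this row's value, not the whole column
        elif (orders.loc[i, 'single_order_executed'] == True) and (orders.loc[i, 'single_order_id'] == order_id):
            cost += 1  # a single order adds 1 to the running total
        orders.loc[i, col] = cost  # .loc avoids chained assignment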
This returns the expected result when I feed in the unique order_ids, but it takes far too long to run (hours in some cases).

Can anyone help provide a faster solution?

I've tried using a few different methods (iterating over a dict, itertuples, etc.) but I can't seem to get the syntax right.
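For reference, a minimal sketch of what the itertuples version might look like. It assumes the intended rule is: reset to 0 when 'time' is 0, take the row's 'total_price' on a matching executed order, add 1 on a matching executed single order, and otherwise carry the previous value. The function name price_itertuples and the toy frame demo are made up for illustration; the column names follow the loop in the question.

```python
import pandas as pd

def price_itertuples(orders: pd.DataFrame, order_id) -> list:
    """Return the running cost for one order_id, one value per row."""
    out = []
    cost = 0  # running total, carried from row to row
    # itertuples yields lightweight namedtuples; columns become attributes
    for row in orders.itertuples(index=False):
        if row.time == 0:
            cost = 0  # period restarts
        elif row.order_executed == True and row.order_id == order_id:
            cost = row.total_price
        elif row.single_order_executed == True and row.single_order_id == order_id:
            cost += 1
        out.append(cost)
    return out

# Hypothetical toy data mirroring the $5 -> $6 -> $11 example
demo = pd.DataFrame({
    "time": [0, 1, 2, 3, 0, 1],
    "order_executed": [None, True, None, True, None, None],
    "order_id": [None, "A", None, "A", None, None],
    "total_price": [0, 5, 5, 11, 0, 0],
    "single_order_executed": [None, None, True, None, None, True],
    "single_order_id": [None, None, "A", None, None, "A"],
})

# Assign the whole column once instead of writing cell by cell
demo["A"] = price_itertuples(demo, "A")
```

Collecting the values in a list and assigning the column once avoids the per-cell writes into the DataFrame, which are likely a large part of the runtime in the original loop.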

Thanks!
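One possible vectorized approach, sketched under the same reading of the rule (reset to 0 when 'time' is 0, take 'total_price' on a matching executed order, add 1 per matching executed single order in between). The idea is to start a new segment at every row that overwrites the running total, forward-fill the base value, and add a per-segment cumulative count of single orders. The names running_price and demo are hypothetical, and this is not necessarily the approach from the linked solution.

```python
import numpy as np
import pandas as pd

def running_price(orders: pd.DataFrame, order_id) -> pd.Series:
    """Vectorized running cost for one order_id (a sketch, not a drop-in)."""
    reset = orders["time"].eq(0)
    # NaN compares as False in .eq(), so missing flags are ignored
    big = orders["order_executed"].eq(True) & orders["order_id"].eq(order_id) & ~reset
    single = (orders["single_order_executed"].eq(True)
              & orders["single_order_id"].eq(order_id) & ~reset & ~big)

    # Rows that overwrite the running total each start a new segment
    anchor = reset | big
    segment = anchor.cumsum()

    # Base value: total_price on a big order, 0 on a reset, carried forward
    base = pd.Series(np.where(big, orders["total_price"], 0.0), index=orders.index)
    base = base.where(anchor).ffill().fillna(0.0)

    # Count single orders since the last anchor row
    extra = single.astype(int).groupby(segment).cumsum()
    return base + extra

# Hypothetical toy data mirroring the $5 -> $6 -> $11 example
demo = pd.DataFrame({
    "time": [0, 1, 2, 3, 0, 1],
    "order_executed": [np.nan, True, np.nan, True, np.nan, np.nan],
    "order_id": [np.nan, "A", np.nan, "A", np.nan, np.nan],
    "total_price": [0, 5, 5, 11, 0, 0],
    "single_order_executed": [np.nan, np.nan, True, np.nan, np.nan, True],
    "single_order_id": [np.nan, np.nan, "A", np.nan, np.nan, "A"],
})

result = running_price(demo, "A")
# For this toy frame the running total is 0, 5, 6, 11, 0, 1 (as floats)
```

To fill a column per id, something like orders[str(oid)] = running_price(orders, oid) could be called for each unique id. Each call is a handful of whole-column operations rather than a Python-level loop over 585k rows.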



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source