Why was this += operator modifying the input df of this function?

I was able to resolve this issue, but I am wondering if someone could explain why the code was breaking. A function using the += operator was modifying two variables: an input df (SPARdf) and a slice of the input df (start). Only the slice should have been modified. The modified df then leaked out of the function and was used in later calls, even though it was not the variable in the return statement. I resolved it by using the explicit statement start = start + 1. Is there something I am missing about the += operator?

    def get5DayReturns(SPARdf):
        start = SPARdf.iloc[-5]
        start += 1
        for i in range(-4,0):
            start = start*(1+SPARdf.iloc[i])
        start -= 1
        return start

    rweeklydict = dict(get5DayReturns(SPAR)*100)

This leaves the SPAR df modified, in addition to returning the dict built from start.



Solution 1:[1]

.iloc (and other indexer friends, even just [] in pandas) can return views of the underlying dataframe instead of copying things. This is alluded to in the documentation:

Whether a copy or a reference is returned for a setting operation, may depend on the context.

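Modifying those views also modifies the dataframe, since no new memory is allocated. start += 1 is an in-place operation, so it writes through the view into SPARdf, whereas start = start + 1 builds a new object and rebinds the name start to it.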
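To see the in-place versus rebinding distinction without pandas in the picture, here is a minimal sketch using plain Python lists. The variable names are made up for illustration. Lists are only a stand-in here: a pandas Series that is a view onto a dataframe would be expected to behave like `alias` below.

```python
# `+=` mutates the object in place when the type supports it,
# while `x = x + ...` builds a new object and rebinds the name.
original = [1, 2, 3]

alias = original          # second name for the same list, no copy made
alias += [4]              # in-place extend: `original` changes too
print(original)           # [1, 2, 3, 4]

rebound = original        # again just another name for the same list
rebound = rebound + [5]   # new list is created; `original` is left alone
print(original)           # [1, 2, 3, 4]
print(rebound)            # [1, 2, 3, 4, 5]
```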

Replace

    start = SPARdf.iloc[-5]

with

    start = SPARdf.iloc[-5].copy()

to get a separate copy.
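As a rough check, the function from the question with the `.copy()` fix applied can be run against a small frame and the input compared before and after. The tickers and return values below are made up for illustration. Whether the unfixed version actually mutates the input may depend on the pandas version and on whether Copy-on-Write is enabled, so this sketch only demonstrates the fixed behaviour.

```python
import pandas as pd

def get5DayReturns(SPARdf):
    # .copy() gives `start` its own data, so the in-place += below
    # cannot write back into SPARdf
    start = SPARdf.iloc[-5].copy()
    start += 1
    for i in range(-4, 0):
        start = start * (1 + SPARdf.iloc[i])
    start -= 1
    return start

# Made-up daily returns for two hypothetical tickers
SPAR = pd.DataFrame({
    "AAA": [0.01, 0.02, -0.01, 0.00, 0.03, 0.01],
    "BBB": [0.00, -0.02, 0.01, 0.02, 0.01, -0.01],
})
before = SPAR.copy()

rweeklydict = dict(get5DayReturns(SPAR) * 100)

# The input frame is unchanged after the call
assert SPAR.equals(before)
```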

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AKX