'Drop Columns in Pandas Dataframe: Inconsistency in Output

Problem: While dropping column labelled 'Happiness_Score' below, I'm getting it dropped in the parent Dataframe as well. This is not supposed to happen, would like clarification on this?

A = df_new
A.drop('Happiness_Score', axis = 1, inplace = True)

This is the output: As you can see the column gets dropped in df_new too; isn't inplace = True mean that it gets dropped only in the A Dataframe.

Output 1

NOTE: I'm able to workaround this by changing the code; now output is as expected.

B=df_new.drop('Happiness_Score', axis = 1)

enter image description here



Solution 1:[1]

Actually, when you do A = df_new you are not creating a copy of the Dataframe, rather just a pointer. So to execute this correctly you should use A = df_new.copy() When you are selecting a subset or indexing: A = df_new[condition] then it creates copy of a slice of a dataframe, so your workaround works too.

Solution 2:[2]

A = def_new creates a new reference to your original def_new, an not a new copy. You are binding A to the same thing def_new holding the reference to. And what happens when you do modification in a reference? It is reflected in the original object. I'll illustrate this with an example.

orgList = [1,2,3,4,5]
bkpList = orgList
print(bkpList is orgList) #OUTPUT: True

This is because both variables are pointing to same list. Modify any one, and change will be reflected in original list. Same thing can be observed in your dataframe case.

Solution: Keep a separate copy of your dataframe.

Solution 3:[3]

The variable A is a reference to df_new. Try creating A by doing a complete slice of df_new or df_new.copy().

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 krewsayder