Python functions treating pandas dataframes as a global variable
I came across a pandas curiosity that I can't find discussed elsewhere on SO. It looks like, in some use cases, pandas DataFrames are treated as global variables inside Python functions rather than as local variables. For example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def some_function(x):
    x['new'] = 0
    return

some_function(df)
print(df)

   A  B  new
0  1  a    0
1  2  b    0
2  3  c    0
3  4  d    0
After some experimenting, I found that this behaviour stops as soon as you start copying data around within the function:
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def some_function(x):
    y = x.copy()
    y['new'] = 0
    x = y.copy()
    return

some_function(df)
print(df)

   A  B
0  1  a
1  2  b
2  3  c
3  4  d
My question is - is this an intentional feature of pandas (and if so, for what purpose?), or just an accidental side-effect of how pandas dataframes are stored and operated on in memory? It doesn't happen with numpy arrays, as far as I can tell.
Solution 1:[1]
This is normal Python behaviour and is not specific to pandas.
Have a look at the following code:
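l = []

def a():
    l.append(42)  # mutates the existing global list in place

def b():
    l = [1]       # binds a new local name l; the global list is untouched

a()
b()
print(l)  # [42]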
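In your case, x is a local name inside some_function, but it refers to the very same DataFrame object as the global df. The statement x['new'] = 0 modifies that object in place, so the change is visible through df as well. In the second case, x = y.copy() does not modify the object at all. Instead it rebinds the local name x to a new DataFrame, and that name disappears when the function returns, so df is left unchanged. The same applies to lists and NumPy arrays: in-place modification inside a function is visible to the caller, while assignment to the parameter name is not.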
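The difference between mutating an object and rebinding a name can be sketched without pandas at all. This is only an illustration using a plain list; the function names mutate and rebind are made up for the example and do not come from the original answer.

```python
def mutate(items):
    # items is a local name bound to the caller's list object;
    # append() changes that object in place.
    items.append(42)


def rebind(items):
    # Assignment binds the local name to a brand-new list;
    # the caller's object is not touched.
    items = items + [99]
    return items


data = [1, 2, 3]

mutate(data)
print(data)            # [1, 2, 3, 42]

result = rebind(data)
print(data)            # [1, 2, 3, 42]
print(result)          # [1, 2, 3, 42, 99]
print(result is data)  # False
```

The same reasoning carries over to x['new'] = 0 (mutation) versus x = y.copy() (rebinding) in the question.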
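If you want to rebind the global variable instead, you must declare it as global in your function. Python does not allow a name to be both a parameter and global (it raises a SyntaxError), so the function below takes no argument and works on df directly:
def some_function():
    global df
    y = df.copy()
    y['new'] = 0
    df = y.copy()
    return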
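A common alternative to global, sketched here under the assumption that the caller is free to reassign df, is to return the new DataFrame and rebind the name at the call site. The helper name add_new_column and the variable unchanged are hypothetical and only serve the illustration.

```python
import pandas as pd


def add_new_column(frame):
    # Hypothetical helper: work on a copy so the caller's object is left alone.
    out = frame.copy()
    out["new"] = 0
    return out


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["a", "b", "c", "d"]})

unchanged = df            # second reference to the original object
df = add_new_column(df)   # rebinding happens at the call site

print(list(df.columns))         # ['A', 'B', 'new']
print(list(unchanged.columns))  # ['A', 'B']
```

This keeps the function free of side effects: the original object stays as it was, and only the name df now points to the extended copy.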
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Hatatister |
