Python functions treating pandas dataframes as a global variable

I came across a pandas curiosity, which I can't find replicated on SO. It looks like for some use cases, pandas dataframes are treated as global variables in python functions, not local variables. For example:

import pandas as pd

df = pd.DataFrame({'A':[1, 2, 3, 4],
                   'B':['a', 'b', 'c', 'd']})

def some_function(x):
    x['new'] = 0
    return

some_function(df)
print(df)

   A  B  new
0  1  a    0
1  2  b    0
2  3  c    0
3  4  d    0

Experimenting further, I found that this behaviour stops as soon as you start copying data around within the function:

import pandas as pd

df = pd.DataFrame({'A':[1, 2, 3, 4],
                   'B':['a', 'b', 'c', 'd']})

def some_function(x):
    y = x.copy()
    y['new'] = 0
    x = y.copy()
    return

some_function(df)
print(df)

   A  B
0  1  a
1  2  b
2  3  c
3  4  d
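The same split between changing an object and rebinding a name shows up with pandas methods. Below is a minimal sketch (the function names `drop_in_place` and `drop_and_rebind` are hypothetical) in which an `inplace=True` call changes the caller's DataFrame, while assigning the result of a non-inplace call back to the parameter does not.

```python
import pandas as pd

def drop_in_place(x):
    # inplace=True modifies the DataFrame object that x refers to
    x.drop(columns='B', inplace=True)

def drop_and_rebind(x):
    # drop() without inplace returns a new DataFrame;
    # assigning it to x only rebinds the local name x
    x = x.drop(columns='B')

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})

drop_and_rebind(df1)
print(list(df1.columns))  # ['A', 'B']

drop_in_place(df1)
print(list(df1.columns))  # ['A']
```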

My question is: is this an intentional feature of pandas (and if so, for what purpose?), or just an accidental side-effect of how pandas dataframes are stored and operated on in memory? It doesn't happen with numpy arrays, as far as I can tell.
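For comparison, here is a small sketch (the function names `fill_first` and `append_zero` are hypothetical) suggesting that NumPy arrays follow the same rule: writing into an array inside a function is visible to the caller, but `np.append` returns a new array, so assigning its result only rebinds the local name. Since a NumPy array cannot grow a new column in place, code that "adds" data to one usually takes the second path, which may be why the effect seems absent there.

```python
import numpy as np

def fill_first(arr):
    # item assignment writes into the caller's array object
    arr[0] = 99

def append_zero(arr):
    # np.append returns a NEW array; this only rebinds the local name arr
    arr = np.append(arr, 0)

a = np.array([1, 2, 3])

append_zero(a)
print(a.tolist())  # [1, 2, 3]

fill_first(a)
print(a.tolist())  # [99, 2, 3]
```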



Solution 1:[1]

This is normal Python behaviour and not pandas-specific.

Have a look at the following code:

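l = []

def a():
    l.append(42)  # mutates the existing global list object

def b():
    l = [1]       # binds a new local name l; the global l is unchanged

a()
print(l)  # [42]
b()
print(l)  # still [42]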
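The same distinction can be checked directly with a DataFrame. This is a small sketch (the function name `check_identity` is hypothetical) that uses `is` to test whether the parameter still refers to the caller's object, before and after the parameter is rebound:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def check_identity(x):
    before = x is df   # True when called with df: both names refer to one object
    x = x.copy()       # rebinds the local name x to a new DataFrame
    after = x is df    # False: df itself was not touched
    return before, after

print(check_identity(df))  # (True, False)
```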

In your case, x is not really a global variable: it is a local parameter that is bound to the same DataFrame object as the global df. So x['new'] = 0 modifies that shared object in place, and the change is visible through df. In the second case, x = y.copy() does not modify that object; it rebinds the local name x to a new DataFrame, and df is left untouched. If you really want to rebind the global name instead, you must declare that name as global inside your function:

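def some_function():
    # a parameter cannot also be declared global (that is a SyntaxError),
    # so the function takes no argument and uses the global df directly
    global df
    y = df.copy()
    y['new'] = 0
    df = y  # rebinds the module-level name df to the new DataFrame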
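Relying on `global` ties the function to one module-level name. A common alternative, sketched here under the assumption that producing a new DataFrame is acceptable (the name `add_new_column` is hypothetical), is to return the result and reassign at the call site. `DataFrame.assign` returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

def add_new_column(x):
    # assign() returns a new DataFrame; the argument is not modified
    return x.assign(new=0)

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})
original = df
df = add_new_column(df)  # rebind df explicitly at the call site

print(list(original.columns))  # ['A', 'B']
print(list(df.columns))        # ['A', 'B', 'new']
```

This keeps the data flow visible at the call site, which is usually easier to follow than a function that silently rebinds a global.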

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hatatister