Why does .loc not always match column names?

I noticed this today and wanted to ask because I am a little confused about this.

Let's say we have two DataFrames:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,9,size=(5,3)),columns = list('ABC'))
    A   B   C
0   3   1   6
1   2   4   0
2   8   8   0
3   8   6   7
4   4   5   0

df2 = pd.DataFrame(np.random.randint(0,9,size=(5,3)),columns = list('CBA'))

    C   B   A
0   3   5   5
1   7   4   6
2   0   7   7
3   6   6   5
4   4   0   6
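
Because both frames come from np.random.randint, the exact values change on every run. To reproduce the numbers printed above, the frames can be built from literals instead. This is only a convenience sketch; the values are copied from the output shown in this post.

```python
import pandas as pd

# Same values as the printed frames above, written out as literals
# so the example is reproducible.
df = pd.DataFrame(
    [[3, 1, 6], [2, 4, 0], [8, 8, 0], [8, 6, 7], [4, 5, 0]],
    columns=list("ABC"),
)

# Note the reversed column order: C, B, A.
df2 = pd.DataFrame(
    [[3, 5, 5], [7, 4, 6], [0, 7, 7], [6, 6, 5], [4, 0, 6]],
    columns=list("CBA"),
)

print(df)
print(df2)
```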

If we wanted to conditionally overwrite rows in the first df with values from df2, we could do this:

df.loc[df['A'].gt(3)] = df2

I would expect the columns to be aligned and, if any columns were missing, the corresponding values in the first df to be filled with NaN. However, when the above code is run, it replaces the data positionally and does not take the column names into account. (It does take the index labels into account, however.)

    A   B   C
0   3   1   6
1   2   4   0
2   0   7   7
3   6   6   5
4   4   0   6

At index 2, instead of [7, 7, 0] we have [0, 7, 7].

However, if we pass the column names into the .loc statement, without changing the order of the columns in df2, the assignment does align on the columns.

df.loc[df['A'].gt(3),['A','B','C']] = df2
    A   B   C
0   3   1   6
1   2   4   0
2   7   7   0
3   5   6   6
4   6   0   4
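
To compare the assignments side by side, here is a minimal sketch using literal frames with the same values as in this post. The names mask, by_rows, by_cols and reordered are only illustrative. The third variant, which reorders df2's columns before assigning, is my own assumption about a possible workaround and not something confirmed as the recommended approach; the row-only result may also depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 1, 6], [2, 4, 0], [8, 8, 0], [8, 6, 7], [4, 5, 0]],
    columns=list("ABC"),
)
df2 = pd.DataFrame(
    [[3, 5, 5], [7, 4, 6], [0, 7, 7], [6, 6, 5], [4, 0, 6]],
    columns=list("CBA"),
)

mask = df["A"].gt(3)  # True for index 2, 3 and 4

# 1) Row indexer only: the case this question is about.
#    Printed without an expected result, since it may vary by version.
by_rows = df.copy()
by_rows.loc[mask] = df2

# 2) Row indexer plus explicit column list: values land under the
#    matching column names.
by_cols = df.copy()
by_cols.loc[mask, ["A", "B", "C"]] = df2

# 3) Assumed workaround: put df2's columns in df's order first, so the
#    result is the same whether or not column alignment happens.
reordered = df.copy()
reordered.loc[mask] = df2[df.columns]

print(by_rows)
print(by_cols)
print(reordered)
```

Variants 2 and 3 should both give [7, 7, 0] at index 2, which is what I expected from the first form as well.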

Why does this happen?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
