Pandas DataFrame Assignment Bug using Dictionaries of Strings and Floats?

Problem

Pandas seems to support using df.loc to assign a dictionary to a row entry, like the following:

import pandas as pd

df = pd.DataFrame(columns = ['a','b','c'])
entry = {'a':'test', 'b':1, 'c':float(2)}
df.loc[0] = entry

As expected, Pandas inserts the dictionary values to the corresponding columns based on the dictionary keys. Printing this gives:

      a  b    c
0  test  1  2.0

However, if you overwrite the same entry by running df.loc[0] = entry a second time, Pandas assigns the dictionary keys instead of the dictionary values. Printing this gives:

   a  b  c
0  a  b  c

Question

Why does this happen?

Specifically, why does this only happen on the second assignment? All subsequent assignments revert to the original result, containing (almost) the expected values:

      a  b  c
0  test  1  2

I say almost because the dtype of column c is object instead of float for all subsequent results.


I've determined that this happens whenever there is a string and a float involved. You won't find this behavior if it's just a string and integer, or integer and float.
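One way to check the dtype observation is to record df.dtypes after each assignment. This is only a minimal sketch: the name dtype_history is made up for illustration, and no expected output is shown because the result appears to depend on the pandas version.

```python
import pandas as pd

entry = {'a': 'test', 'b': 1, 'c': float(2)}
df = pd.DataFrame(columns=['a', 'b', 'c'])

# dtype_history is a hypothetical name used only in this sketch.
dtype_history = []
for attempt in range(3):
    # The first pass adds row 0; later passes overwrite the existing row.
    df.loc[0] = entry
    dtype_history.append(df.dtypes.astype(str).to_dict())

print(f'pandas version: {pd.__version__}')
for attempt, dtypes in enumerate(dtype_history, start=1):
    print(f'dtypes after assignment {attempt}: {dtypes}')
```

Running it alongside the pandas version makes it easier to compare results between installations.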

Example Code

import pandas as pd

df = pd.DataFrame(columns = ['a','b','c'])
print(f'empty df:\n{df}\n\n')

entry = {'a':'test', 'b':1, 'c':float(2)}
print(f'dictionary to be entered:\n{entry}\n\n')

df.loc[0] = entry
print(f'df after entry:\n{df}\n\n')

df.loc[0] = entry
print(f'df after second entry:\n{df}\n\n')

df.loc[0] = entry
print(f'df after third entry:\n{df}\n\n')

df.loc[0] = entry
print(f'df after fourth entry:\n{df}\n\n')

This gives the following printout:

empty df:
Empty DataFrame
Columns: [a, b, c]
Index: []


dictionary to be entered:
{'a': 'test', 'b': 1, 'c': 2.0}


df after entry:
      a  b    c
0  test  1  2.0


df after second entry:
   a  b  c
0  a  b  c


df after third entry:
      a  b  c
0  test  1  2


df after fourth entry:
      a  b  c
0  test  1  2


Solution 1:[1]

Interesting find. On pandas version 1.2.4, every subsequent dataframe contains the keys a, b, c in the row, not just the second one.

empty df:
Empty DataFrame
Columns: [a, b, c]
Index: []

dictionary to be entered:
{'a': 'test', 'b': 1, 'c': 2.3}

df after entry:
      a  b    c
0  test  1  2.3

df after second entry:
   a  b  c
0  a  b  c

df after third entry:
   a  b  c
0  a  b  c

Btw, it only seems to work correctly when assigning to a new row. So it's only associating the keys with the columns in that situation. For all subsequent re-assigning to existing rows, it has the observed unexpected behaviour, in 1.2.4.

df.loc[1] = entry
print(f'df after assigning to a new row:\n{df}\n\n')
# output:
df after assigning to a new row:
      a  b    c
0     a  b    c
1  test  1  2.3

df.loc[1] = entry
print(f'df after repeating:\n{df}\n')
# output:
df after repeating:
   a  b  c
0  a  b  c
1  a  b  c

The reason it may be happening for existing rows (apart from being a bug) is that it's iterating over the collection, and iterating over a dictionary yields its keys. From the docs section "Setting with enlargement":

The .loc/[] operations can perform enlargement when setting a non-existent key for that axis.

In the Series case this is effectively an appending operation.

So for new rows, it's "enlarging" the frame with the input, matching keys to columns, but for existing rows, it's iterating over the input (keys for dicts, not values).
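The "keys, not values" part is plain Python behaviour and can be seen without pandas. A small illustration (the variable name iterated is just for this example):

```python
entry = {'a': 'test', 'b': 1, 'c': 2.3}

# Iterating over a dict directly yields its keys, not its values.
iterated = [item for item in entry]
print(iterated)               # ['a', 'b', 'c']

# The values have to be requested explicitly.
print(list(entry.values()))   # ['test', 1, 2.3]
```

Any code that simply loops over its input, rather than treating a dict specially, would therefore end up with a, b and c.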

For a list, it works as one would expect.

df.loc[2] = list(entry.values())
print(f'df when assigning from a list\n{df}\n')
# output
df when assigning from a list
      a  b    c
0     a  b    c
1     a  b    c
2  test  1  2.3


df.loc[2] = list(entry.values())
print(f'df when assigning from a list 2nd time\n{df}\n')
# output
df when assigning from a list 2nd time
      a  b    c
0     a  b    c
1     a  b    c
2  test  1  2.3

(That's the why based on the docs. I think the actual technical reason may only be apparent after perusing the source code.)

Imho, it should either work for all assignments/re-assignments or not be allowed at all. I agree that this should be raised as a bug, as @DeepSpace mentions.
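Until then, a possible workaround is to avoid handing the dict itself to .loc. This builds on the list behaviour shown earlier: the target columns are named from the dict keys and the values are passed as a list. It is only a sketch, and the starting frame with an 'old' row is invented for the example.

```python
import pandas as pd

# Start from a frame whose row 0 already exists (made-up starting values).
df = pd.DataFrame([['old', 0, 0.0]], columns=['a', 'b', 'c'])

# Keys are deliberately not in column order.
entry = {'c': 2.3, 'a': 'test', 'b': 1}

# Select the columns by key and assign the values as a list, so each
# value lands in the column named by its key.
df.loc[0, list(entry)] = list(entry.values())

print(df)
```

Naming the columns explicitly also means the dict does not have to list its keys in the same order as the DataFrame columns.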

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aneroid