Why does pandas `loc` throw `KeyError` with a column name?

I have a DataFrame that is initially constructed like this:

df_data = pd.DataFrame(columns=['name','date','c1','c2']).set_index(['name','date'])

I then have code to fill this frame from a database. I can print some or all of the frame and get a sensible result. Something like:

print(df_data.c1.head(3))

name date
Joe  2019-01-01 234324
     2019-01-02 4565
     2019-01-03 573
Name: c1, dtype: object

After filling from the database, I have various analysis calculations that try to access the data using loc, for example df_data.loc['Joe', 'c1']. I expect to get a result from that with date as the index and the values of column c1, where the "name" part of the MultiIndex has been selected down to 'Joe'. Something like:

print(df_data.loc['Joe', 'c1'])

date
2019-01-01 234324
2019-01-02 4565
2019-01-03 573
Name: c1, dtype: object

I've run this three times, filling the frame with different date ranges. Two of the three work as expected and described above. In the third, I get KeyError: ('Joe', 'c1') for df_data.loc['Joe', 'c1']. Even in this "broken" case, I get a perfectly nice result for df_data.loc['Joe'].c1, which I think should give the same answer here. I can also print the entire frame df_data and get a perfectly sensible result. I interpret the KeyError to mean that pandas thinks c1 should be in the index rather than being a column name.

I cannot reproduce this in a stand-alone example because, for reasons I cannot understand, the result seems to depend on the data in the frame rather than the structure of the frame. (The same structure "works" in two of the three cases.) So, my specific questions:

  • Why or under what circumstances would the syntax loc['Joe', 'c1'] cause c1 to be treated as part of the key instead of a column name? (Whatever other error I may have, I don't see where the second argument here should be interpreted as part of the key under any documented scenario, e.g. I do not have something like loc[('Joe','c1')].)
  • Are there known or documented cases where something about the data in the frame could cause such a change in how the data access call is interpreted?


Solution 1:[1]

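Use tuple notation for the row key and pass the column label as a separate indexer: df_data.loc[('Joe', slice(None)), 'c1']. Note that df_data.loc[('Joe', 'c1')] is the same call as df_data.loc['Joe', 'c1'], because Python passes both to loc as one tuple, so it does not remove the ambiguity. The pandas user guide recommends specifying all axes in the .loc specifier, since an indexer can otherwise be misinterpreted as indexing both axes rather than the MultiIndex for the rows. See: https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-indexing-with-hierarchical-index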
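A minimal sketch of ways to give the row key and the column label as separate indexers, so that 'c1' cannot be read as part of the row key. The sample values and the ShowKey helper are made up for illustration and are not the asker's real data. The sketch does not reproduce the KeyError, it only shows spellings that avoid the ambiguity.

```python
import pandas as pd


class ShowKey:
    """Hypothetical helper that returns whatever key the subscript receives."""

    def __getitem__(self, key):
        return key


# Python hands both spellings to __getitem__ as the same tuple.
probe = ShowKey()
same_key = probe["Joe", "c1"] == probe[("Joe", "c1")]

# Made-up sample data mirroring the layout in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("Ann", "2019-01-01"),
        ("Joe", "2019-01-01"),
        ("Joe", "2019-01-02"),
        ("Joe", "2019-01-03"),
    ],
    names=["name", "date"],
)
df_data = pd.DataFrame(
    {"c1": [10, 234324, 4565, 573], "c2": [1, 2, 3, 4]}, index=index
).sort_index()

# Row key covers every index level; the column label is passed separately.
by_slice = df_data.loc[("Joe", slice(None)), "c1"]
by_idx = df_data.loc[pd.IndexSlice["Joe", :], "c1"]

# Cross-section on the "name" level, then pick the column; drops "name".
by_xs = df_data.xs("Joe", level="name")["c1"]

print(by_xs)
```

The slice-based forms keep both index levels in the result, while xs drops the selected "name" level and so matches the date-indexed output the question expects.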

Solution 2:[2]

Two interesting ways:

One could let the old entries be taken out, but reach in with SQL and extract what you wanted as a time-bound query.

A second way would be to automate the restarting of kismet, which is a little less elegant but seems to work.

https://magazine.odroid.com/article/home-assistant-tracking-people-with-wi-fi-using-kismet/

If you read that article carefully, there are lots of bits of interesting information in it.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mose Wintner
Solution 2 Ken S