'Why does this pandas DataFrame.query expression work with literals but not a variable substitution?
I've run into a situation where I can query a pandas dataframe with column names and string literals in the query string
import pandas as pd
data = pd.DataFrame([list('aba'),
list('xyz')],
columns=['A', 'B', 'C'])
data.query("A == 'a'")
# A B C
# 0 a b a
data.query("A.eq('a')")
# A B C
# 0 a b a
but when I use a variable substitution instead of the column name, I get errors.
x = 'A'
data.query("@x == 'a'")
# KeyError: 'False: boolean label can not be used without a boolean index'
data.query("@x.eq('a')")
# AttributeError: 'str' object has no attribute 'eq'
What's going on?
Solution 1:[1]
Variables referenced by @ in pandas.eval are supposed to reference variables in the local namespace- not columns within the DataFrame. So it interprets variables prefixed by @ as non-literals. (e.g. "@x.eq('a')" is expanded to "'A'.eq('a')", whereas you want "A.eq('a')")
https://pandas.pydata.org/docs/user_guide/enhancingperf.html#local-variables
If you want to substitute column names in and have pandas parse them as such, you can use the string formatting method, or f-string.
import pandas as pd
data = pd.DataFrame([list("aba"), list("xyz")], columns=["A", "B", "C"])
x = "A"
data.query(f'{x} == "a"')
A B C
0 a b a
This way the string is interpolated before pandas gets a hold of it and it can process those variables referring to a column.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
