Condition string evaluation in pandas.DataFrame.query: pandas 1.2.0 vs. 1.4.1
A Python script ran without problems on Python 3.8.5 with pandas 1.2.0, but fails on the same Python version with the newer pandas 1.4.1.
The problem seems to be that string evaluation in pandas.DataFrame.query behaves differently between the two versions.
I can reproduce the error with the sample code below:
import pandas as pd

class Test:
    def __init__(self, config):
        self.df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 3, 2, 1]})
        self.config = config

    def a_more_than_i_1(self):
        # Subscript the config dict directly inside the query string
        return self.df.query("a>@self.config['i']")

    def a_more_than_i_2(self):
        # Copy the value into a local variable first, then reference it
        i = self.config['i']
        return self.df.query("a>@i")

config = {'i': 2, 'j': 3}
t = Test(config)
# 1st approach
t.a_more_than_i_1()
# 2nd approach
t.a_more_than_i_2()
Both a_more_than_i_1 and a_more_than_i_2 work in pandas 1.2.0, but only a_more_than_i_2 works in pandas 1.4.1. In 1.4.1, a_more_than_i_1 raises the following error:
ValueError: data type must provide an itemsize
I could rewrite all the functions to use the 2nd approach, but that doesn't look elegant, and I'd like to know the correct way to handle this. If the first approach was wrong all along, why did it work in 1.2.0 but fail in 1.4.1?
Thanks very much.
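For reference, here is a minimal sketch of two ways to write the same filter without subscripting inside the query string. It assumes the goal is simply "rows where a is greater than config['i']". The method names a_more_than_i_mask and a_more_than_i_local are made up for this illustration, and this does not explain why the behavior changed between versions.

```python
import pandas as pd


class Test:
    def __init__(self, config):
        self.df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]})
        self.config = config

    def a_more_than_i_mask(self):
        # Hypothetical helper: plain boolean indexing, no query string,
        # so nothing has to be parsed or resolved by the expression engine.
        return self.df[self.df["a"] > self.config["i"]]

    def a_more_than_i_local(self):
        # Hypothetical helper: same idea as the 2nd approach in the question,
        # with a more descriptive local name referenced through "@".
        threshold = self.config["i"]
        return self.df.query("a > @threshold")


t = Test({"i": 2, "j": 3})
by_mask = t.a_more_than_i_mask()
by_query = t.a_more_than_i_local()
print(by_mask)
print(by_query)
```

Both helpers keep the lookup of config['i'] in ordinary Python code, so the query string (where one is used) only ever sees a plain scalar name.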
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
