Pandas: Using df.eval with string variables for conditional filtering

I have a simulated dataframe with one column created with:

df = pd.DataFrame({'A': np.arange(1,201)})

which is just a dataframe with numbers 1 to 200 with one column, "A". I would like to filter the dataframe based on a conditional statement like

df[df["A"] > 20]

but the column name, the comparison operator (>), and the value (20) have to be passed in as string variables. I believe the DataFrame.eval function in pandas should be used for this, so I wrote a function called select_twenty to do it. Here it is:

def select_twenty(input_df, column_name, boolean_arg, value):
    evaluated = input_df.eval(input_df[input_df[column_name] + boolean_arg + value])
    return evaluated

In the function above, input_df is the simulated dataframe, column_name is the name of the chosen column, boolean_arg is the comparison operator (>), and value is the value 20. The last three arguments are passed in as strings in the function call:

select_twenty(df, "A", ">", "20")

When I call the function, it raises a UFuncTypeError. I have searched all over Google and cannot work out how to resolve it. I have also not found an example where eval in pandas is used this way. Can someone please help me with the filter? Thank you.



Solution 1:[1]

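The error comes from the + inside the eval argument: input_df[column_name] + boolean_arg tries to add the string ">" to the numeric column values, which raises the UFuncTypeError before eval is even called. Instead, concatenate the three strings into an expression such as "A>20", pass that string to eval to get a boolean mask, and index the DataFrame with the mask. What you are looking for is: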

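def select_twenty(input_df, column_name, boolean_arg, value):
    # Build the expression string (e.g. "A>20"); eval returns a boolean mask.
    mask = input_df.eval(column_name + boolean_arg + value)
    # Keep only the rows where the mask is True.
    return input_df[mask]

# Example call; prints the output shown below.
print(select_twenty(df, "A", ">", "20"))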
       A
20    21
21    22
22    23
23    24
24    25
..   ...
195  196
196  197
197  198
198  199
199  200

[180 rows x 1 columns]
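
As a side note, here is a minimal sketch of two variations on the same idea. The first uses DataFrame.query, which evaluates the expression string and selects the rows in one step. The second avoids string evaluation altogether by looking the operator up in a dictionary built from Python's operator module. The names select_with_query, select_with_operator and COMPARISONS are made up for this illustration, and the second function assumes the column is numeric, so it converts the value with float().

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical lookup table: comparison strings mapped to operator-module functions.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def select_with_query(input_df, column_name, boolean_arg, value):
    # query evaluates the expression string and returns the matching rows.
    return input_df.query(f"{column_name} {boolean_arg} {value}")


def select_with_operator(input_df, column_name, boolean_arg, value):
    # No string evaluation: look up the comparison and apply it to the column.
    compare = COMPARISONS[boolean_arg]  # KeyError for an unsupported operator
    # Assumption: the column is numeric, so the string value goes through float().
    mask = compare(input_df[column_name], float(value))
    return input_df[mask]


df = pd.DataFrame({"A": np.arange(1, 201)})
print(len(select_with_query(df, "A", ">", "20")))     # 180
print(len(select_with_operator(df, "A", ">", "20")))  # 180
```

The dictionary version may be preferable when the strings come from user input, since only the listed operators are accepted and nothing is evaluated as an expression.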

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user2246849