'Make conditions in Pandas DataFrames optional (based on user input)

In a script that accepts user input to query several columns of a spreadsheet, I am using Pandas to combine conditions, e.g.

output1=f.loc[f['pers_name'].isin(user_list) & (f['event_start'].values==q_year)]

If there are just two conditions, I can easily use if and else to handle cases in which the user either does not enter person names or no date to retrieve all data that meet just one of the conditions.

However, I eventually need to manage six different columns. Is there a way to make the DataFrame treat conditions as optional if there is no user input for one or more without if and elif for all possible combinations?

I was hoping there was an argument I could add to each condition, but I have not found what I am looking for in the documentation yet.



Solution 1:[1]

In this kind of cases, we can use numpy.select like in this example :

import numpy as np
import pandas as pd

val1 = 1
val2 = 10

condlist = [df['col1'] == val1, 
            df['col1'] == val2, 
            df['col3'] == val1]

choicelist = [df['column1'],
              df['column2'],
              df['column3']]            

df['Vals'] = np.select(condlist, choicelist, default=np.nan)

Solution 2:[2]

Based on @tlentali 's sample, I have been able to ingest user input as variables in a condition list.

Each input can be a list of keywords separated by commas. The input is then split:

# separate user input if lists have been submitted
qn=q_name.split(",")
qy=q_year.split(",")
qi=q_inst.split(",") 
qt=q_title.split(",") 
qf=q_func.split(",") 
qr=q_rel.split(",")

# define possible conditions

condlist = [f['pers_name'].isin(qn), 
            f['event_start'].isin(qy),
            f['inst_name'].isin(qi), 
            f['pers_title'].isin(qt),
            f['pers_function'].isin(qf),
            f['rel_pers'].isin(qr)]

# define full range of choices

choicelist = [f['pers_name'], 
            f['event_start'],
            f['inst_name'], 
            f['pers_title'],
            f['pers_function'],
            f['rel_pers']]

As I want to capture all relevant cells values in the original EXCEL table (read as DataFrame f), I am retrieving the following output:

output = np.select(condlist, choicelist, default=0)
new_array=f.to_numpy()
rows=np.where(output)
print("Elements meeting required conditions:", new_array[rows])

The output looks like this in the console:

Elements meeting required conditions: 
[['Stk_00010' 9 nan ... nan nan nan]
 ['Stk_00052' 9 nan ... nan nan nan]
 ['Stk_00087' 9 nan ... nan nan nan]
 ...
 ['Stk_03161' 805 nan ... nan nan nan]
 ['Stk_05965' 804 nan ... nan nan nan]
 ['Stk_05966' 805 nan ... nan nan nan]]
Done.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tlentali
Solution 2 OnceUponATime