How can I pass a function with arguments to pandas pipe in Python?

I want to write some functions to use with the pandas pipe method, like this:

import pandas as pd

def foo(df):
   df['A'] = 1
   return df

def goo(df):
   df['B'] = 2
   return df

def hoo(df, arg1):
   df[arg1] = 3
   return df


df = pd.DataFrame.from_dict({"A":[1, 2, 3],
                            "B":[4, 5, 6]})
print(df)

(df.pipe(foo)
  .pipe(goo)
  .pipe(hoo, arg1='Hello')
)

print(df)

The first print outputs:

   A  B
0  1  4
1  2  5
2  3  6

The second print outputs:

   A  B  Hello
0  1  2      3
1  1  2      3
2  1  2      3

The code itself is meaningless, but it is easy to understand.

There are many combinations of functions such as foo, goo, and hoo, so I need to abstract this pipe code.

import pandas as pd

def foo(df):
    df['A'] += 1
    return df

def goo(df):
    df['B'] += 2
    return df

def hoo(df, arg1):
    df[arg1] = 3
    return df


def pipe_line(df, func_list, kargs_list):
    for func, kargs in zip(func_list, kargs_list):
        df = func(df, **kargs)
    return df

df = pd.DataFrame.from_dict({"A":[1, 2, 3],
                             "B":[4, 5, 6]})

df = pipe_line(df, 
    [foo, goo, hoo], 
    [{}, {}, dict(arg1="HELLO")])

print(df)

But the pipe_line function is very ugly. How can I improve its readability?



Solution 1:[1]

pipe_line doesn't really have to do much at all: it just repeatedly applies each function to the return value of the previous one until it runs out of functions.

def pipe_line(df, fs):
    for f in fs:
        df = f(df)
    return df
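
Since this loop only feeds each result into the next function, it can also be written as a fold. The following is a minimal sketch using functools.reduce; the helper functions add_one and double are hypothetical stand-ins, and plain integers are used instead of a dataframe to show that pipe_line itself does not depend on pandas.

```python
from functools import reduce


def pipe_line(value, fs):
    # Fold the list of functions over the starting value:
    # each function receives the result of the previous one.
    return reduce(lambda acc, f: f(acc), fs, value)


# Hypothetical single-argument steps, used only for illustration.
def add_one(x):
    return x + 1


def double(x):
    return x * 2


result = pipe_line(3, [add_one, double])  # double(add_one(3))
unchanged = pipe_line(3, [])              # no steps: the input comes back as-is
```

Whether the explicit loop or the reduce form reads better is a matter of taste; both behave the same for any list of single-argument functions.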

The trick is to define appropriate functions that all take a single dataframe argument. functools.partial helps with that by binding the extra arguments in advance.

from functools import partial


df = pipe_line(df, [foo, goo, partial(hoo, arg1="HELLO")])
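
For reference, here is how the pieces might fit together in one runnable script. It is a sketch that reuses foo, goo, and hoo from the question and the two-argument pipe_line from this answer; the names steps and result are only illustrative.

```python
from functools import partial

import pandas as pd


def foo(df):
    df["A"] += 1
    return df


def goo(df):
    df["B"] += 2
    return df


def hoo(df, arg1):
    df[arg1] = 3
    return df


def pipe_line(df, fs):
    # Apply each single-argument function to the result of the previous one.
    for f in fs:
        df = f(df)
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# partial(hoo, arg1="HELLO") returns a callable that takes only the dataframe.
steps = [foo, goo, partial(hoo, arg1="HELLO")]
result = pipe_line(df, steps)
print(result)
```

Note that foo, goo, and hoo modify the dataframe in place, so df and result refer to the same object here. If the original data must be preserved, passing df.copy() to pipe_line is one option.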

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 chepner