TypeError after vectorizing a function

I wrote the following function. Its first argument (df) is a dataframe that stores every combination of keys together with the corresponding minimal and maximal values; the remaining arguments are the keys and the amount. The function returns whether the amount is within or outside the expected range.

def within_range(df,CoCode,Lease_type,
                 Position,Mvt_Type,BKPF_WAERS,
                 BSEG_BSCHL,COBL_KOSTL,Amount):
    print(CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL,Amount)
    mask=(df['CoCode']==CoCode)&(df['Lease_type']==Lease_type)&\
         (df['Position']==Position)&(df['Mvt_Type']==Mvt_Type)&\
         (df['BKPF-WAERS']==BKPF_WAERS)&(df['BSEG-BSCHL']==BSEG_BSCHL)&\
         (df['COBL-KOSTL']==COBL_KOSTL)
    mini = float(df.loc[mask,'min'].values)
    maxi = float(df.loc[mask,'max'].values)
    if mini <= Amount <= maxi:
        return 'OK, within range'
    else:
        return f'{str(Amount)} is outside range [{str(mini)};{str(maxi)}]'

If I test it with the following values:

within_range(df=df_3,CoCode='2510',Lease_type='1',Position='17310C',Mvt_Type='F30',BKPF_WAERS='HUF',BSEG_BSCHL='50',COBL_KOSTL='2510DDA-612121-01.C',Amount=2442000.0)

I get exactly the expected output: 'OK, within range'

Now, I vectorized the function using np.vectorize and applied it to a second dataframe I need to check. For information, the first line corresponds exactly to the case successfully tested above.

This is how I called the function:

df_test['in_range']=np.vectorize(within_range)(df=df_3,
                                   CoCode=df_test['BKPF-BUKRS'],
                                   Lease_type=df_test['COBL-AUFNR'].str[5:6],
                                   Position=df_test['BSEG-HKONT'].str[0:6],
                                   Mvt_Type=df_test['BSEG-HKONT'].str[6:],
                                   BKPF_WAERS=df_test['BKPF-WAERS'],
                                   BSEG_BSCHL=df_test['BSEG-BSCHL'],
                                   COBL_KOSTL=df_test['COBL-KOSTL'],
                                   Amount=df_test['BSEG-WRBTR'],
                                  )

From the embedded print, I can see that the first line corresponds exactly to the test above:

2510 1 17310C F30 HUF 50 2510DDA-612121-01.C 2442000.0

The problem: instead of populating the new column 'in_range' with the result of the function ('OK, within range' or an 'is outside range' message), I get a long TypeError message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-0d714785e5d5> in <module>
----> 1 df_test['in_range']=np.vectorize(within_range)(df=df_3,
      2                                    CoCode=df_test['BKPF-BUKRS'].values,
      3                                    Lease_type=df_test['COBL-AUFNR'].str[5:6].values,
      4                                    Position=df_test['BSEG-HKONT'].str[0:6].values,
      5                                    Mvt_Type=df_test['BSEG-HKONT'].str[6:].values,

c:\users\forszpaniak\appdata\local\programs\python\python39\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
   2106             vargs.extend([kwargs[_n] for _n in names])
   2107 
-> 2108         return self._vectorize_call(func=func, args=vargs)
   2109 
   2110     def _get_ufunc_and_otypes(self, func, args):

c:\users\forszpaniak\appdata\local\programs\python\python39\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
   2184             res = func()
   2185         else:
-> 2186             ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
   2187 
   2188             # Convert args to object arrays first

c:\users\forszpaniak\appdata\local\programs\python\python39\lib\site-packages\numpy\lib\function_base.py in _get_ufunc_and_otypes(self, func, args)
   2144 
   2145             inputs = [arg.flat[0] for arg in args]
-> 2146             outputs = func(*inputs)
   2147 
   2148             # Performance note: profiling indicates that -- for simple

c:\users\forszpaniak\appdata\local\programs\python\python39\lib\site-packages\numpy\lib\function_base.py in func(*vargs)
   2101                     the_args[_i] = vargs[_n]
   2102                 kwargs.update(zip(names, vargs[len(inds):]))
-> 2103                 return self.pyfunc(*the_args, **kwargs)
   2104 
   2105             vargs = [args[_i] for _i in inds]

<ipython-input-47-ef44db83b86c> in within_range(df, CoCode, Lease_type, Position, Mvt_Type, BKPF_WAERS, BSEG_BSCHL, COBL_KOSTL, Amount)
      1 def within_range(df,CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL,Amount):
      2     print(CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL,Amount)
----> 3     mask=(df['CoCode']==CoCode)&(df['Lease_type']==Lease_type)&\
      4          (df['Position']==Position)&(df['Mvt_Type']==Mvt_Type)&\
      5          (df['BKPF-WAERS']==BKPF_WAERS)&(df['BSEG-BSCHL']==BSEG_BSCHL)&\

TypeError: string indices must be integers

I looked at previous questions about similar TypeErrors and passed the underlying values (e.g. CoCode=df_test['BKPF-BUKRS'].values) to get the true values and not a tuple. But I still get the message and don't see why.

Did I misunderstand how vectorizing works, or am I not allowed to vectorize the 'mask' inside the function?

NOTE 30/04/2022:

I moved the mask and the determination of the mini and maxi values out of the vectorized function, into a separate function that the vectorized one calls. This is what it looks like, and it works fine:

def get_minimax(df,CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL):
    mask=(df['CoCode']==CoCode)&(df['Lease_type']==Lease_type)&\
         (df['Position']==Position)&(df['Mvt_Type']==Mvt_Type)&\
         (df['BKPF-WAERS']==BKPF_WAERS)&(df['BSEG-BSCHL']==BSEG_BSCHL)&\
         (df['COBL-KOSTL']==COBL_KOSTL)
    try:
        mini = float(df.loc[mask,'min'].values)
        maxi = float(df.loc[mask,'max'].values)
    except:
        mini = 0.0
        maxi = np.inf
    return mini, maxi

def within_range(CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL,Amount):
    print(CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL,Amount)
    mini,maxi = get_minimax(df_3,CoCode,Lease_type,Position,Mvt_Type,BKPF_WAERS,BSEG_BSCHL,COBL_KOSTL)
    print (mini,maxi,Amount)
    if float(mini) <= float(Amount) <= float(maxi):
        return 'OK, within range'
    else:
        return f'{str(Amount)} is outside range [{str(mini)};{str(maxi)}]'

In the code above, only within_range is vectorized, not get_minimax. It looks like filters can't be vectorized. Is my assumption correct?

Solution 1:^[1]

Removing the dataframe from the arguments of the vectorized function made the error disappear, which shows that the problem comes from passing the dataframe as an argument.
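A minimal sketch of why the dataframe argument triggers the error, assuming a hypothetical one-column table `lookup` in place of df_3 and a hypothetical helper `describe` (neither is in the original code). np.vectorize converts every argument that is not excluded to an array and calls the function once per element, so a dataframe passed this way reaches the function as a single cell value:

```python
import numpy as np
import pandas as pd

# Hypothetical one-column lookup table standing in for df_3.
lookup = pd.DataFrame({"CoCode": ["2510", "2520"]})

def describe(df, code):
    # Report the type that the function actually receives as `df`.
    return type(df).__name__

# np.vectorize turns `lookup` into an array and loops over its cells,
# so `describe` never sees the DataFrame itself, only one cell at a time.
seen = np.vectorize(describe)(lookup, np.array(["2510"]))
print(set(seen.ravel().tolist()))  # {'str'}

# Indexing such a cell value with a column name raises the same kind of error.
cell = "2510"
caught = None
try:
    cell["CoCode"]
except TypeError as exc:
    caught = type(exc).__name__
print(caught)  # TypeError
```

This matches the traceback above, where `inputs = [arg.flat[0] for arg in args]` takes the first element of each argument before calling the function.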

NumPy's documentation mentions:

"The excluded argument can be used to prevent vectorizing over certain arguments. This can be useful for array-like arguments of a fixed length [...]"

Considering my dataframe as a 'fixed length array-like' argument, I changed the initial code as follows:

df_test['in_range2']=np.vectorize(within_range2,excluded=['df'])(
                               df=df_3,
                               CoCode=df_test['BKPF-BUKRS'],
                               Lease_type=df_test['COBL-AUFNR'].str[5:6],
                               Position=df_test['BSEG-HKONT'].str[0:6],
                               Mvt_Type=df_test['BSEG-HKONT'].str[6:],
                               BKPF_WAERS=df_test['BKPF-WAERS'],
                               BSEG_BSCHL=df_test['BSEG-BSCHL'].values,
                               COBL_KOSTL=df_test['COBL-KOSTL'].values,
                               Amount=df_test['BSEG-WRBTR'].values,
                               )

The function now works fine, and I find this exclusion much more elegant than moving the dataframe lookups into a separate function.
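For readers who want to try the approach without the original data, here is a self-contained sketch. The tables `limits` and `to_check`, the reduced two-key function `check_amount`, and the sample values are all hypothetical stand-ins for df_3, df_test and within_range2. The `otypes=[object]` argument is an addition that is not in the call above; it fixes the output type up front instead of letting NumPy infer it from the first result.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table standing in for df_3 (two keys instead of seven).
limits = pd.DataFrame({
    "CoCode": ["2510", "2510"],
    "Position": ["17310C", "17320C"],
    "min": [1000.0, 0.0],
    "max": [3000000.0, 500.0],
})

def check_amount(df, CoCode, Position, Amount):
    # `df` is excluded from vectorization, so it is still a DataFrame here.
    mask = (df["CoCode"] == CoCode) & (df["Position"] == Position)
    row = df.loc[mask]
    if len(row) != 1:
        return "no unique range found"
    mini = float(row["min"].iloc[0])
    maxi = float(row["max"].iloc[0])
    if mini <= Amount <= maxi:
        return "OK, within range"
    return f"{Amount} is outside range [{mini};{maxi}]"

# Hypothetical rows to check, standing in for df_test.
to_check = pd.DataFrame({
    "CoCode": ["2510", "2510"],
    "Position": ["17310C", "17320C"],
    "Amount": [2442000.0, 900.0],
})

# Only the key and amount arguments are looped over; `df` is passed through as-is.
checker = np.vectorize(check_amount, excluded=["df"], otypes=[object])
to_check["in_range"] = checker(
    df=limits,
    CoCode=to_check["CoCode"].to_numpy(),
    Position=to_check["Position"].to_numpy(),
    Amount=to_check["Amount"].to_numpy(),
)
print(to_check["in_range"].tolist())
```

Note that np.vectorize still calls the function once per row, so the mask is recomputed for every row of the table being checked.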

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	JCF
