NumPy's where function and length error message
I have a spreadsheet that I am trying to correct. The 'Billing Categorization' column should be filled with 'Standard' or 'Non Standard', as applicable.
I am trying to use the where function from numpy to do this:
df['Billing Categorization'] = np.where((df['Billing Categorization'].isnull(), ~df['AE Number'].isnull()), 'Standard', df['Billing Categorization'])
The idea is that the empty values in 'Billing Categorization' should be filled with "Standard" where the value in column 'AE Number' in the same row isn't empty.
However, I am getting the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-64-863f807f354c> in <module>
30 df.loc[df["PQC-Product"].isnull(),'PQC-Product'] = df["Request-Product"]
31
---> 32 df['Billing Categorization'] = np.where((df['Billing Categorization'].isnull(), ~df['AE Number'].isnull()), 'Standard', df['Billing Categorization'])
33
34 #We simply get the data out
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3161 else:
3162 # set column
-> 3163 self._set_item(key, value)
3164
3165 def _setitem_slice(self, key: slice, value):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3240 """
3241 self._ensure_valid_index(value)
-> 3242 value = self._sanitize_column(key, value)
3243 NDFrame._set_item(self, key, value)
3244
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
3897
3898 # turn me into an ndarray
-> 3899 value = sanitize_index(value, self.index)
3900 if not isinstance(value, (np.ndarray, Index)):
3901 if isinstance(value, list) and len(value) > 0:
~\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
749 """
750 if len(data) != len(index):
--> 751 raise ValueError(
752 "Length of values "
753 f"({len(data)}) "
ValueError: Length of values (2) does not match length of index (876)
Both columns have empty values, but I only want to fill the rows where the rule applies; obviously not all of them can be filled. I want to go from this:
| Number | Billing Categorization | Country | AE Number | AE country | Date |
|---|---|---|---|---|---|
| First | NaN | Italy | 55568 | Italy | 1-Jan-2022 |
| Second | NaN | France | NaN | NaN | NaN |
| Third | Standard | Spain | 85968 | Spain | 5-Jan-2022 |
| Fourth | Non Standard | UK | 748265 | UK | 5-Jan-2022 |
| Fifth | Standard | UK | 59632 | UK | 6-Jan-2022 |
| Sixth | NaN | UK | 78946 | UK | 7-Jan-22 |
To this one:
| Number | Billing Categorization | Country | AE Number | AE country | Date |
|---|---|---|---|---|---|
| First | Standard | Italy | 55568 | Italy | 1-Jan-2022 |
| Second | NaN | France | NaN | NaN | NaN |
| Third | Standard | Spain | 85968 | Spain | 5-Jan-2022 |
| Fourth | Non Standard | UK | 748265 | UK | 5-Jan-2022 |
| Fifth | Standard | UK | 59632 | UK | 6-Jan-2022 |
| Sixth | Standard | UK | 78946 | UK | 7-Jan-22 |
As you can see in the second row, there is no AE Number, so np.where shouldn't change anything there and the cell should stay blank. I have manually checked the lengths of both columns and they match, so what's wrong?
Solution 1:[1]
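IIUC, chain the two masks with & so that np.where receives a single boolean mask. Passing them as a tuple gives np.where a condition of shape (2, 876), so the result has 2 rows, which is where "Length of values (2)" comes from: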
m = df['Billing Categorization'].isna() & df['AE Number'].notna()
df['Billing Categorization'] = np.where(m, 'Standard', df['Billing Categorization'])
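To make the difference concrete, here is a minimal sketch on a small hypothetical frame (four rows rather than the 876 in the question; the variable names `is_blank`, `has_ae`, `bad` and `fixed` are assumptions for illustration). It shows the shape produced by the tuple condition next to the shape produced by the combined mask.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame mirroring the two columns from the question.
df = pd.DataFrame({
    "Billing Categorization": [np.nan, np.nan, "Standard", "Non Standard"],
    "AE Number": [55568, np.nan, 85968, 748265],
})

is_blank = df["Billing Categorization"].isna()
has_ae = df["AE Number"].notna()

# A tuple of two masks is converted to one 2-D condition of shape (2, n_rows),
# so the result also has 2 rows and cannot be assigned to a column.
bad = np.where((is_blank, has_ae), "Standard", df["Billing Categorization"])
bad_shape = bad.shape

# Combining the masks elementwise with & keeps the condition 1-D (n_rows,).
mask = is_blank & has_ae
fixed = np.where(mask, "Standard", df["Billing Categorization"])
fixed_shape = fixed.shape

df["Billing Categorization"] = fixed
print(bad_shape, fixed_shape)
print(df)
```

Only the first row is filled here: the second row is blank but has no AE Number, so the combined mask is False for it and the NaN is kept.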
Solution 2:[2]
You don't need np.where here; use boolean indexing with .loc instead, and name the target column (without the column label, the assignment overwrites every column of the matching rows):
df.loc[df['Billing Categorization'].isna() & df['AE Number'].notna(), 'Billing Categorization'] = 'Standard'
Output:
| Number | Billing Categorization | Country | AE Number | AE country | Date |
|---|---|---|---|---|---|
| First | Standard | Italy | 55568 | Italy | 1-Jan-2022 |
| Second | nan | France | nan | nan | nan |
| Third | Standard | Spain | 85968 | Spain | 5-Jan-2022 |
| Fourth | Non Standard | UK | 748265 | UK | 5-Jan-2022 |
| Fifth | Standard | UK | 59632 | UK | 6-Jan-2022 |
| Sixth | Standard | UK | 78946 | UK | 7-Jan-22 |
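A minimal runnable sketch of boolean indexing that writes only to the 'Billing Categorization' column, using `df.loc[mask, column]`. The frame below is a hypothetical subset of the question's data, and `mask` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's table.
df = pd.DataFrame({
    "Number": ["First", "Second", "Third", "Sixth"],
    "Billing Categorization": [np.nan, np.nan, "Standard", np.nan],
    "Country": ["Italy", "France", "Spain", "UK"],
    "AE Number": [55568, np.nan, 85968, 78946],
})

# Rows where the category is blank and an AE Number is present.
mask = df["Billing Categorization"].isna() & df["AE Number"].notna()

# Row selector plus column label: only this one column is written,
# the other columns of the matching rows are left as they were.
df.loc[mask, "Billing Categorization"] = "Standard"
print(df)
```

The column label in `.loc` is what keeps 'Number', 'Country' and 'AE Number' untouched; assigning through a bare row mask would write the value into every column of the selected rows.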
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
| Solution 2 | Corralien |
