Chaining logical operators - ValueError: The truth value of a Series is ambiguous
I have a dictionary `dataframe_dict` consisting of over 1000 dataframes (`dataframe_dict.items()`). Each dataframe represents data collected from one location (i.e. one dataframe per location), and every dataframe has the same data columns (`key`).
Each dataframe looks like this:
```python
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
```
```
          a         b         c         d
0  0.325799  0.731273  0.467031  0.177742
1  0.084133  0.271076  0.761092  0.067709
2  0.946860  0.606838  0.260437  0.094640
3  0.076870  0.450473  0.693679  0.760893
```
For each dataframe, I want to find out which column(s) have more than 30% missing values and store those columns in `reject_list`.
This is how I currently identify these columns:
```python
reject_list =[]
for key, item in dataframe_dict.items():
    if ((item[key].isnull().sum()) > (0.3*(len(item)))):
        reject_list.append(item[key])
        print('rejected due to more than 30% nulls: {}'.format(item[key]))
    item.dropna(inplace=True)
    item.reset_index(drop=True, inplace=True)
```
Python raised this error on the `if` line:
```
    if ((item[key].isnull().sum()) > (0.3*(len(item)))):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 1535, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
Looking at previous posts, I think my code produces a Series where a single Boolean is expected. How do I get this logic to work inside the loop?
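The error can be reproduced in isolation. The sketch below assumes (the question does not show how `dataframe_dict` is built) that `item[key]` ends up selecting several columns, i.e. a DataFrame rather than a Series. In that case `.sum()` returns one count per column, the comparison yields a boolean Series, and `if` cannot reduce it to a single `True`/`False`. The toy frame `df` is made-up data for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame: column "a" is mostly null, column "b" has no nulls.
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, 3.0]})

# One column -> Series; .sum() reduces it to a scalar, so the comparison
# is a single boolean and can be used in an `if`.
single = df["a"].isnull().sum() > 0.3 * len(df)
print(single)

# Several columns -> DataFrame; .sum() returns one count per column,
# so the comparison is a boolean Series, one value per column.
several = df[["a", "b"]].isnull().sum() > 0.3 * len(df)
print(several)

# Using that Series directly in `if` raises the error from the question.
raised = False
try:
    if several:
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Reduce the Series explicitly to say what you actually mean.
print(several.any())  # is at least one column over the threshold?
print(several.all())  # is every column over the threshold?
```

Which reduction is right depends on the intent; for a per-column decision, looping over the columns (or keeping the boolean Series as a mask) is usually what is wanted.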
Solution 1:[1]
In your code:
`for key, item in dataframe_dict.items():` assigns to `key` the key of a `dataframe_dict` element and to `item` the corresponding dataframe. In the loop body, you use `key` as if it were the name of a column of the dataframe. But nothing guarantees that `key` is a column name, and you did not show how you built `dataframe_dict`.
It looks like the `if` statement belongs inside a second loop, over the columns of each dataframe, which your code is missing; something like `for col in item.columns:`. In short, you seem to be confusing the dictionary key with a column name.
The code below tries to resolve the confusion.
One open question is whether `reject_list` should be built from a single dataframe that merges all the dataframes in `dataframe_dict`, or separately for each `dataframe_dict` element, as your code implies. The loop below builds `reject_list` per `dataframe_dict` element. A consequence is that, at the end of the process, the dataframes in `dataframe_dict` will probably not all share the same set of valid columns.
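To make the two options concrete, here is a minimal sketch that computes the null fraction both ways on toy data; the loop that follows implements only the per-dataframe option. The two-location `dataframe_dict` and the `THRESHOLD` name are hypothetical stand-ins. `isnull().mean()` gives the fraction of nulls in each column, which expresses the same 30% rule as a fraction of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dataframe_dict: two locations, same columns.
dataframe_dict = {
    "loc1": pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, 3.0]}),
    "loc2": pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [np.nan, 2.0, 3.0]}),
}

THRESHOLD = 0.3  # reject a column if more than 30% of its values are null


def over_threshold(frame, threshold):
    """Return the column names whose null fraction exceeds threshold."""
    null_frac = frame.isnull().mean()  # fraction of nulls per column
    return list(null_frac[null_frac > threshold].index)


# Option A: decide separately for each location.
per_location = {key: over_threshold(df, THRESHOLD) for key, df in dataframe_dict.items()}
print(per_location)

# Option B: pool all locations into one frame and decide once for all of them.
merged = pd.concat(dataframe_dict.values(), ignore_index=True)
pooled = over_threshold(merged, THRESHOLD)
print(pooled)
```

On this toy data, option A rejects `a` for `loc1` and `b` for `loc2`, while option B rejects only `a`, so the choice directly changes which columns survive.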
```python
reject_list =[]
for key, item in dataframe_dict.items():
    for col in item.columns:
        if ((item[col].isnull().sum()) > (0.3*(len(item)))):
            reject_list.append((key, col))
            print(f"In dataframe '{key}', column '{col}' rejected due to more than 30% null.")
    item.dropna(inplace=True)
    item.reset_index(drop=True, inplace=True)
```
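As written, `item.dropna(inplace=True)` removes every row that still contains a null, including rows whose only missing value sits in a column that was just rejected, so a mostly-empty column can wipe out most of a location's data. If the intent is to discard the rejected columns themselves (an assumption; the question does not say), a vectorized sketch could drop them before calling `dropna()`. `clean_frames` and the sample `frames` are hypothetical names; like the loop above, the helper modifies the dataframes stored in the dictionary in place.

```python
import numpy as np
import pandas as pd


def clean_frames(dataframe_dict, threshold=0.3):
    """Hypothetical helper: reject columns with more than `threshold` nulls,
    drop them, then drop rows that still contain nulls.
    Mutates the dataframes stored in dataframe_dict and returns (key, col) pairs."""
    reject_list = []
    for key, item in dataframe_dict.items():
        null_frac = item.isnull().mean()  # fraction of nulls per column
        rejected = list(null_frac[null_frac > threshold].index)
        for col in rejected:
            reject_list.append((key, col))
            print(f"In dataframe '{key}', column '{col}' rejected due to more than {threshold:.0%} null.")
        # Assumption: remove rejected columns first, so they do not
        # cause otherwise-complete rows to be dropped below.
        item.drop(columns=rejected, inplace=True)
        item.dropna(inplace=True)
        item.reset_index(drop=True, inplace=True)
    return reject_list


# Usage on made-up data: "a" is 2/3 null in loc1; "b" is 1/4 null in loc2.
frames = {
    "loc1": pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, 3.0]}),
    "loc2": pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [np.nan, 2.0, 3.0, 4.0]}),
}
rejected = clean_frames(frames)
print(rejected)
print(frames["loc1"])
print(frames["loc2"])
```

Here `loc1` keeps all three rows because only column `a` is removed, while `loc2` keeps both columns and loses the single row with a null in `b`.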
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
