Pandas - Concatenating DataFrames with differing columns converts numpy.bool_ to bool
I'm trying to concatenate two DataFrames that contain a variety of datatypes, and the two DataFrames don't always have the same columns. When the columns differ, a column whose elements were numpy.bool_ comes back with regular Python bool elements, which causes errors later in the code unless the column is converted back to numpy.bool_. Aside from making sure the DataFrames have the same columns before concatenating, or converting back to numpy.bool_ after the fact, are there any other solutions? Is this a bug, or is there a valid reason behind this behavior? I am using Python 3.6.
Here is a code sample of a simplified case:
import pandas as pd
df_1 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [True, False, False]})
df_2 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col3': [False, True, True]})
print(f'col2 type before: {type(df_1.col2[0])}')
print(f'col3 type before: {type(df_2.col3[0])}')
df_comb = pd.concat([df_1,df_2], copy=False, ignore_index=True)
print(f'col2 type after: {type(df_comb.col2[0])}')
print(f'col3 type after: {type(df_comb.col3[3])}')
Resulting output:
col2 type before: <class 'numpy.bool_'>
col3 type before: <class 'numpy.bool_'>
col2 type after: <class 'bool'>
col3 type after: <class 'bool'>
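For what it's worth, looking at the column dtypes rather than the element types may make the change easier to see. The sketch below reuses the frames from the sample above. It assumes that concat fills the rows where a column is absent with NaN, which would push a bool column to object dtype. The names dtypes_before, dtypes_after, and missing_counts are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [True, False, False]})
df_2 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col3': [False, True, True]})
df_comb = pd.concat([df_1, df_2], ignore_index=True)

# Column dtypes before concatenation: both are NumPy-backed bool columns.
dtypes_before = (df_1['col2'].dtype, df_2['col3'].dtype)

# Column dtypes after concatenation (assumption: object, because of the NaN fill).
dtypes_after = (df_comb['col2'].dtype, df_comb['col3'].dtype)

# Number of missing values in each column after concatenation.
missing_counts = df_comb[['col2', 'col3']].isna().sum()

print(dtypes_before)
print(dtypes_after)
print(missing_counts)
```

If dtypes_after shows object and missing_counts is non-zero, the plain bool elements are probably a side effect of the missing rows rather than of concat itself.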
I've tried using join='inner' and join='outer' to no avail. Thanks in advance.
Edit: The regular bool elements are a problem because the following statement works when the column holds np.bool_ values but fails when it holds regular bool values:
df = df_comb[~df_comb.col2]
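If the column really has ended up as object dtype with NaN in the rows that came from the other frame, then ~ would be applied element by element and would likely fail on the NaN entries. Below is a minimal sketch of two possible workarounds, not a definitive fix. Option A assumes it is acceptable to treat missing values as False. Option B assumes pandas 1.0 or later, where the nullable 'boolean' dtype is available. The names col2_filled, col2_nullable, mask, df_a, and df_b are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [True, False, False]})
df_2 = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col3': [False, True, True]})
df_comb = pd.concat([df_1, df_2], ignore_index=True)

# Option A (assumption: missing means False): fill the gaps, then cast
# back to a NumPy bool column so that ~ works as a vectorized negation.
col2_filled = df_comb['col2'].fillna(False).astype(bool)
df_a = df_comb[~col2_filled]

# Option B (assumption: pandas >= 1.0): keep missing values as <NA> using
# the nullable boolean dtype, negate, and only then decide how to treat <NA>.
col2_nullable = df_comb['col2'].astype('boolean')
mask = (~col2_nullable).fillna(False).astype(bool)
df_b = df_comb[mask]

print(df_a)
print(df_b)
```

The two options select different rows: Option A keeps the rows where col2 was missing, because the filled-in False is negated to True, while Option B drops them. Which one is right depends on what a missing col2 should mean in the data.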
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
