How to define binary output for a classification model in Python

I am predicting prediabetes with machine learning.

I have x features, and my output is prediabetes (yes or no).

Prediabetes is defined by the LBXGH feature: values in the range 5.7-6.4 should be coded as 1, and all other values of this feature should be coded as 0.

I have tried many combinations, and every one of them has raised an error so far.

This is where I am now:

df[df['LBXGH'] >= 5.7 | [df['LBXGH'] <= 6.4 ]] = 1
df[df['LBXGH'] < 5.7 | [df['LBXGH'] > 6.4]] = 0

TypeError: unsupported operand type(s) for |: 'float' and 'list'

The goal is to have all values replaced with either 0 or 1 according to these constraints, so I can move on to the feature selection process.



Solution 1:[1]

Assuming this is the setup:

df[df['LBXGH'] >= 5.7 | [df['LBXGH'] <= 6.4 ]] = 1
df[df['LBXGH'] < 5.7 | [df['LBXGH'] > 6.4]] = 0

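It looks like the issue is in how the conditions and the assignment are structured. In Python, | binds more tightly than comparison operators such as >=, so without parentheses the expression tries to evaluate 5.7 | [...] first, and the square brackets around the second condition turn it into a list. That is where the 'float' and 'list' in the error come from. What happens when you enclose each condition in parentheses and assign the result to a new column?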

df.loc[(df['LBXGH'] >= 5.7) & (df['LBXGH'] <= 6.4 ), 'PDB'] = 1
df.loc[(df['LBXGH'] < 5.7) | (df['LBXGH'] > 6.4), 'PDB'] = 0

The code creates a column 'PDB' and assigns 1 if 'LBXGH' is within the bounds you described and 0 otherwise. Rows where 'LBXGH' is missing match neither condition, so 'PDB' stays NaN for them.
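
To see the behaviour described above on a small example, here is a minimal sketch. The sample values are hypothetical and only stand in for the real dataset; the column names 'LBXGH' and 'PDB' are taken from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including both boundaries and one missing entry.
df = pd.DataFrame({"LBXGH": [5.2, 5.7, 6.0, 6.4, 7.1, np.nan]})

# 1 where the value lies inside the inclusive range 5.7-6.4 ...
df.loc[(df["LBXGH"] >= 5.7) & (df["LBXGH"] <= 6.4), "PDB"] = 1
# ... and 0 where it lies outside. Comparisons with NaN are False,
# so the missing row matches neither mask and its PDB stays NaN.
df.loc[(df["LBXGH"] < 5.7) | (df["LBXGH"] > 6.4), "PDB"] = 0

# PDB is 0, 1, 1, 1, 0 for the first five rows and NaN for the last one.
print(df)
```

Because of the NaN entry, 'PDB' is stored as a float column here. Whether to drop or impute those rows before feature selection depends on the dataset.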

Changing the OR (|) to an AND (&) in the first line does not change the final result here, because the second line overwrites every out-of-range row with 0. Logically, though, you only want values that are greater than or equal to 5.7 AND less than or equal to 6.4.
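
As an alternative that is not part of the answer above, the same "greater than or equal to 5.7 AND less than or equal to 6.4" condition can be written in one step with pandas' Series.between, which is inclusive on both ends by default. This is only a sketch on hypothetical sample values, and the column name 'PDB_keep_na' is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including both boundaries and one missing entry.
df = pd.DataFrame({"LBXGH": [5.2, 5.7, 6.0, 6.4, 7.1, np.nan]})

# True where 5.7 <= LBXGH <= 6.4; a missing value yields False.
in_range = df["LBXGH"].between(5.7, 6.4)

# Cast True/False to 1/0. Note that the missing row becomes 0 here.
df["PDB"] = in_range.astype(int)

# Variant that keeps missing inputs as NaN instead of coding them as 0
# ('PDB_keep_na' is a hypothetical column name).
df["PDB_keep_na"] = in_range.astype(float).where(df["LBXGH"].notna())

print(df)
```

The first variant silently treats missing measurements as "not prediabetic", so the second variant may be the safer default if the data contains gaps.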

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1