Email Spam Classification Using Python
My task is to write a Python program that classifies an email as spam if its subject is empty. This is my attempt. I'm new to Python, so I may have made some silly mistakes. The code runs without errors, but it does not work correctly: it should print "Spam" once, but instead it classifies every email as not spam. As you can see in the image, one email has no subject. [Image: output of the Excel file contents in Python]
import pandas as pd
ExcelFile = pd.read_excel(r'C:\Users\Email Table.xlsx')
Subject = pd.DataFrame(ExcelFile, columns=['Subject'])
def spam(Subject):
    df_multiindex = ExcelFile.set_index(['Subject'])
    n = len(df_multiindex)
    for x in range(n):
        if ((pd.isnull(ExcelFile.loc[x, 'Subject'])) == "True"):
            print("Spam")
        else:
            print("not spam")
spam(Subject)
Solution 1:[1]
The problem is that you are comparing the output of pd.isnull to the string "True", whereas pd.isnull returns a boolean (True or False).
In Python, the boolean values True and False are written without quote marks. If you compare a boolean to a string, as you do here, the result will always be False. More info here: https://www.w3schools.com/python/python_booleans.asp
If you remove the quote marks around True, your code should work as expected.
for x in range(n):
    if ((pd.isnull(ExcelFile.loc[x, 'Subject'])) == True):
        print("Spam")
    else:
        print("not spam")
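A quick way to see why the original condition never matched is to compare a boolean with the string "True" directly. This is a minimal sketch in plain Python; the variable names are only illustrative and do not come from the question's code.

```python
# pd.isnull hands back a boolean like this one (illustrative variable name).
is_missing = True

compared_to_string = (is_missing == "True")  # bool vs. str: never equal
compared_to_bool = (is_missing == True)      # bool vs. bool: equal

print(compared_to_string)  # False
print(compared_to_bool)    # True
```

Because the first comparison is always False, the else branch ran for every row, which is why every email was reported as not spam.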
There are some other things you can do to simplify your code.
pandas can iterate over rows with iterrows, so you don't need to calculate n and loop over range(n).
https://www.w3schools.com/python/pandas/ref_df_iterrows.asp
Also, since the output of isnull is already True or False, you don't need to check whether it equals True; just use it directly. Your if statement also has unnecessary brackets.
for index, row in ExcelFile.iterrows():
    if pd.isnull(row['Subject']):
        print("Spam")
    else:
        print("not spam")
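If you would rather avoid the explicit loop, one possible alternative is to label the whole column at once. The sketch below is only a suggestion: the small DataFrame is a made-up stand-in for the Excel sheet (the real file and its other columns are not shown in the question), and the "Label" column name is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; with the real file you would
# use the DataFrame returned by pd.read_excel instead.
emails = pd.DataFrame({
    "Sender": ["a@example.com", "b@example.com", "c@example.com"],
    "Subject": ["Meeting", None, "Invoice"],
})

# isna() gives one boolean per row; map() turns each boolean into a label.
emails["Label"] = emails["Subject"].isna().map({True: "Spam", False: "not spam"})

print(emails["Label"].tolist())  # ['not spam', 'Spam', 'not spam']
```

Note that this only flags cells that are actually missing. A subject that contains just spaces is not null, so it would still be labelled not spam.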
Have fun with Python :-)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
