I am new to ML and I don't know how to solve this problem. Can someone help me?
Download the dataset, where the first four columns are features, and the last column corresponds to categories (3 labels). Perform the following tasks.
1. Split the dataset into train and test sets (80:20).
2. Construct the Naive Bayes classifier from scratch and train it on the train set. Assume a Gaussian distribution to compute probabilities.
3. Evaluate the performance on the test set using the following metrics:
   a. Confusion matrix
   b. Overall and class-wise accuracy
   c. ROC curve, AUC
4. Use any library (e.g. scikit-learn) and repeat 1 to 3.
5. Compare and comment on the performance of the classifiers in 2 and 4.
6. Calculate the Bayes risk. Consider

$$\lambda = \begin{pmatrix} 2 & 1 & 6 \\ 4 & 2 & 4 \\ 6 & 3 & 1 \end{pmatrix}$$

where $\lambda$ is a loss function whose rows and columns correspond to classes ($c_i$) and actions ($a_j$) respectively, e.g. $\lambda(a_3 \mid c_2) = 4$.
Solution 1:[1]
It's not clear what specific part of the problem you're having trouble with, which makes it hard to give specific advice.
With that in mind, here is some reading that might help get you started:
- If the dataset is in CSV format, you can read it into a dataframe using pd.read_csv() as discussed here: https://www.geeksforgeeks.org/python-read-csv-using-pandas-read_csv/
- To split the df into a train set and test set, you can import scikit-learn (sklearn) and then use train_test_split() as discussed here: https://www.stackvidhya.com/train-test-split-using-sklearn-in-python/
- It sounds like your professor (or whoever is the source of this question) wants you to write a function that duplicates a Naive Bayes classifier, so I'll leave you to figure that out. Sklearn does provide a Naive Bayes classifier you can read about here and use to verify your results: https://scikit-learn.org/stable/modules/naive_bayes.html
- For confusion matrices, sklearn (again) provides some functionality that will let you plot a confusion matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.from_predictions
- For the ROC curve, you can see here: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
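If you get stuck on the from-scratch part, here is a minimal sketch of a Gaussian naive Bayes classifier in plain Python. It estimates a prior, a per-feature mean, and a per-feature variance for each class, then scores a sample with the log of the prior plus the sum of Gaussian log-densities. The class name `GaussianNaiveBayes`, the `1e-9` variance floor, and the toy data are my own assumptions for illustration, not part of the assignment, so treat this as a starting point rather than a reference implementation.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes; a study sketch, not a tuned implementation."""

    def fit(self, X, y):
        # Group the training rows by class label.
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)

        self.classes_ = sorted(rows_by_class)
        self.log_prior_, self.mean_, self.var_ = {}, {}, {}
        for c, rows in rows_by_class.items():
            n = len(rows)
            self.log_prior_[c] = math.log(n / len(X))
            columns = list(zip(*rows))  # one tuple per feature
            self.mean_[c] = [sum(col) / n for col in columns]
            # Maximum-likelihood variance; the small constant (an assumption)
            # avoids division by zero for constant features.
            self.var_[c] = [
                sum((v - m) ** 2 for v in col) / n + 1e-9
                for col, m in zip(columns, self.mean_[c])
            ]
        return self

    def _log_joint(self, row, c):
        # log P(c) + sum over features of log N(x_j | mean_cj, var_cj)
        total = self.log_prior_[c]
        for x, m, v in zip(row, self.mean_[c], self.var_[c]):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict_proba(self, X):
        probs = []
        for row in X:
            logs = [self._log_joint(row, c) for c in self.classes_]
            top = max(logs)  # subtract the max before exponentiating for stability
            exps = [math.exp(value - top) for value in logs]
            norm = sum(exps)
            probs.append([e / norm for e in exps])
        return probs

    def predict(self, X):
        # Pick the class with the highest posterior for each row.
        return [
            self.classes_[max(range(len(p)), key=p.__getitem__)]
            for p in self.predict_proba(X)
        ]


# Tiny made-up data set (two features, two classes), for illustration only.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y_toy = ["a", "a", "a", "b", "b", "b"]
model = GaussianNaiveBayes().fit(X_toy, y_toy)
print(model.predict([[1.1, 2.0], [3.1, 0.5]]))
```

The `predict_proba` output is what you would feed into a ROC/AUC computation, and comparing `predict` against sklearn's `GaussianNB` on the same split is a reasonable sanity check.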
Hope this is enough to get you started.
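None of the links above covers the Bayes-risk task, so here is one possible reading of it as a sketch. I am assuming the usual decision-theory definitions: the conditional risk of action a_j for a sample is the posterior-weighted sum of the losses in column j, the Bayes decision takes the action with the smallest conditional risk, and the Bayes risk over a set of samples is the average of those minima. The function names and the two example posteriors are hypothetical; in practice the posteriors would come from your classifier's predicted class probabilities.

```python
# Loss matrix from the question: rows are true classes c_i, columns are actions a_j.
LOSS = [
    [2, 1, 6],
    [4, 2, 4],
    [6, 3, 1],
]


def conditional_risks(posterior, loss=LOSS):
    """R(a_j | x) = sum_i loss[i][j] * P(c_i | x), computed for every action a_j."""
    n_actions = len(loss[0])
    return [
        sum(loss[i][j] * posterior[i] for i in range(len(posterior)))
        for j in range(n_actions)
    ]


def bayes_risk(posteriors, loss=LOSS):
    """Average of the minimum conditional risk over a set of samples."""
    return sum(min(conditional_risks(p, loss)) for p in posteriors) / len(posteriors)


# Hypothetical posteriors [P(c_1|x), P(c_2|x), P(c_3|x)] for two samples.
posteriors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
risks = conditional_risks(posteriors[0])
print(risks)                   # conditional risk of each action for the first sample
print(bayes_risk(posteriors))  # average minimum risk over both samples
```

Note that `LOSS[1][2]` is 4, which matches the example λ(a3 / c2) = 4 given in the question. If your course defines the Bayes risk differently, only `bayes_risk` should need to change.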
Solution 2:[2]
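```python
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# `label` holds the four feature columns and `targets` the class column;
# both are assumed to be loaded already.
X_train, X_test, y_train, y_test = train_test_split(
    label, targets, test_size=0.20, random_state=42
)

# Example of Gaussian naive Bayes: define the model and fit it
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on the test set and print the per-class report
predict = model.predict(X_test)
matrix = classification_report(y_test, predict)
print('Classification report :\n', matrix)
```

See also: https://scikit-learn.org/stable/modules/cross_validation.html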
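The question also asks for a confusion matrix, overall and class-wise accuracy, and ROC/AUC, which a classification report does not give directly. The following is a sketch of how those could be computed with scikit-learn. It uses the bundled iris data as a stand-in because it happens to have four features and three classes; that choice is an assumption, as is reading "class-wise accuracy" as per-class recall and using a macro-averaged one-vs-rest AUC.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data (an assumption): iris has four features and three classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)  # shape (n_samples, n_classes)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
overall_accuracy = accuracy_score(y_test, y_pred)
# "Class-wise accuracy" read here as per-class recall: correct / true count.
class_accuracy = cm.diagonal() / cm.sum(axis=1)
# Macro-averaged one-vs-rest AUC over the three classes.
auc_ovr = roc_auc_score(y_test, y_score, multi_class="ovr")
# ROC curve points for class 0 against the rest; repeat per class to plot all three.
fpr, tpr, _ = roc_curve((y_test == 0).astype(int), y_score[:, 0])

print(cm)
print(overall_accuracy, class_accuracy, auc_ovr)
```

Passing `stratify=y` keeps all three classes present in the test set, so the per-class division never hits an empty row.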
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dharman |
| Solution 2 |
