Naïve Bayes classifier does not learn

I am new to the programming language R.

I want to set up a Naïve Bayes classifier, which classifies descriptions of campaigns as 0 or 1 (depending on whether the campaign was successful or not).

The data set can be found here.

My code is the following:

library(tidyverse)
library(tidymodels)
library(textrecipes)
library(discrim)

df <- read_csv("data/kickstarter.csv.gz")

# create categorical from numerical data
df$state <- as.factor(df$state)

# do not use the whole data frame
df <- df %>% slice(1:1e5)
df <- filter(df, nchar(blurb) >= 15)

# split into training and test set
df_split <- initial_split(df)
df_train <- training(df_split)
df_test <- testing(df_split)

# create folds for cross validation
folds <- vfold_cv(df_train)

# pre-process texts
rec <- recipe(state ~ blurb, data = df) %>%
  step_tokenize(blurb) %>%
  step_tokenfilter(blurb, max_tokens = 1e3)

# transform to numerical data
rec <- rec %>% step_tfidf(blurb)

# specify model
nb_spec <- naive_Bayes() %>%
  set_mode("classification") %>%
  set_engine("naivebayes")

# create workflow
nb_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(nb_spec)

# fit & do cross validation
nb_rs <- fit_resamples(
  nb_wf,
  folds,
  control = control_resamples(save_pred = TRUE)
)

# look at accuracy
nb_rs_metrics <- collect_metrics(nb_rs)
nb_rs_metrics

It turns out that the accuracy of the classifier is only 0.52. However, I have no idea how to approach this problem. Does anyone have an idea where my mistake could be?

Thank you in advance!
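
A possible first diagnostic, as a minimal sketch rather than a fix: an accuracy of 0.52 only means something relative to the share of the most frequent class in `state`. The helper `baseline_accuracy` and the vector `toy_state` below are hypothetical names introduced for illustration; they are not part of the code above or of any package.

```r
# Hypothetical helper: accuracy obtained by always predicting the
# most frequent class of a label vector.
baseline_accuracy <- function(labels) {
  counts <- table(labels)
  max(counts) / sum(counts)
}

# Toy labels standing in for df_train$state (made-up values).
toy_state <- factor(c(0, 1, 1, 0, 1, 1, 0, 1))
baseline_accuracy(toy_state)

# With the objects from the question, assuming they exist as defined there:
# baseline_accuracy(df_train$state)
# conf_mat_resampled(nb_rs, tidy = FALSE)  # averaged confusion matrix from tune
```

If the baseline for the real data turns out to be close to 0.52, the model is probably predicting almost only one class, and the confusion matrix should show that directly.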



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
