Text preprocessing in a different language
With the following code it is possible to preprocess English text for analysis (lemmatization followed by stopword removal):
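library(spacyr)    # spacy_parse()
library(dplyr)     # %>%, group_by(), summarise()
library(quanteda)  # corpus(), tokens(), tokens_remove(), dfm(), stopwords()

# Lemmatize each term with spaCy, then collapse the lemmas into one string per id
# (id is doc_id with everything after the last hyphen stripped)
dflemma <-
  spacy_parse(structure(df2$term, names = df2$id), lemma = TRUE, pos = FALSE) %>%
  group_by(id = sub("(.+)-(.+)", "\\1", doc_id)) %>%
  summarise(text = paste(lemma, collapse = " "))
myCorpus <- corpus(dflemma[["text"]], docnames = dflemma[["id"]])
mystopwords <- c("can")  # custom stopwords added to the SMART list
# Tokenize, drop punctuation, numbers, symbols and stopwords, then build the dfm
myDfm <- myCorpus %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart"), mystopwords)) %>%
  dfm(verbose = FALSE)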
How can stopword removal and stemming be done for German and Greek text?
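For illustration only, here is a minimal sketch of what the same quanteda steps might look like for German and Greek. It is not a confirmed answer. The sample sentences and the names `txt_de`, `txt_el`, `sw_el`, `toks_de`, `toks_el` and `dfm_de` are made up for the example. It assumes that German stopwords come from the default Snowball list (`stopwords("de")`), that Greek stopwords are available under the code `"el"` in the `stopwords-iso` source, and that the installed SnowballC version ships a Greek stemmer; the last two are checked at run time rather than taken for granted. The spaCy lemmatization step is left out because it would need a language-specific spaCy model.

```r
library(quanteda)

# Hypothetical sample texts, not taken from the original data
txt_de <- c(d1 = "Die Katzen laufen schnell durch die Gärten und die Häuser.")
txt_el <- c(d1 = "Οι γάτες τρέχουν γρήγορα στους κήπους και στα σπίτια.")

# German: Snowball stopword list plus the Snowball "german" stemmer
toks_de <- tokens(txt_de, remove_punct = TRUE, remove_numbers = TRUE,
                  remove_symbols = TRUE)
toks_de <- tokens_tolower(toks_de)
toks_de <- tokens_remove(toks_de, pattern = stopwords("de"))
toks_de <- tokens_wordstem(toks_de, language = "german")
dfm_de <- dfm(toks_de)

# Greek: assumption is that "el" exists in the stopwords-iso source,
# so check before using it
sw_el <- character(0)
if ("el" %in% stopwords::stopwords_getlanguages("stopwords-iso")) {
  sw_el <- stopwords::stopwords("el", source = "stopwords-iso")
}
toks_el <- tokens(txt_el, remove_punct = TRUE, remove_numbers = TRUE,
                  remove_symbols = TRUE)
toks_el <- tokens_tolower(toks_el)
if (length(sw_el) > 0) {
  toks_el <- tokens_remove(toks_el, pattern = sw_el)
}
# Stem only if this SnowballC build actually provides a Greek stemmer
if ("greek" %in% SnowballC::getStemLanguages()) {
  toks_el <- tokens_wordstem(toks_el, language = "greek")
}
dfm_el <- dfm(toks_el)
```

Custom stopwords such as `mystopwords` could be appended to either list with `c()`, as in the English version. Stemming with `tokens_wordstem()` and lemmatizing with spaCy are alternatives for reducing word forms, so in practice one would probably pick one of the two rather than apply both.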
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
