Sentiment analysis of non-English texts
I want to analyze the sentiment of texts written in German. I found many tutorials on how to do this in English, but none on how to apply it to other languages.
My idea is to use the TextBlob Python library to first translate the sentences into English and then run sentiment analysis on the translation, but I am not sure whether this is the best way to solve the task.
Are there other ways to solve it?
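The translate-first idea from the question can be sketched as a two-step pipeline. This is a minimal sketch under stated assumptions: the `translate` function is hypothetical and must be supplied by you (for example, a wrapper around a machine-translation service), because TextBlob's own `translate()` has been deprecated in recent releases. TextBlob's default English analyzer is used as the scorer.

```python
from typing import Callable

# Hypothetical signatures: a translator maps German text to English,
# and a scorer maps English text to a polarity value.
Translator = Callable[[str], str]
EnglishScorer = Callable[[str], float]


def textblob_polarity(english_text: str) -> float:
    """Polarity in [-1.0, 1.0] from TextBlob's default English sentiment analyzer."""
    from textblob import TextBlob  # requires `pip install textblob`; imported lazily

    return TextBlob(english_text).sentiment.polarity


def translate_then_score(
    german_text: str,
    translate: Translator,
    score_english: EnglishScorer = textblob_polarity,
) -> float:
    """Translate German text into English, then score it with an English-only tool."""
    english_text = translate(german_text)
    return score_english(english_text)
```

Keeping the translator and scorer as parameters makes it easy to swap either step; note that any translation error is passed straight on to the English scorer.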
Solution 1:[1]
There is now a pre-trained sentiment classifier for German text: Hugging Face has released two open-source APIs for it.
Solution 2:[2]
A lot of progress has been made in sentiment analysis for non-English languages since you asked your question six years ago. Today, there are very good Hugging Face Transformer-based models fine-tuned for sentiment analysis in many languages. In my opinion, the best one for German is https://huggingface.co/oliverguhr/german-sentiment-bert
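A minimal sketch of running that model locally through the `transformers` pipeline API, assuming `transformers` and `torch` are installed and the weights can be downloaded from the Hugging Face Hub. The helper names `classify` and `label_counts` are illustrative, and the label strings come from the model's own configuration, so check the model card for their exact values.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

MODEL_NAME = "oliverguhr/german-sentiment-bert"

# A classifier takes a list of texts and returns one {"label": ..., "score": ...}
# dict per text, which is what a Hugging Face sentiment pipeline returns for a list.
Classifier = Callable[[List[str]], List[Dict]]


def load_german_classifier() -> Classifier:
    """Load the German sentiment model via the transformers pipeline API.

    Needs `pip install transformers torch`; weights are downloaded on first use.
    """
    from transformers import pipeline  # lazy import: the helpers below work without it

    return pipeline("sentiment-analysis", model=MODEL_NAME)


def classify(
    texts: List[str], classifier: Optional[Classifier] = None
) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for each input text."""
    if classifier is None:
        classifier = load_german_classifier()
    predictions = classifier(texts)
    return [(text, p["label"], p["score"]) for text, p in zip(texts, predictions)]


def label_counts(results: List[Tuple[str, str, float]]) -> Dict[str, int]:
    """Count how many texts received each label."""
    return dict(Counter(label for _, label, _ in results))
```

For example, `classify(["Das Essen war hervorragend."])` loads the model and returns one `(text, label, score)` tuple; passing your own `classifier` lets you reuse an already loaded pipeline across calls. The model card also describes a small `germansentiment` wrapper package from the model's author, which is worth checking as an alternative.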
If you can't or don't want to run your own model, you can also use a hosted API such as NLP Cloud, which I developed recently. I have added the German model above to it for sentiment analysis.
Non-English NLP is still far from perfect. Most datasets are English-only, but the ecosystem is gradually making progress.
Solution 3:[3]
As an alternative to classification, you could use a sentiment lexicon of German subjective terms; this paper [1] is worth reading. The advantage of a lexicon-based model is that it requires no training.
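A minimal sketch of the lexicon-based approach, assuming whitespace-and-punctuation tokenization and a toy lexicon whose words and weights are invented purely for illustration (a real system would load a published German polarity lexicon). The one-word negation rule and the `polarity_label` threshold are also illustrative choices, not part of any standard method.

```python
import re
from typing import Dict

# Toy polarity lexicon: entries and weights are invented for illustration only.
TOY_LEXICON: Dict[str, float] = {
    "gut": 0.5,
    "toll": 0.7,
    "wunderbar": 0.8,
    "schlecht": -0.6,
    "schrecklich": -0.9,
    "langweilig": -0.4,
}

# Simple negation cues: the sign of a lexicon word directly after one is flipped.
NEGATORS = {"nicht", "kein", "keine", "nie"}


def lexicon_score(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> float:
    """Sum the polarity weights of the lexicon words found in the text."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so umlauts survive
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        weight = lexicon.get(token)
        if weight is not None:
            score += -weight if negate else weight
        negate = False  # negation only applies to the immediately following word
    return score


def polarity_label(score: float, threshold: float = 0.1) -> str:
    """Map a summed score to a coarse label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Exact token matching misses inflected forms such as "guter", so in practice a lexicon approach for German usually needs lemmatization or a lexicon that lists inflected forms.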
Another option is a hybrid model: feed the lexicon terms to the classifier as features and train it on a manually annotated training set.
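A minimal sketch of the hybrid idea, assuming a plain perceptron as the classifier (a swapped-in choice for brevity; a logistic regression or other linear model would combine the features the same way). Each text gets bag-of-words features plus two lexicon features counting positive and negative lexicon hits. The lexicon and the four "annotated" sentences are hypothetical toy data.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: only the sign is used, to count positive and negative hits.
LEXICON: Dict[str, float] = {"gut": 1.0, "toll": 1.0, "schlecht": -1.0, "furchtbar": -1.0}

# Hypothetical "manually annotated" training set: +1 = positive, -1 = negative.
TRAIN: List[Tuple[str, int]] = [
    ("Der Film war gut", 1),
    ("Das Essen war toll", 1),
    ("Der Service war schlecht", -1),
    ("Das Wetter war furchtbar", -1),
]


def features(text: str) -> Dict[str, float]:
    """Bag-of-words counts plus two lexicon features and a bias term."""
    tokens = re.findall(r"\w+", text.lower())
    feats = {f"w={tok}": float(count) for tok, count in Counter(tokens).items()}
    feats["lex_pos"] = float(sum(1 for t in tokens if LEXICON.get(t, 0.0) > 0))
    feats["lex_neg"] = float(sum(1 for t in tokens if LEXICON.get(t, 0.0) < 0))
    feats["bias"] = 1.0
    return feats


def activation(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Dot product of the weight vector and the (sparse) feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_perceptron(data: List[Tuple[str, int]], epochs: int = 10) -> Dict[str, float]:
    """Classic perceptron: on each mistake, add label * features to the weights."""
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for text, label in data:
            feats = features(text)
            if label * activation(weights, feats) <= 0:  # misclassified or on the boundary
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + label * value
    return weights


def predict(weights: Dict[str, float], text: str) -> int:
    """Return +1 for positive and -1 for negative."""
    return 1 if activation(weights, features(text)) > 0 else -1
```

In this toy run, "toll" and "furchtbar" never trigger an update, so they get no word weight of their own; an unseen sentence such as "Die Musik war furchtbar" is still classified as negative through the lexicon feature, which is the benefit the hybrid approach aims for.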
Solution 4:[4]
There is also a dedicated German version of TextBlob, textblob-de (under active development): https://textblob-de.readthedocs.io/en/latest/
Example:
```python
# Requires: pip install textblob-de
from textblob_de import TextBlobDE as TextBlob

doc = "Es gibt kein richtiges Leben im falschen."
blob = TextBlob(doc)
print(blob.sentiment)
# Sentiment(polarity=-1.0, subjectivity=0.0)
```
As of February 2022, there is still no subjectivity score available, and some features, such as `.translate()`, don't work. However, `.noun_phrases` and `.tags` work very well.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Neelisha SAXENA |
| Solution 2 | Julien Salinas |
| Solution 3 | modarwish |
| Solution 4 | MERose |
