GloVe word embeddings containing sentiment?

I've been researching sentiment analysis with word embeddings. I have read papers stating that word embeddings ignore the sentiment information of the words in a text. One paper states that among the top 10 semantically similar words, around 30 percent have opposite polarity (e.g. happy vs. sad).

So, I computed word embeddings on my dataset (Amazon reviews) with the GloVe algorithm in R. Then I looked at the most similar words using cosine similarity, and I found that the most similar words are in fact also sentimentally similar (e.g. beautiful - lovely - gorgeous - pretty - nice - love). I was therefore wondering how this is possible, since I expected the opposite after reading several papers. What could be the reason for my findings?

Two of the many papers I read:

  • Yu, L. C., Wang, J., Lai, K. R. & Zhang, X. (2017). Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.
  • Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T. & Qin, B. (2014). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1: Long Papers, 1555-1565.


Solution 1:

Assumption: when you say you computed GloVe embeddings, I take it you mean you used pretrained GloVe vectors.

Static word embeddings do not carry sentiment information about the input text at runtime

The statement above means that word embedding algorithms (most of them, to my knowledge, e.g. GloVe and Word2Vec) are not designed or formulated to capture the sentiment of a word. In general, word embedding algorithms map words that are similar in meaning (based on statistical nearness and co-occurrence) to nearby points. For example, "woman" and "girl" will lie near each other in the n-dimensional embedding space. But that does not mean any sentiment-related information is captured there.
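To make "statistical nearness" concrete, here is a minimal sketch of cosine similarity, the measure the questioner used. The three-dimensional vectors are made-up illustrative values, not real GloVe embeddings (which typically have 50-300 dimensions):

```python
import math

# Toy 3-dimensional vectors -- illustrative values only, not real
# GloVe embeddings.
vectors = {
    "woman": [0.9, 0.4, 0.1],
    "girl":  [0.8, 0.5, 0.2],
    "table": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Words that co-occur in similar contexts end up closer together than
# unrelated words.
print(cosine_similarity(vectors["woman"], vectors["girl"]))   # high (~0.98)
print(cosine_similarity(vectors["woman"], vectors["table"]))  # low (~0.28)
```

Nothing in this measure knows about polarity; it only compares directions in the vector space, so two words of opposite sentiment that occur in similar contexts can still score high.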

Hence, the words (beautiful - lovely - gorgeous - pretty - nice - love) being sentimentally similar to a given word is not a coincidence. We have to look at these words in terms of their meaning: all of them are similar in meaning, but we cannot say that they necessarily carry the same sentiment. They lie near each other in GloVe's vector space because the model was trained on a corpus that carried sufficient co-occurrence information to group them together. Also, please inspect the similarity scores; that will make this clearer.

Among the top 10 semantically similar words, around 30 percent have opposite polarity

Here, semantic similarity is less dependent on context, whereas sentiment is strongly dependent on context. A single word cannot define sentiment.

Example:

Jack: "Your dress is beautiful, Gloria!"
Gloria: "Beautiful, my foot!"

In the two sentences, "beautiful" carries a completely different sentiment, yet both sentences get the same embedding for the word "beautiful". Now replace "beautiful" with lovely, gorgeous, pretty, or nice: the semantic similarity holds, as described in one of the papers. And since sentiment is not captured by static word embeddings, the other paper stands true as well.
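The context-independence of a static embedding can be shown with a plain lookup table. This is a sketch with hypothetical vector values; a real pipeline would load pretrained GloVe vectors instead:

```python
# A static embedding table is just a dictionary lookup: a word maps to
# one fixed vector, regardless of the sentence it appears in.
# The values below are hypothetical placeholders, not real GloVe numbers.
embeddings = {
    "beautiful": [0.7, 0.3, 0.5],
    "dress":     [0.6, 0.2, 0.4],
    "foot":      [0.1, 0.6, 0.2],
}

def embed(word):
    """Look up the (fixed) vector for a word."""
    return embeddings[word]

compliment = embed("beautiful")  # from "Your dress is beautiful, Gloria!"
sarcasm = embed("beautiful")     # from "Beautiful, my foot!"
print(compliment == sarcasm)     # True: same vector in both contexts
```

Because the lookup ignores the surrounding sentence, no amount of nearest-neighbour inspection on these vectors can separate the sarcastic use from the sincere one.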

The confusion likely comes from treating two or more words with similar meanings as being sentimentally similar. Sentiment information can be gathered at the sentence or document level, not at the word level.
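One common way to act on that: build a sentence-level feature from the word vectors (for example, by averaging them) and let a classifier trained on sentence-labeled data supply the sentiment. Below is a minimal sketch with made-up two-dimensional vectors; the classifier step is only indicated in a comment:

```python
# Sentiment is assigned at the sentence level: turn a sentence into a
# single feature vector (here, a simple average of word vectors) and let
# a classifier trained on *labeled sentences* decide the polarity.
# The embeddings below are toy values, not real GloVe numbers.
embeddings = {
    "beautiful": [0.7, 0.3],
    "my":        [0.2, 0.1],
    "foot":      [0.1, 0.6],
}

def sentence_vector(tokens):
    """Average the embeddings of the tokens -- a common baseline feature."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(embeddings[token]):
            total[i] += value
    return [t / len(tokens) for t in total]

features = sentence_vector(["beautiful", "my", "foot"])
# `features` would then be fed to a supervised classifier (e.g. logistic
# regression) whose training labels carry the sentiment -- the
# embeddings themselves never do.
```

Note that averaging discards word order, so even this baseline cannot distinguish sarcasm; the sentiment signal comes entirely from the sentence-level labels the downstream classifier is trained on.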

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
