How to approach this NLP algorithm solution? [closed]
I want to build an algorithm but am not sure how to approach it. I have a bag of words taken from some Amazon reviews (my data set), and each review comes with a product rating (0-5 stars). My goal is to use these words to predict a rating for reviews that don't have one. How could I approach this problem?
The solution I thought of is to first map the words to the reviews and give each word the rating of the review it appears in as a score. I would repeat this for all the rated reviews, then average each word's score over the number of rated reviews that contain it (I still need to figure out what to do when a word appears twice in a review). Finally, I would use these word scores as a model on the reviews without ratings, averaging the scores of the words from the bag that appear in each review.
Something like this:
review_one = {'food is nice': 4}
review_two = {'its alright': 3}
review_three = {'bad' : 1}
review_four = {'its nice': 3}
bag_of_words = ['bad', 'nice', 'alright', 'food']
trained_model = {'bad': 1, 'nice': 3.5, 'alright': 3, 'food': 4}
test_review = "service bad but food nice"
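# predict() is a hypothetical helper that averages the scores of the known words
test_review_rating = predict(trained_model, test_review)
# expected: test_review_rating = 2.83, i.e. (1 + 3.5 + 4) / 3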
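For reference, the idea above can be written as a small runnable sketch. The function names `train_word_scores` and `predict_rating` are made up for illustration, and the sketch makes a few assumptions not settled in the description: reviews are split on whitespace and lowercased, a word is counted once per review even if it repeats (one possible answer to the repeated-word question), and a review with no known words gets `None`.

```python
from collections import defaultdict


def train_word_scores(rated_reviews, vocabulary):
    """Give each vocabulary word the mean rating of the reviews containing it."""
    vocab = set(vocabulary)
    totals = defaultdict(float)  # sum of ratings per word
    counts = defaultdict(int)    # number of rated reviews containing the word
    for text, rating in rated_reviews:
        # set() means a repeated word counts once per review (an assumption)
        for word in set(text.lower().split()):
            if word in vocab:
                totals[word] += rating
                counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def predict_rating(word_scores, review):
    """Average the scores of the distinct known words in an unrated review."""
    known = {w for w in review.lower().split() if w in word_scores}
    if not known:
        return None  # no vocabulary word present, so no prediction (an assumption)
    return sum(word_scores[w] for w in known) / len(known)


rated_reviews = [
    ("food is nice", 4),
    ("its alright", 3),
    ("bad", 1),
    ("its nice", 3),
]
bag_of_words = ["bad", "nice", "alright", "food"]

trained_model = train_word_scores(rated_reviews, bag_of_words)
print(trained_model)

test_review = "service bad but food nice"
print(round(predict_rating(trained_model, test_review), 2))  # 2.83
```

With the four example reviews this reproduces the scores listed above ('nice' averages to 3.5 because it appears in a 4-star and a 3-star review) and the predicted rating of 2.83.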
My solution seems far too tedious, so I wanted to know whether there is a better way to approach this, or whether I am on the wrong track altogether.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
