Rescoring_Yelp_Stars_with_NLP_Models

An initial effort to use NLP methods on user reviews to normalize star score bias. Many reviews do not match their star scores, and five-star scores are handed out too easily. I wanted to test classification models that could produce a better star distribution. I used the Yelp Academic Dataset, available at https://www.yelp.com/dataset

Examples of misaligned reviews:

Also, star scores in general skew heavily towards 5 stars.

If we look more categorically,

After importing, merging, and sampling the datasets down to a manageable size, I filtered for food-based business reviews.
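A minimal sketch of the filtering step, using only the standard library. It assumes the business records carry a comma-separated `categories` string and that reviews link to businesses through `business_id`; the keyword set `FOOD_KEYWORDS` and the helper names are hypothetical, since the exact filter terms are not stated here.

```python
import json

# Assumed category keywords; the actual filter terms are not given in this README.
FOOD_KEYWORDS = {"restaurants", "food"}


def is_food_business(business: dict) -> bool:
    """Return True if any category of the business matches a food keyword."""
    raw = business.get("categories") or ""
    # Categories are assumed to be one comma-separated string (or null).
    categories = {c.strip().lower() for c in raw.split(",")}
    return bool(categories & FOOD_KEYWORDS)


def food_reviews(business_lines, review_lines):
    """Keep only reviews whose business_id belongs to a food business."""
    food_ids = {
        b["business_id"]
        for b in map(json.loads, business_lines)
        if is_food_business(b)
    }
    return [r for r in map(json.loads, review_lines) if r["business_id"] in food_ids]


# Made-up records standing in for lines of the JSON dataset files.
businesses = [
    '{"business_id": "b1", "categories": "Restaurants, Pizza"}',
    '{"business_id": "b2", "categories": "Auto Repair"}',
    '{"business_id": "b3", "categories": null}',
]
reviews = [
    '{"review_id": "r1", "business_id": "b1", "stars": 5, "text": "Great pie."}',
    '{"review_id": "r2", "business_id": "b2", "stars": 2, "text": "Slow service."}',
]

kept = food_reviews(businesses, reviews)
print([r["review_id"] for r in kept])
```

Building the set of food business IDs first keeps the review pass to a single membership check per record, which matters on a dataset of this size.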

I then engineered a binary feature, 'fan', from each user's average score and whether they scored the current business higher than that business's average score. It was the target variable in my models.
I normalized the reviews with a tokenizer and lemmatizer, then put them through TF-IDF vectorization.
I then ran a Random Forest Classifier and a 2-input, 1-output LSTM / Sequential neural net.
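To make the 'fan' target concrete, here is a rough sketch of one possible reading of it. The rule used (a review counts as 'fan' when its stars exceed both the user's own average and the business's average) is an assumption, as are the function name `add_fan_label` and the sample records; the notebook may define the label differently.

```python
from collections import defaultdict
from statistics import mean


def add_fan_label(reviews):
    """Return copies of the reviews with a binary 'fan' key added."""
    by_user, by_business = defaultdict(list), defaultdict(list)
    for r in reviews:
        by_user[r["user_id"]].append(r["stars"])
        by_business[r["business_id"]].append(r["stars"])

    # Averages here include the review being labeled.
    user_avg = {u: mean(s) for u, s in by_user.items()}
    business_avg = {b: mean(s) for b, s in by_business.items()}

    labeled = []
    for r in reviews:
        # Assumed rule: stars above the user's average AND the business's average.
        is_fan = (
            r["stars"] > user_avg[r["user_id"]]
            and r["stars"] > business_avg[r["business_id"]]
        )
        labeled.append({**r, "fan": int(is_fan)})
    return labeled


# Made-up reviews for illustration only.
sample = [
    {"user_id": "u1", "business_id": "b1", "stars": 5},
    {"user_id": "u1", "business_id": "b2", "stars": 2},
    {"user_id": "u2", "business_id": "b1", "stars": 3},
    {"user_id": "u2", "business_id": "b2", "stars": 3},
]

labeled = add_fan_label(sample)
print([r["fan"] for r in labeled])
```

In this toy data only the first review clears both averages (5 stars against a user average of 3.5 and a business average of 4), so it is the only one labeled 'fan'.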

The Random Forest gave better baseline results and is much less resource-intensive.

With the results, I built a function to normalize the star scores based on reviews. See the new distribution.
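The rescoring function itself is not reproduced in this README, so the following is only an illustrative sketch of the idea. It assumes the classifier yields a probability that a review is 'fan' and nudges the star score up or down accordingly; the name `rescore`, the `max_shift` parameter, and the linear mapping are all hypothetical.

```python
import math


def rescore(stars: int, fan_probability: float, max_shift: float = 1.0) -> int:
    """Shift a star score by up to max_shift based on the predicted 'fan' probability."""
    if not 0.0 <= fan_probability <= 1.0:
        raise ValueError("fan_probability must be between 0 and 1")
    # Map probability 0..1 linearly onto a shift of -max_shift..+max_shift.
    shift = (2.0 * fan_probability - 1.0) * max_shift
    # Round half up, then clamp to the valid 1..5 star range.
    adjusted = math.floor(stars + shift + 0.5)
    return min(5, max(1, adjusted))


# An easy five-star review whose text reads as lukewarm drops to four stars.
print(rescore(5, 0.1))
# A three-star review with enthusiastic text moves up to four stars.
print(rescore(3, 0.9))
```

Clamping keeps the result on the 1 to 5 scale, so reviews already at either end cannot be pushed outside it.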

