
PixelPusher42/SC1015-Mini-Project


SC1015-Mini-Project

  • This data analysis project investigates anime scores and the variables that may affect them. It analyses a dataset of anime shows that includes information such as title, genre, episode count, release year, source material, licensors, producers, and user scores.
  • The project is implemented in Python using the data analysis libraries Pandas and Matplotlib. The dataset is cleaned and preprocessed to handle missing values, incorrect data types, and inconsistencies. Various data analysis techniques are then applied to gain insight into the relationship between anime scores and variables such as themes and genres, demographics, and release year.

Our Problem Statement

We want to predict how much viewers will like a new anime based on factors such as producing company, genres, age rating and number of episodes.

Our Motivation

  • The wide variety of anime available can overwhelm viewers when they choose what to watch. Our analysis aims to help predict, and so improve, viewer satisfaction.
  • In addition, an average 13-episode anime can cost up to US$2 million to produce. Stakeholders in the anime industry therefore have reason to take a data-driven approach to decisions about the factors that may affect viewership, and to market anime to specific audiences, in order to raise viewership ratings, maximise profit and reduce the risk of financial loss.

Dataset used

https://www.kaggle.com/datasets/harits/anime-database-2022

Main objectives

  • Understand the distribution of anime scores and identify any trends or patterns
  • Explore the relationship between anime scores and different variables
  • Visualize the findings using bar charts and other graphical representations

The project is organized in the following manner:

Data Cleaning: This section includes the cleaning and preprocessing of the raw dataset, handling missing values, data types, and inconsistencies.

Data Analysis: This section includes the data analysis techniques applied to gain insights into the relationship between anime scores and different variables, such as genre, episode count, release year, and licensors. This includes the results of the data analysis and the visualizations created to present the findings in an easy-to-understand manner.

Machine Learning: This section covers the machine learning models we applied to our dataset: Multivariate Linear Regression, Random Forest Classifier and K-Means Clustering.

Our Analysis

We analyse the

  1. Type of anime
  2. Source of anime
  3. Themes_Genres of anime
  4. Studios of anime
  5. Demographics of anime
  6. Producers of anime
  7. Licensors of anime
  8. Rating of anime

and how they may affect the Score of anime in Exploratory Data Analysis.
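As a rough illustration of this kind of comparison, the sketch below summarises Score per category with Pandas. The sample rows are made up, and the column names (`Type`, `Source`, `Score`) and the helper `score_by_category` are assumptions for illustration; they may not match the notebooks or the Kaggle columns exactly.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed, not taken from the dataset.
df = pd.DataFrame({
    "Type": ["TV", "TV", "Movie", "OVA", "Movie"],
    "Source": ["Manga", "Original", "Manga", "Light novel", "Original"],
    "Score": [8.0, 7.0, 8.5, 6.5, 7.5],
})


def score_by_category(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return count, mean and median Score for each category in `column`."""
    return (
        frame.groupby(column)["Score"]
        .agg(["count", "mean", "median"])
        .sort_values("mean", ascending=False)
    )


# Hypothetical helper applied to one of the variables listed above.
summary = score_by_category(df, "Type")
print(summary)
```

The same call could be repeated for each of the eight variables, and `summary["mean"].plot(kind="bar")` would give a bar chart of the kind mentioned in the objectives.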

Machine Learning

  1. Multivariate Linear Regression
  2. Random Forest Classifier
  3. K-Means Clustering
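The three models can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the project's notebooks: the use of scikit-learn, the two stand-in features, and the median split used to turn Score into a class label are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in features (think: scaled episode count, release year).
# These are NOT real anime data.
X = rng.normal(size=(200, 2))
score = 7.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# 1. Multivariate linear regression: predict the numeric score.
reg = LinearRegression().fit(X, score)

# 2. Random forest classifier: predict a class label.
#    The above/below-median split is an assumed way to bin the score.
label = (score > np.median(score)).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, label)

# 3. K-means: group rows by feature similarity, without using the score.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("regression coefficients:", reg.coef_)
print("classifier training accuracy:", clf.score(X, label))
print("cluster sizes:", np.bincount(km.labels_))
```

Training accuracy is printed only to show the API; a held-out split (for example via `train_test_split`) would be needed for a meaningful evaluation.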

What we learned

Different Machine Learning (ML) techniques

  • Random Forest Classifier
  • Multivariate Linear Regression

Encoders

  • One Hot Encoder
  • Label Encoder
  • Ordinal Encoder
  • Nominal Encoder
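To make the difference between these encodings concrete, here is a small Pandas-only sketch. The column names, the sample values and the rating order are assumptions for illustration, and the project may have used library encoder classes instead of these Pandas calls.

```python
import pandas as pd

# Made-up sample; column names and values are assumed.
df = pd.DataFrame({
    "Type": ["TV", "Movie", "OVA", "TV"],
    "Rating": ["PG-13", "G", "R", "PG-13"],
})

# One-hot encoding: one 0/1 column per category, suited to nominal
# (unordered) variables such as Type.
one_hot = pd.get_dummies(df["Type"], prefix="Type", dtype=int)

# Ordinal encoding: integers that follow an explicit order.
# The order below is an assumed ranking of age ratings.
rating_order = ["G", "PG-13", "R"]
rating_cat = pd.Categorical(df["Rating"], categories=rating_order, ordered=True)
df["Rating_ordinal"] = rating_cat.codes

# Label encoding: arbitrary integer ids (here assigned in sorted order),
# which carry no meaningful ranking.
df["Type_label"], type_classes = pd.factorize(df["Type"], sort=True)

print(one_hot)
print(df)
```

Label codes imply an order that a linear model may misread as magnitude, which is one common reason to prefer one-hot columns for unordered categories.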

Data Imputation

  • most-frequent (for categorical)
  • mean (for numerical)
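The two imputation strategies above can be sketched in Pandas as follows. The sample data, the column names and the helper `impute` are hypothetical; the project's own cleaning code may differ.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing value per column.
df = pd.DataFrame({
    "Episodes": [12.0, np.nan, 24.0, 12.0],
    "Source": ["Manga", "Manga", None, "Original"],
})


def impute(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the mean and other columns with the mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # mean (for numerical)
            out[col] = out[col].fillna(out[col].mean())
        else:
            # most-frequent (for categorical); assumes the column has at
            # least one non-missing value, otherwise mode() is empty.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


clean = impute(df)
print(clean)
```

When several values tie for most frequent, `mode()` returns them sorted and this sketch takes the first.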

Our Contributions

  1. Data Cleaning and Preparation: Andrea
  2. EDA and Visualisation: Andrea, Hain Eu
  3. Multivariate Regression: Chi, Hain Eu
  4. Random Forest Classification: Andrea
  5. K-Means Clustering: Chi, Hain Eu
  6. Presentation Slides Deck: Hain Eu, Chi, Andrea
  7. Presentation Script: Hain Eu, Chi, Andrea
  8. Presentation Voiceover: Hain Eu
  9. GitHub README: Andrea
