SC1015-Mini-Project

This data analysis project investigates anime scores and the variables that may affect them. It analyses a dataset of anime shows that includes each show's title, genre, episode count, release year, source material, licensors, producers, and user score.
The project is implemented in Python using the data analysis libraries Pandas and Matplotlib. The dataset is cleaned and preprocessed to handle missing values, incorrect data types, and inconsistencies. Exploratory analysis is then used to examine the relationship between anime scores and variables such as themes and genres, demographics, and release year.

Our Problem Statement

We want to predict how much viewers will like a new anime based on factors such as its producing company, genres, age rating, and number of episodes.

Our Motivation

With the wide variety of anime available, viewers may feel overwhelmed when choosing what to watch. Our analysis aims to help predict, and so improve, viewer satisfaction.
In addition, an average 13-episode anime can cost up to US$2 million to produce. Stakeholders in the anime industry may therefore want a data-driven approach to decisions about the factors that affect viewership, and to marketing anime to specific audiences, in order to improve viewership ratings, maximise profit, and reduce the risk of financial loss.

Dataset used

https://www.kaggle.com/datasets/harits/anime-database-2022

Main objectives

Understand the distribution of anime scores and identify any trends or patterns
Explore the relationship between anime scores and different variables
Visualize the findings using bar charts and other graphical representations

The project is organized in the following manner:

Data Cleaning: This section includes the cleaning and preprocessing of the raw dataset, handling missing values, data types, and inconsistencies.

Data Analysis: This section includes the data analysis techniques applied to gain insights into the relationship between anime scores and different variables, such as genre, episode count, release year, and licensors. This includes the results of the data analysis and the visualizations created to present the findings in an easy-to-understand manner.

Machine Learning: This section covers the machine learning models we applied to the dataset: Multivariate Linear Regression, Random Forest Classifier, and K-Means Clustering.

Our Analysis

We analyse the

Type of anime
Source of anime
Themes_Genres of anime
Studios of anime
Demographics of anime
Producers of anime
Licensors of anime
Rating of anime

and how they may affect the Score of anime in Exploratory Data Analysis.
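As a rough illustration of this kind of comparison, the sketch below groups a small, invented table by one categorical column and computes the mean Score per category. The column names `Type` and `Score` mirror the variables listed above, but the values are made up and the helper `mean_score_by` is hypothetical; it is not taken from the project notebooks.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; all values are invented for illustration.
df = pd.DataFrame({
    "Type": ["TV", "TV", "Movie", "OVA", "Movie", "OVA"],
    "Score": [7.5, 8.1, 7.9, 6.4, 8.3, 6.8],
})


def mean_score_by(frame, column):
    """Return the mean Score for each category of `column`, highest first."""
    return frame.groupby(column)["Score"].mean().sort_values(ascending=False)


# Hypothetical usage: compare average scores across anime types.
summary = mean_score_by(df, "Type")
print(summary)
```

The same helper could be pointed at any of the other categorical columns, and the resulting series could be drawn as a bar chart with `summary.plot(kind="bar")`.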

Machine Learning

Multivariate Linear Regression
Random Forest Classifier
K-Means Clustering
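The three techniques above can be sketched with scikit-learn on synthetic data. Everything in this example is an assumption made for illustration: the two numeric features, the generated scores, and the choice to turn the score into two classes (at or above the median versus below) for the classifier. The project's actual features, class definitions, and hyperparameters may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Hypothetical numeric features: an episode count and an encoded age rating.
X = np.column_stack([rng.integers(1, 50, n), rng.integers(0, 4, n)]).astype(float)
# Synthetic score that depends linearly on both features plus noise.
y = 6.0 + 0.02 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.2, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Multivariate linear regression: predict the numeric score from several features.
reg = LinearRegression().fit(X_train, y_train)
r2 = reg.score(X_test, y_test)  # coefficient of determination on held-out data

# Random forest classifier: predict a score class (assumed here: >= median or not).
threshold = np.median(y_train)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train >= threshold)
accuracy = clf.score(X_test, y_test >= threshold)

# K-means clustering: group shows by feature similarity, without using the score.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_

print(f"regression R^2: {r2:.2f}")
print(f"classifier accuracy: {accuracy:.2f}")
print(f"cluster sizes: {np.bincount(cluster_labels)}")
```

Regression and classification are supervised and need the score as a target, while K-means is unsupervised and only looks at the features.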

What we learned

Different Machine Learning (ML) techniques

Random Forest Classifier
Multivariate Linear Regression

Encoders

One Hot Encoder
Label Encoder
Ordinal Encoder
Nominal Encoder
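To make the difference between the encoders concrete, here is a minimal sketch using scikit-learn's `OneHotEncoder`, `OrdinalEncoder`, and `LabelEncoder`. The category values, the rating order, and the `score_bands` target are invented for illustration and are not taken from the dataset.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# Invented example values; the real columns contain many more categories.
types = [["TV"], ["Movie"], ["OVA"], ["TV"]]
ratings = [["PG-13"], ["G"], ["R"], ["PG"]]
score_bands = ["high", "low", "mid", "high"]  # hypothetical target labels

# One-hot encoding: one 0/1 column per category, with no implied order.
onehot = OneHotEncoder()
type_matrix = onehot.fit_transform(types).toarray()

# Ordinal encoding: integer codes that follow an explicit order (assumed here).
ordinal = OrdinalEncoder(categories=[["G", "PG", "PG-13", "R"]])
rating_codes = ordinal.fit_transform(ratings).ravel()

# Label encoding: integer codes for a 1-D target, assigned in sorted order.
label = LabelEncoder()
band_codes = label.fit_transform(score_bands)

print(onehot.categories_[0], type_matrix.shape)
print(rating_codes)
print(label.classes_, band_codes)
```

One-hot encoding suits unordered categories such as type or source, ordinal encoding suits categories with a meaningful order such as age ratings, and `LabelEncoder` is intended for target labels rather than input features.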

Data Imputation

most-frequent (for categorical)
mean (for numerical)
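The two imputation strategies can be sketched in plain Pandas as follows. The column names `Source` and `Episodes`, the values, and the helper `impute` are assumptions for illustration; scikit-learn's `SimpleImputer` provides the same strategies under the names `most_frequent` and `mean`.

```python
import numpy as np
import pandas as pd

# Invented example with one missing value in each column.
raw = pd.DataFrame({
    "Source": ["Manga", np.nan, "Manga", "Original"],
    "Episodes": [12.0, 24.0, np.nan, 12.0],
})


def impute(frame, categorical, numerical):
    """Return a copy with missing values filled by mode (categorical) or mean (numerical)."""
    filled = frame.copy()
    for col in categorical:
        # mode() ignores missing values; take the first mode if there are ties.
        filled[col] = filled[col].fillna(filled[col].mode()[0])
    for col in numerical:
        # mean() also ignores missing values by default.
        filled[col] = filled[col].fillna(filled[col].mean())
    return filled


clean = impute(raw, categorical=["Source"], numerical=["Episodes"])
print(clean)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is worth checking how many values were filled before interpreting later results.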

Our Contributions

Data Cleaning and Preparation: Andrea
EDA and Visualisation: Andrea, Hain Eu
Multivariate Regression: Chi, Hain Eu
Random Forest Classification: Andrea
K-Means Clustering: Chi, Hain Eu
Presentation Slides Deck: Hain Eu, Chi, Andrea
Presentation Script: Hain Eu, Chi, Andrea
Presentation Voiceover: Hain Eu
Github ReadMe: Andrea
