LastFM

This project was carried out as part of the Machine Learning for Behavioural Data course at EPFL in the spring 2020 semester. The aim of the project was to use the LastFM dataset containing information on users' listening habits, as well as demographic information, to recommend new artists for users.

Using the code

In this project, we use the TensorFlow library to construct, fit, and evaluate models, and matplotlib to visualize results.

Data

The original data contains two datasets:

usersha1-profile, containing demographic data, with columns user_email, gender, age, country, and signup
usersha1-artmbid-artname-plays, containing behavioural data, with columns user_email, artist_id, artist_name, and plays

The cleaned and preprocessed data can be generated by following the instructions in the preprocessing.ipynb notebook. To upload the original dataset, uncomment the cells titled "Uploading the original dataset". The prepared data is also available at the following link: https://drive.google.com/drive/folders/14izEunqUyASA-fkS_EqrBcWMpHI1pXxQ

Preprocessing:

The following transformations are performed on the data to prepare it for use:

the dataset is restricted to users in the USA
signup column is converted into datetime type
user IDs are converted to numbers
unrealistic ages are changed to NaN
gender, country, and age are one-hot encoded
features year, month, weekday and day are extracted from sign-up date
samples with missing artist_name are dropped
samples with missing artist_id are dropped
implicit feedback is transformed to explicit (plays to ratings)
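
Several of the steps above can be sketched with pandas on a toy table. This is a minimal illustration, not the code from preprocessing.ipynb: the sample values, the country label "United States", the date format, and the 10 to 100 age range are all assumptions, and only gender is one-hot encoded here for brevity.

```python
import pandas as pd

# Toy demographic table; column names follow the README, values are made up.
demo = pd.DataFrame({
    "user_email": ["a1", "b2", "c3"],
    "gender": ["m", "f", None],
    "age": [25, -3, 140],
    "country": ["United States", "United States", "Germany"],
    "signup": ["Feb 1, 2007", "Dec 4, 2007", "Sep 1, 2006"],
})

# Keep only users in the USA (the exact country label is an assumption).
demo = demo[demo["country"] == "United States"].copy()

# Convert the sign-up column to datetime and extract calendar features.
demo["signup"] = pd.to_datetime(demo["signup"], format="%b %d, %Y")
demo["year"] = demo["signup"].dt.year
demo["month"] = demo["signup"].dt.month
demo["weekday"] = demo["signup"].dt.weekday
demo["day"] = demo["signup"].dt.day

# Map user identifiers to consecutive integers.
demo["user_id"] = pd.factorize(demo["user_email"])[0]

# Replace unrealistic ages with NaN (the 10-100 range is an assumed threshold).
demo["age"] = demo["age"].where(demo["age"].between(10, 100))

# One-hot encode gender, keeping a separate column for missing values.
demo = pd.get_dummies(demo, columns=["gender"], dummy_na=True)

# Toy behavioural table; rows with a missing artist name or ID are dropped.
behav = pd.DataFrame({
    "user_email": ["a1", "a1", "b2"],
    "artist_id": ["id-1", None, "id-3"],
    "artist_name": ["artist one", "artist two", None],
    "plays": [120, 4, 17],
})
behav = behav.dropna(subset=["artist_name", "artist_id"])
```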

The processed datasets are saved in the folder lastfm-dataset-360K as behav-360k-processed.csv and demo-360k-processed.csv.

Finally, the data is split into train, validation, and test datasets, with 10% of the whole set being used for testing, and 10% of the training set used for validation. The function train_test_split from the train_test.py module is used to split the data.
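
The split proportions can be illustrated as follows. The helper split_indices is hypothetical and is not the train_test_split function from train_test.py, whose signature is not shown here; it assumes a simple random row-wise split, which may differ from what the project does.

```python
import numpy as np

def split_indices(n_samples, test_frac=0.1, val_frac=0.1, seed=0):
    """Hypothetical helper: hold out a test set, then a validation set from the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)

    # 10% of the whole set goes to testing.
    n_test = int(round(test_frac * n_samples))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # 10% of the remaining training set goes to validation.
    n_val = int(round(val_frac * len(train_idx)))
    val_idx, train_idx = train_idx[:n_val], train_idx[n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 810 90 100
```

With these proportions, the final training set is 81% of the data, validation 9%, and test 10%.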

Training

The following models and settings are implemented and evaluated:

Baseline Model, with and without cold start
User-User Neighborhood Model
Latent Factor Model, with different proportions of negative samples
Neural Matrix Factorization Model, with and without cold start, and with different proportions of negative samples

For training, labels are set to 1 if the given user-artist pair has an interaction, and 0 otherwise. The ratings calculated during preprocessing are used as weights for the weighted binary cross-entropy loss when training the Latent Factor Model (LFM) and the Neural Matrix Factorization Model (NeuMF).
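
To make the loss concrete, here is a minimal NumPy sketch of a weighted binary cross-entropy. The function weighted_bce and the example values are illustrative only; normalizing by the sum of the weights and giving negative pairs a weight of 1 are assumptions, not details confirmed by the notebooks.

```python
import numpy as np

def weighted_bce(y_true, y_pred, weights, eps=1e-7):
    """Illustrative weighted binary cross-entropy, normalized by the total weight."""
    y_true = np.asarray(y_true, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Clip predictions so that log(0) never occurs.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.sum(weights * per_sample) / np.sum(weights))

# Two observed pairs (label 1) weighted by their ratings, one negative pair (label 0).
labels = [1, 1, 0]
predictions = [0.9, 0.6, 0.2]
ratings_as_weights = [5, 1, 1]  # assumed: negatives receive weight 1
loss = weighted_bce(labels, predictions, ratings_as_weights)
```

In Keras, a comparable effect can be obtained by passing per-sample weights through the sample_weight argument of model.fit together with a binary cross-entropy loss; whether the notebooks do it this way is not stated here.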

The following notebooks should be run to train and evaluate the models:

baseline_model.ipynb
user_neighborhood_model.ipynb
neural_matrix_factorization_model.ipynb
latent_factor_model.ipynb

Each notebook corresponds to one model; it prepares the data and trains the model in all relevant settings (different numbers of negative samples, cold start, etc.).
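
The negative-sampling setting can be sketched as below. The helper sample_negatives is hypothetical; it assumes negatives are drawn uniformly at random from the artists a user has not interacted with, with a fixed number of negatives per observed pair. The notebooks may use a different scheme.

```python
import numpy as np

def sample_negatives(positives, n_artists, ratio, seed=0):
    """Hypothetical helper: draw `ratio` unobserved artists per observed (user, artist) pair.

    Assumes every user has at least one unobserved artist; otherwise the loop
    would not terminate. The same negative pair may be drawn more than once.
    """
    rng = np.random.default_rng(seed)

    # Collect the set of artists each user has interacted with.
    seen = {}
    for user, artist in positives:
        seen.setdefault(user, set()).add(artist)

    negatives = []
    for user, _ in positives:
        drawn = 0
        while drawn < ratio:
            candidate = int(rng.integers(n_artists))
            if candidate not in seen[user]:
                negatives.append((user, candidate))
                drawn += 1
    return negatives

positives = [(0, 0), (0, 1), (1, 2)]
negatives = sample_negatives(positives, n_artists=10, ratio=2)
print(len(negatives))  # 6
```

Positive pairs would then carry label 1 and the sampled pairs label 0 before training.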

Environment

The project was developed with Python 3.7.10. The full requirements are listed in the requirements.txt file. All necessary imports are made at the beginning of each notebook.
