Data Analysis Project
The Problem
The data analysis project involves the design of a complete machine learning solution. In particular, the project revolves around the task of identifying the music genre of songs. This is useful as a way to group music into categories that can later be used for recommendation or discovery. The problem of music genre classification is difficult: while some genre distinctions are fairly straightforward (e.g. heavy metal vs classical), others are fuzzier (e.g. rock vs blues).
In this data analysis project, you should try out different machine learning methods, including but not limited to those presented throughout this course, for predicting the music genre of songs. The dataset provided for this task contains preprocessed audio information. In particular, the raw audio signals have been transformed into carefully chosen features.
The data analysis project will be hosted on http://www.kaggle.com, which is a repository for data analysis contests. In particular, you can access the dataset via this website and also submit preliminary solutions. The solutions you submit will be automatically evaluated and compared against other solutions.
After carrying out the challenge, you have to write a report about your particular solution. The report will then be peer-reviewed among the students.
The Data
The data is split into two datasets: a training dataset with 4300 songs and a test dataset with 6544 songs. Each song is represented by 264 features and belongs to one of 10 possible genres, which are
'1-Pop_Rock'; '2-Electronic'; '3-Rap'; '4-Jazz'; '5-Latin'; '6-RnB'; '7-International'; '8-Country'; '9-Reggae' and '10-Blues'.
The 264 feature values correspond to three main aspects of a music signal, i.e., timbre, pitch (melody and harmony) and rhythm. Roughly speaking, these feature values encode the overall geometry of the sound energy distribution over time and frequency, i.e., which tones (frequencies) occur when (time).
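To make the shape of the task concrete, the following is a minimal sketch of a baseline classifier and a hold-out check. It is only an illustration under stated assumptions: the data here is synthetic (random clusters standing in for the real 264 feature values), the nearest-centroid method is just one simple starting point and not a prescribed approach, and all function names (make_synthetic, fit_centroids, predict, accuracy) are hypothetical. Because the synthetic clusters are well separated, the accuracy it reports says nothing about what to expect on the real songs.

```python
import random

N_FEATURES = 264  # feature values per song, as stated in the data description
N_GENRES = 10     # genres are labelled 1..10


def make_synthetic(n_songs, seed=0):
    """Hypothetical stand-in for the real data: one Gaussian cluster per genre."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 1) for _ in range(N_FEATURES)]
               for _ in range(N_GENRES)]
    X, y = [], []
    for _ in range(n_songs):
        genre = rng.randrange(N_GENRES)
        X.append([c + rng.gauss(0, 1) for c in centers[genre]])
        y.append(genre + 1)  # shift to labels 1..10
    return X, y


def fit_centroids(X, y):
    """Compute the mean feature vector of each genre in the training data."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * N_FEATURES)
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, row):
    """Return the genre whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of songs whose predicted genre matches the true genre."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


# Hold out part of the (synthetic) training data to estimate performance
# before submitting anything; the 400/100 split is an arbitrary choice.
X, y = make_synthetic(500)
split = 400
centroids = fit_centroids(X[:split], y[:split])
val_acc = accuracy(centroids, X[split:], y[split:])
print(f"hold-out accuracy: {val_acc:.3f}")
```

Holding out part of the training data in this way gives a local estimate of performance that does not depend on the leaderboard, which is useful when comparing several methods.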
Extra Reward
The 5 best-performing teams (as ranked on the Kaggle leaderboards) will be substantially rewarded with bonus points. To determine this, we will take a snapshot of the leaderboards on the 26th of November at 22:00 (this is not the deadline for the whole project, just for the extra reward). The Kaggle competitions will stay open after this date, so teams will be able to continue submitting their solutions, but these scores will not count towards the reward. The selected teams will be invited to give a short presentation discussing their approach during a workshop to be organized before the end of the course. The bonus points will only be valid if at least one member of the team participates in the workshop.