Commit ccee4e5 (Initial commit, 0 parents)

27 files changed: +2126 -0 lines

LICENSE

Lines changed: 674 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 51 additions & 0 deletions
CRNN for Live Music Genre Recognition
=====================================

Convolutional-Recurrent Neural Networks for Live Music Genre Recognition is a project aimed at creating a neural network that recognizes the genre of a song and provides a user-friendly visualization of the network's current belief about that genre. The project was created for the 24-hour Braincode Hackathon in Warsaw by Piotr Kozakowski, Jakub Królak, Łukasz Margas and Bartosz Michalak.

This project uses Keras for the neural network and Tornado for serving requests.


Demo
----

You can see a demo for a few selected songs here: [Demo](http://deepsound.io/genres/).


Usage
-----

In a fresh virtualenv, install all the prerequisites:

```shell
pip install -r requirements.txt
```

Then run the server at http://0.0.0.0:8080/:

```shell
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python server.py
```

You can then upload a song using the big (and only) button and see the results for yourself. All mp3 files should work fine.

Running server.py without additional parameters launches the server with the default model provided in the package. You can provide your own model, as long as it matches the input and output architecture of the provided one. To train a model yourself, download the [GTZAN dataset](http://opihi.cs.uvic.ca/sound/genres.tar.gz) (or provide an analogous one) to the data/ directory, extract it, run create\_data\_pickle.py to preprocess the data, and then run train\_model.py to train the model:

```shell
cd data
wget http://opihi.cs.uvic.ca/sound/genres.tar.gz
tar zxvf genres.tar.gz
cd ..
python create_data_pickle.py
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python train_model.py
```

You can "visualize" the filters learned by the convolutional layers using extract\_filters.py. For each convolutional neuron, this script extracts a few chunks of the dataset's tracks that maximally activate that neuron and concatenates them. By default it puts the visualizations in the filters/ directory. It requires the GTZAN dataset and its pickled version in the data/ directory; run the commands above to obtain them. You can control the number of extracted chunks with the --count0 argument; extracting more chunks is slower.
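The chunk selection in extract\_filters.py is essentially a per-filter top-k over tracks: activations are max-pooled over time, then `np.argpartition` picks the tracks with the highest maxima. A minimal NumPy sketch of just that step, with hypothetical toy activations:

```python
import numpy as np

# Toy activations: 6 tracks x 4 filters, already max-pooled over time.
max_over_time = np.array([
    [0.1, 0.9, 0.3, 0.2],
    [0.8, 0.2, 0.1, 0.7],
    [0.4, 0.5, 0.9, 0.1],
    [0.9, 0.1, 0.2, 0.3],
    [0.2, 0.8, 0.4, 0.9],
    [0.3, 0.3, 0.8, 0.4],
])

count = 2  # chunks to extract per filter (the script's --count0)
# For each filter (column), indices of the `count` tracks with the
# highest activation; argpartition avoids a full sort.
top_tracks = np.argpartition(max_over_time, -count, axis=0)[-count:, :]
```

For filter 0, for example, this selects tracks 1 and 3 (activations 0.8 and 0.9), in no guaranteed order.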


Background
----------

The rationale for this particular model is based on several works, primarily [Grzegorz Gwardys and Daniel Grzywczak, Deep Image Features in Music Information Retrieval](http://ijet.pl/index.php/ijet/article/view/10.2478-eletel-2014-0042/53) and [Recommending music on Spotify with Deep Learning](http://benanne.github.io/2014/08/05/spotify-cnns.html). The whole idea is described extensively in our blog post [Convolutional-Recurrent Neural Network for Live Music Genre Recognition](http://deepsound.io/music_genre_recognition.html).

common.py

Lines changed: 39 additions & 0 deletions
# To avoid errors during importing librosa.
import matplotlib
matplotlib.use('Agg')

import numpy as np
import librosa as lbr
import keras.backend as K

GENRES = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal',
        'pop', 'reggae', 'rock']
WINDOW_SIZE = 2048
WINDOW_STRIDE = WINDOW_SIZE // 2
N_MELS = 128
MEL_KWARGS = {
    'n_fft': WINDOW_SIZE,
    'hop_length': WINDOW_STRIDE,
    'n_mels': N_MELS
}

def get_layer_output_function(model, layer_name):
    input = model.get_layer('input').input
    output = model.get_layer(layer_name).output
    f = K.function([input, K.learning_phase()], output)
    return lambda x: f([x, 0]) # learning_phase = 0 means test

def load_track(filename, enforce_shape=None):
    new_input, sample_rate = lbr.load(filename, mono=True)
    features = lbr.feature.melspectrogram(new_input, **MEL_KWARGS).T

    if enforce_shape is not None:
        if features.shape[0] < enforce_shape[0]:
            # Pad short tracks with zero frames along the time axis.
            delta_shape = (enforce_shape[0] - features.shape[0],
                    enforce_shape[1])
            features = np.append(features, np.zeros(delta_shape), axis=0)
        elif features.shape[0] > enforce_shape[0]:
            # Truncate long tracks.
            features = features[: enforce_shape[0], :]

    # Avoid log(0) below.
    features[features == 0] = 1e-6
    return (np.log(features), float(new_input.shape[0]) / sample_rate)
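The `enforce_shape` branch of `load_track` pads short spectrograms with zero frames and truncates long ones, so every track yields the same matrix shape. The same logic in isolation, as a small sketch (the helper name `enforce_length` is ours, not the repo's):

```python
import numpy as np

def enforce_length(features, target_frames):
    """Pad with zero rows or truncate along the time axis (axis 0)."""
    n = features.shape[0]
    if n < target_frames:
        pad = np.zeros((target_frames - n, features.shape[1]))
        features = np.append(features, pad, axis=0)
    elif n > target_frames:
        features = features[:target_frames, :]
    return features

# A 3-frame spectrogram padded up to 5 frames; a 7-frame one cut down.
short = enforce_length(np.ones((3, 2)), 5)
long_ = enforce_length(np.ones((7, 2)), 5)
```

Both results have shape (5, 2); the padded one ends in zero rows, which is why `load_track` afterwards replaces exact zeros with 1e-6 before taking the log.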

create_data_pickle.py

Lines changed: 59 additions & 0 deletions
from common import load_track, GENRES
import sys
import numpy as np
from math import pi
from cPickle import dump
import os
from optparse import OptionParser

TRACK_COUNT = 1000

def get_default_shape(dataset_path):
    # Use the first track to determine the common spectrogram shape.
    tmp_features, _ = load_track(os.path.join(dataset_path,
                'blues/blues.00000.au'))
    return tmp_features.shape

def collect_data(dataset_path):
    '''
    Collects data from the GTZAN dataset into a pickle. Computes a Mel-scaled
    power spectrogram for each track.

    :param dataset_path: path to the GTZAN dataset directory
    :returns: triple (x, y, track_paths) where x is a matrix containing
        extracted features, y is a one-hot matrix of genre labels and
        track_paths is a dict of absolute track paths indexed by row indices in
        the x and y matrices
    '''
    default_shape = get_default_shape(dataset_path)
    x = np.zeros((TRACK_COUNT,) + default_shape, dtype=np.float32)
    y = np.zeros((TRACK_COUNT, len(GENRES)), dtype=np.float32)
    track_paths = {}

    for (genre_index, genre_name) in enumerate(GENRES):
        for i in xrange(TRACK_COUNT // len(GENRES)):
            file_name = '{}/{}.000{}.au'.format(genre_name,
                    genre_name, str(i).zfill(2))
            print 'Processing', file_name
            path = os.path.join(dataset_path, file_name)
            # Tracks are laid out genre-major: all blues first, then
            # classical, and so on.
            track_index = genre_index * (TRACK_COUNT // len(GENRES)) + i
            x[track_index], _ = load_track(path, default_shape)
            y[track_index, genre_index] = 1
            track_paths[track_index] = os.path.abspath(path)

    return (x, y, track_paths)

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option('-d', '--dataset_path', dest='dataset_path',
            default=os.path.join(os.path.dirname(__file__), 'data/genres'),
            help='path to the GTZAN dataset directory', metavar='DATASET_PATH')
    parser.add_option('-o', '--output_pkl_path', dest='output_pkl_path',
            default=os.path.join(os.path.dirname(__file__), 'data/data.pkl'),
            help='path to the output pickle', metavar='OUTPUT_PKL_PATH')
    options, args = parser.parse_args()

    (x, y, track_paths) = collect_data(options.dataset_path)

    data = {'x': x, 'y': y, 'track_paths': track_paths}
    # Pickles should be written in binary mode.
    with open(options.output_pkl_path, 'wb') as f:
        dump(data, f)
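`collect_data` lays tracks out genre-major (`track_index = genre_index * tracks_per_genre + i`) and marks labels one-hot. A toy sketch of just that layout, with 2 genres of 3 tracks each (hypothetical sizes, not the repo's 10 x 100):

```python
import numpy as np

genres = ['blues', 'classical']
tracks_per_genre = 3
track_count = len(genres) * tracks_per_genre

# One-hot label matrix: row = track, column = genre.
y = np.zeros((track_count, len(genres)), dtype=np.float32)
for genre_index, _ in enumerate(genres):
    for i in range(tracks_per_genre):
        track_index = genre_index * tracks_per_genre + i
        y[track_index, genre_index] = 1
```

Rows 0..2 are blues ([1, 0]) and rows 3..5 are classical ([0, 1]); each row sums to exactly one.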

data/README

Lines changed: 2 additions & 0 deletions
This directory should contain the GTZAN dataset in a subdirectory "genres".
Read the main README to find out how to download it.

extract_filters.py

Lines changed: 132 additions & 0 deletions
from common import get_layer_output_function, WINDOW_SIZE, WINDOW_STRIDE
from keras.models import model_from_yaml
import librosa as lbr
import numpy as np
from functools import partial
from optparse import OptionParser
import cPickle
import os

def compose(f, g):
    return lambda x: f(g(x))

def undo_layer(length, stride, (i, j)):
    # Map a range of feature indices [i, j) to the range of input indices
    # that influenced it.
    return (stride * i, stride * (j - 1) + length)

def extract_filters(model, data, filters_path, count0):
    x = data['x']
    track_paths = data['track_paths']

    conv_layer_names = []
    i = 1
    while True:
        name = 'convolution_' + str(i)
        if model.get_layer(name) is None:
            break
        conv_layer_names.append(name)
        i += 1

    # Generate undoers for every convolutional layer. An undoer is a function
    # translating a pair of coordinates in feature space (mel spectrograms or
    # features extracted by convolutional layers) to the sample space (raw
    # audio signal).
    conv_layer_undoers = []

    # Undo the mel spectrogram extraction.
    undoer = partial(undo_layer, WINDOW_SIZE, WINDOW_STRIDE)

    for name in conv_layer_names:
        layer = model.get_layer(name)
        length = layer.filter_length
        stride = layer.subsample_length

        # Undo the convolution layer.
        undoer = compose(partial(undo_layer, length, stride), undoer)
        conv_layer_undoers.append(undoer)

        # Undo the pooling layer.
        undoer = compose(partial(undo_layer, 2, 2), undoer)

    conv_layer_output_funs = \
        map(partial(get_layer_output_function, model), conv_layer_names)

    # Extract the track chunks with the highest activations for each filter
    # in each convolutional layer.
    for (layer_index, output_fun) in enumerate(conv_layer_output_funs):
        layer_path = os.path.join(filters_path, conv_layer_names[layer_index])
        if not os.path.exists(layer_path):
            os.makedirs(layer_path)

        print 'Computing outputs for layer', conv_layer_names[layer_index]
        output = output_fun(x)

        # Matrices of shape n_tracks x n_filters (time reduced away).
        max_over_time = np.amax(output, axis=1)
        argmax_over_time = np.argmax(output, axis=1)

        # Number of input chunks to extract for each filter.
        count = count0 // 2 ** layer_index
        argmax_over_track = \
            np.argpartition(max_over_time, -count, axis=0)[-count :, :]

        undoer = conv_layer_undoers[layer_index]

        for filter_index in xrange(argmax_over_track.shape[1]):
            print 'Processing layer', conv_layer_names[layer_index], \
                'filter', filter_index

            track_indices = argmax_over_track[:, filter_index]
            time_indices = argmax_over_time[track_indices, filter_index]
            # Mutable cell so the nested function can store the sample rate
            # (Python 2 has no nonlocal).
            sample_rate = [None]

            def extract_sample_from_track(undoer, (track_index, time_index)):
                track_path = track_paths[track_index]
                (track_samples, sample_rate[0]) = lbr.load(track_path,
                        mono=True)
                (t1, t2) = undoer((time_index, time_index + 1))
                return track_samples[t1 : t2]

            samples_for_filter = np.concatenate(
                map(partial(extract_sample_from_track, undoer),
                    zip(track_indices, time_indices)))

            filter_path = os.path.join(layer_path,
                    '{}.wav'.format(filter_index))
            lbr.output.write_wav(filter_path, samples_for_filter,
                    sample_rate[0])

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option('-m', '--model_path', dest='model_path',
            default=os.path.join(os.path.dirname(__file__),
                'models/model.yaml'),
            help='path to the model YAML file', metavar='MODEL_PATH')
    parser.add_option('-w', '--weights_path', dest='weights_path',
            default=os.path.join(os.path.dirname(__file__),
                'models/weights.h5'),
            help='path to the model weights hdf5 file',
            metavar='WEIGHTS_PATH')
    parser.add_option('-d', '--data_path', dest='data_path',
            default=os.path.join(os.path.dirname(__file__),
                'data/data.pkl'),
            help='path to the data pickle',
            metavar='DATA_PATH')
    parser.add_option('-f', '--filters_path', dest='filters_path',
            default=os.path.join(os.path.dirname(__file__),
                'filters'),
            help='path to the output filters directory',
            metavar='FILTERS_PATH')
    parser.add_option('-c', '--count0', dest='count0',
            default='4',
            help=('number of chunks to extract from the first convolutional ' +
                'layer, this number is halved for each next layer'),
            metavar='COUNT0')
    options, args = parser.parse_args()

    with open(options.model_path, 'r') as f:
        model = model_from_yaml(f.read())
    model.load_weights(options.weights_path)

    with open(options.data_path, 'rb') as f:
        data = cPickle.load(f)

    extract_filters(model, data, options.filters_path, int(options.count0))
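Each undoer maps an index range [i, j) in a layer's output back to the range of input indices that influenced it, via (i, j) -> (stride * i, stride * (j - 1) + length), and the per-layer mappings compose outward to raw samples. A Python 3 sketch of that arithmetic with toy layer sizes (the repo's own `undo_layer` uses Python 2 tuple-parameter unpacking; the composition order shown here undoes the convolution first, then the spectrogram window):

```python
from functools import partial

def undo_layer(length, stride, coords):
    # Map feature-index range [i, j) back to the input range it covers.
    i, j = coords
    return (stride * i, stride * (j - 1) + length)

def compose(f, g):
    return lambda x: f(g(x))

# Toy pipeline: a conv layer with filter length 4, stride 2, sitting on
# spectrogram frames of window 2048, hop 1024.
conv_undo = partial(undo_layer, 4, 2)
mel_undo = partial(undo_layer, 2048, 1024)
to_samples = compose(mel_undo, conv_undo)

# The first conv output value covers spectrogram frames [0, 4),
# i.e. raw samples [0, 5120).
span = to_samples((0, 1))
```

So a single conv-layer activation here corresponds to 5120 raw audio samples, which is exactly the chunk the script cuts out and writes to a wav file.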

genre_recognizer.py

Lines changed: 20 additions & 0 deletions
from common import load_track, get_layer_output_function
import numpy as np
from keras.layers import Input
from keras.models import model_from_yaml, Model
from keras import backend as K

class GenreRecognizer():

    def __init__(self, model_path, weights_path):
        with open(model_path, 'r') as f:
            model = model_from_yaml(f.read())
        model.load_weights(weights_path)
        self.pred_fun = get_layer_output_function(model, 'output_realtime')
        print 'Loaded model.'

    def recognize(self, track_path):
        print 'Loading song', track_path
        (features, duration) = load_track(track_path)
        # Add a batch dimension of size 1.
        features = np.reshape(features, (1,) + features.shape)
        return (self.pred_fun(features), duration)
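`recognize` returns per-frame genre probabilities rather than a single label, which is what drives the "current belief" visualization. One simple way to collapse them into a final prediction, assuming an output of shape (1, time, n_genres), is to average over time; a sketch with made-up numbers and a truncated genre list:

```python
import numpy as np

GENRES = ['blues', 'classical', 'country']  # toy subset of the real list

# Hypothetical network output: 1 track, 4 time steps, 3 genres.
frame_probs = np.array([[
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.6, 0.2],
]])

mean_probs = frame_probs.mean(axis=1)[0]        # average belief over time
predicted = GENRES[int(np.argmax(mean_probs))]  # highest-average genre
```

With these numbers the averaged belief peaks at 'classical' (0.65); the live visualization instead shows the evolving per-frame beliefs directly.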

0 commit comments