{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transfer Learning\n", "\n", "A Convolutional Neural Network (CNN) for image classification is made up of multiple layers that extract features, such as edges and corners, followed by a final fully-connected layer that classifies objects based on these features. You can visualize the architecture like this:\n", "\n", "<table>\n", "<tr><td>Convolutional Layer</td><td>Pooling Layer</td><td>Convolutional Layer</td><td>Pooling Layer</td><td>Fully Connected Layer</td></tr>\n", "<tr><td colspan='4'>Feature Extraction</td><td>Classification</td></tr>\n", "</table>\n", "\n", "*Transfer Learning* is a technique where you take an existing trained model and reuse its feature extraction layers, replacing its final classification layer with a fully-connected layer trained on your own custom images. With this technique, your model benefits from the feature extraction training that was performed on the base model (which may have been based on a larger training dataset than you have access to) to build a classification model for your own specific set of object classes.\n", "\n", "How does this help? Well, think of it this way. Suppose you take a professional tennis player and a complete beginner, and try to teach them both how to play racquetball. It's reasonable to assume that the professional tennis player will be easier to train, because many of the underlying skills involved in racquetball are already learned. Similarly, a pre-trained CNN model may be easier to train to classify a specific set of objects because it has already learned how to identify the features of common objects, such as edges and corners. Fundamentally, a pre-trained model can be a great way to produce an effective classifier even when you have limited data with which to train it.\n", "\n", "In this notebook, we'll see how to implement transfer learning for a classification model using TensorFlow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import TensorFlow libraries\n", "\n", "Let's start by ensuring that we have the latest version of the **TensorFlow** package installed and importing the TensorFlow libraries we're going to use." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade tensorflow" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import tensorflow\n", "from tensorflow import keras\n", "print('TensorFlow version:',tensorflow.__version__)\n", "print('Keras version:',keras.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the base model\n", "\n", "To use transfer learning, we need a base model from which we can use the trained feature extraction layers. The ***resnet*** model is a CNN-based image classifier that has been pre-trained on ImageNet, a huge dataset of 3-color-channel images, using inputs of 224x224 pixels. Let's create an instance of it with the pretrained weights, excluding its final (top) prediction layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "outputPrepend" ] }, "outputs": [], "source": [ "base_model = keras.applications.resnet.ResNet50(weights='imagenet', include_top=False, input_shape=(224,224,3))\n", "# summary() prints the layer table itself and returns None, so we don't wrap it in print()\n", "base_model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the image data\n", "\n", "The pretrained model has many layers, starting with a convolutional layer that begins the feature extraction process from image data.\n", "\n", "For feature extraction to work with our own images, we need to ensure that the image data we use to train our prediction layer has the same number of features (pixel values) as the images originally used to train the feature extraction layers, so we need data loaders for color images that are 224x224 pixels in size.\n", "\n", "TensorFlow includes functions for loading and transforming data. We'll use these to create a generator for training data, and a second generator for test data (which we'll use to validate the trained model). 
The loaders will resize the images to the 224x224 input size expected by the resnet CNN model and normalize the pixel values to the range 0 to 1.\n", "\n", "Run the following cell to define the data generators and list the classes for our images." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", "\n", "data_folder = 'data/shapes'\n", "pretrained_size = (224,224)\n", "batch_size = 30\n", "\n", "print(\"Getting Data...\")\n", "datagen = ImageDataGenerator(rescale=1./255, # normalize pixel values\n", "                             validation_split=0.3) # hold back 30% of the images for validation\n", "\n", "print(\"Preparing training dataset...\")\n", "train_generator = datagen.flow_from_directory(\n", "    data_folder,\n", "    target_size=pretrained_size, # resize to match model expected input\n", "    batch_size=batch_size,\n", "    class_mode='categorical',\n", "    subset='training') # set as training data\n", "\n", "print(\"Preparing validation dataset...\")\n", "validation_generator = datagen.flow_from_directory(\n", "    data_folder,\n", "    target_size=pretrained_size, # resize to match model expected input\n", "    batch_size=batch_size,\n", "    class_mode='categorical',\n", "    subset='validation') # set as validation data\n", "\n", "classnames = list(train_generator.class_indices.keys())\n", "print(\"class names: \", classnames)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a prediction layer\n", "\n", "We downloaded the complete *resnet* model excluding its final prediction layer, so we need to combine these layers with a fully-connected (*dense*) layer that takes the flattened outputs from the feature extraction layers and generates a prediction for each of our image classes.\n", "\n", "We also need to freeze the feature extraction layers to retain the trained weights. 
Then when we train the model using our images, only the final prediction layer will learn new weight and bias values; the pre-trained weights already learned for feature extraction will remain the same." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "outputPrepend" ] }, "outputs": [], "source": [ "from tensorflow.keras import Model\n", "from tensorflow.keras.layers import Flatten, Dense\n", "\n", "# Freeze the already-trained layers in the base model\n", "for layer in base_model.layers:\n", "    layer.trainable = False\n", "\n", "# Create prediction layer for classification of our images\n", "x = base_model.output\n", "x = Flatten()(x)\n", "prediction_layer = Dense(len(classnames), activation='softmax')(x)\n", "model = Model(inputs=base_model.input, outputs=prediction_layer)\n", "\n", "# Compile the model\n", "model.compile(loss='categorical_crossentropy',\n", "              optimizer='adam',\n", "              metrics=['accuracy'])\n", "\n", "# Now print the full model, which will include the layers of the base model plus the dense layer we added\n", "# (summary() prints the table itself, so it isn't wrapped in print())\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the Model\n", "\n", "With the layers of the CNN defined, we're ready to train it using our image data. The weights used in the feature extraction layers from the base resnet model will not be changed by training; only the final dense layer that maps the features to our shape classes will be trained." 
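, "\n", "\n", "As a quick sanity check before training, the sketch below counts the trainable and frozen parameters. It assumes the `model` object built in the previous cell; the names `trainable_count` and `frozen_count` are only illustrative. If the freeze worked as intended, the trainable count should correspond to the added dense layer alone.\n", "\n", "```python\n", "# Illustrative check (assumes `model` from the cell above)\n", "# Each weight variable is converted to a NumPy array and its element count is summed\n", "trainable_count = sum(w.numpy().size for w in model.trainable_weights)\n", "frozen_count = sum(w.numpy().size for w in model.non_trainable_weights)\n", "print('Trainable parameters:', trainable_count)\n", "print('Frozen parameters:', frozen_count)\n", "```"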
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false, "tags": [] }, "outputs": [], "source": [ "# Train the model over 3 epochs\n", "num_epochs = 3\n", "history = model.fit(\n", " train_generator,\n", " steps_per_epoch = train_generator.samples // batch_size,\n", " validation_data = validation_generator, \n", " validation_steps = validation_generator.samples // batch_size,\n", " epochs = num_epochs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View the loss history\n", "\n", "We tracked average training and validation loss for each epoch. We can plot these to verify that the loss reduced over the training process and to detect *over-fitting* (which is indicated by a continued drop in training loss after validation loss has levelled out or started to increase)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "\n", "epoch_nums = range(1,num_epochs+1)\n", "training_loss = history.history[\"loss\"]\n", "validation_loss = history.history[\"val_loss\"]\n", "plt.plot(epoch_nums, training_loss)\n", "plt.plot(epoch_nums, validation_loss)\n", "plt.xlabel('epoch')\n", "plt.ylabel('loss')\n", "plt.legend(['training', 'validation'], loc='upper right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate model performance\n", "\n", "We can see the final accuracy based on the test data, but typically we'll want to explore performance metrics in a little more depth. Let's plot a confusion matrix to see how well the model is predicting each class." 
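, "\n", "\n", "As a rough illustration of what a confusion matrix contains, the sketch below builds one by hand in plain Python. The function name `build_confusion_matrix` and the example labels are made up for this illustration and are not part of the notebook's data. Rows correspond to actual classes and columns to predicted classes, so correct predictions are counted on the diagonal.\n", "\n", "```python\n", "# Illustrative only: build a confusion matrix by hand for made-up labels\n", "def build_confusion_matrix(true_labels, predicted_labels, num_classes):\n", "    # Start with a num_classes x num_classes grid of zeros\n", "    matrix = [[0] * num_classes for _ in range(num_classes)]\n", "    # Row index = actual class, column index = predicted class\n", "    for actual, predicted in zip(true_labels, predicted_labels):\n", "        matrix[actual][predicted] += 1\n", "    return matrix\n", "\n", "# Hypothetical labels for three classes (0, 1, 2)\n", "example_true = [0, 0, 1, 1, 2, 2]\n", "example_pred = [0, 1, 1, 1, 2, 0]\n", "for row in build_confusion_matrix(example_true, example_pred, 3):\n", "    print(row)\n", "```"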
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# We'll use scikit-learn to compute the confusion matrix\n", "import numpy as np\n", "from sklearn.metrics import confusion_matrix\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "print(\"Generating predictions from validation data...\")\n", "# Get the image and label arrays for the first batch of validation data\n", "x_test = validation_generator[0][0]\n", "y_test = validation_generator[0][1]\n", "\n", "# Use the model to predict the class\n", "class_probabilities = model.predict(x_test)\n", "\n", "# The model returns a probability value for each class\n", "# The one with the highest probability is the predicted class\n", "predictions = np.argmax(class_probabilities, axis=1)\n", "\n", "# The actual labels are one-hot encoded (e.g. [0 1 0]), so get the index of the value 1\n", "true_labels = np.argmax(y_test, axis=1)\n", "\n", "# Plot the confusion matrix\n", "# (labels is passed so every class gets a row and column, even if it is missing from this batch)\n", "cm = confusion_matrix(true_labels, predictions, labels=np.arange(len(classnames)))\n", "plt.imshow(cm, interpolation=\"nearest\", cmap=plt.cm.Blues)\n", "plt.colorbar()\n", "tick_marks = np.arange(len(classnames))\n", "plt.xticks(tick_marks, classnames, rotation=85)\n", "plt.yticks(tick_marks, classnames)\n", "plt.xlabel(\"Predicted Shape\")\n", "plt.ylabel(\"Actual Shape\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the trained model\n", "\n", "Now that we've trained the model, we can use it to predict the class of an image." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "from random import randint\n", "import os\n", "from matplotlib import pyplot as plt\n", "%matplotlib inline\n", "\n", "# Function to predict the class of an image\n", "def predict_image(classifier, image):\n", "    # The model expects a batch of images as input, so we'll create an array of 1 image\n", "    imgfeatures = image.reshape(1, image.shape[0], image.shape[1], image.shape[2])\n", "\n", "    # We need to format the input to match the training data\n", "    # The generator loaded the values as floating point numbers\n", "    # and normalized the pixel values, so...\n", "    imgfeatures = imgfeatures.astype('float32')\n", "    imgfeatures /= 255\n", "\n", "    # Use the model to predict the image class\n", "    class_probabilities = classifier.predict(imgfeatures)\n", "\n", "    # Find the class with the highest predicted probability\n", "    index = int(np.argmax(class_probabilities, axis=1)[0])\n", "    return index\n", "\n", "# Function to create a random image (of a square, circle, or triangle)\n", "def create_image(size, shape):\n", "    from PIL import Image, ImageDraw\n", "\n", "    xy1 = randint(10,40)\n", "    xy2 = randint(60,100)\n", "    col = (randint(0,200), randint(0,200), randint(0,200))\n", "\n", "    img = Image.new(\"RGB\", size, (255, 255, 255))\n", "    draw = ImageDraw.Draw(img)\n", "\n", "    if shape == 'circle':\n", "        draw.ellipse([(xy1,xy1), (xy2,xy2)], fill=col)\n", "    elif shape == 'triangle':\n", "        draw.polygon([(xy1,xy1), (xy2,xy2), (xy2,xy1)], fill=col)\n", "    else: # square\n", "        draw.rectangle([(xy1,xy1), (xy2,xy2)], fill=col)\n", "    del draw\n", "\n", "    return np.array(img)\n", "\n", "# Create a random test image\n", "classnames = os.listdir(os.path.join('data', 'shapes'))\n", "classnames.sort()\n", "img = create_image((224,224), 
classnames[randint(0, len(classnames)-1)])\n", "plt.axis('off')\n", "plt.imshow(img)\n", "\n", "# Use the classifier to predict the class\n", "class_idx = predict_image(model, img)\n", "print(classnames[class_idx])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learn More\n", "\n", "* [TensorFlow Documentation](https://www.tensorflow.org/tutorials/images/transfer_learning)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.6 - AzureML", "language": "python", "name": "python3-azureml" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }