{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Neural Networks with PyTorch\n", "\n", "\"Deep Learning\" is a general term that usually refers to the use of neural networks with multiple layers that loosely emulate the way the human brain learns and makes decisions. A convolutional neural network (CNN) is a kind of neural network that extracts *features* from matrices of numeric values (often images) by convolving multiple filters over the matrix values to apply weights and identify patterns, such as edges and corners in an image. The numeric representations of these patterns are then passed to a fully-connected neural network layer to map the features to specific classes.\n", "\n", "There are several commonly used frameworks for creating CNNs. In this notebook, we'll build a simple example CNN using PyTorch.\n", "\n", "## Import libraries\n", "\n", "First, let's install and import the PyTorch libraries we'll need." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false, "tags": [] }, "outputs": [], "source": [ "# Import PyTorch libraries\n", "import torch\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from torch.autograd import Variable\n", "import torch.nn.functional as F\n", "\n", "# Other libraries we'll use\n", "import numpy as np\n", "import os\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "%matplotlib inline\n", "\n", "print(\"Libraries imported - ready to use PyTorch\", torch.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore the data\n", "\n", "In this exercise, you'll train a CNN-based classification model that can 
classify images of geometric shapes. Let's take a look at the classes of shape the model needs to identify." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The images are in the data/shapes folder\n", "data_path = 'data/shapes/'\n", "\n", "# Get the class names\n", "classes = os.listdir(data_path)\n", "classes.sort()\n", "print(len(classes), 'classes:')\n", "print(classes)\n", "\n", "# Show the first image in each folder\n", "fig = plt.figure(figsize=(8, 12))\n", "i = 0\n", "for sub_dir in os.listdir(data_path):\n", "    i += 1\n", "    img_file = os.listdir(os.path.join(data_path, sub_dir))[0]\n", "    img_path = os.path.join(data_path, sub_dir, img_file)\n", "    img = mpimg.imread(img_path)\n", "    a = fig.add_subplot(1, len(classes), i)\n", "    a.axis('off')\n", "    imgplot = plt.imshow(img)\n", "    a.set_title(img_file)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data\n", "\n", "PyTorch includes functions for loading and transforming data. We'll use these to create an iterative loader for training data, and a second iterative loader for test data (which we'll use to validate the trained model). The loaders will transform the image data into *tensors*, which are the core data structure used in PyTorch, and normalize them by subtracting 0.5 from each channel and dividing by 0.5, which rescales the pixel values from the range 0 to 1 to the range -1 to 1.\n", "\n", "Run the following cell to define the data loaders." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Function to ingest data using training and test loaders\n", "def load_dataset(data_path):\n", "    # Define the transformations to apply to each image\n", "    transformation = transforms.Compose([\n", "        # transform to tensors\n", "        transforms.ToTensor(),\n", "        # Normalize the pixel values (in R, G, and B channels)\n", "        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])\n", "    ])\n", "\n", "    # Load all of the images, transforming them\n", "    full_dataset = torchvision.datasets.ImageFolder(\n", "        root=data_path,\n", "        transform=transformation\n", "    )\n", "\n", "\n", "    # Split into training (70%) and testing (30%) datasets\n", "    train_size = int(0.7 * len(full_dataset))\n", "    test_size = len(full_dataset) - train_size\n", "    train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])\n", "\n", "    # define a loader for the training data we can iterate through in 50-image batches\n", "    train_loader = torch.utils.data.DataLoader(\n", "        train_dataset,\n", "        batch_size=50,\n", "        num_workers=0,\n", "        shuffle=False\n", "    )\n", "\n", "    # define a loader for the testing data we can iterate through in 50-image batches\n", "    test_loader = torch.utils.data.DataLoader(\n", "        test_dataset,\n", "        batch_size=50,\n", "        num_workers=0,\n", "        shuffle=False\n", "    )\n", "\n", "    return train_loader, test_loader\n", "\n", "\n", "# Get the iterative dataloaders for test and training data\n", "train_loader, test_loader = load_dataset(data_path)\n", "print('Data loaders ready')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the CNN\n", "\n", "In PyTorch, you define a neural network model as a class that is derived from the **nn.Module** base class. Your class must define the layers in your network, and provide a **forward** method that is used to process data through the layers of the network." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Create a neural net class\n", "class Net(nn.Module):\n", "    # Constructor\n", "    def __init__(self, num_classes=3):\n", "        super(Net, self).__init__()\n", "\n", "        # Our images are RGB, so input channels = 3. We'll apply 12 filters in the first convolutional layer\n", "        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)\n", "\n", "        # We'll apply max pooling with a kernel size of 2\n", "        self.pool = nn.MaxPool2d(kernel_size=2)\n", "\n", "        # A second convolutional layer takes 12 input channels, and generates 12 outputs\n", "        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)\n", "\n", "        # A third convolutional layer takes 12 inputs and generates 24 outputs\n", "        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)\n", "\n", "        # A dropout layer randomly zeroes 20% of the feature maps during training to help prevent overfitting\n", "        self.drop = nn.Dropout2d(p=0.2)\n", "\n", "        # Our 128x128 image tensors will be pooled twice with a kernel size of 2. 
128/2/2 is 32.\n", "        # So our feature tensors are now 32 x 32, and we've generated 24 of them\n", "        # We need to flatten these and feed them to a fully-connected layer\n", "        # to map them to the probability for each class\n", "        self.fc = nn.Linear(in_features=32 * 32 * 24, out_features=num_classes)\n", "\n", "    def forward(self, x):\n", "        # Use a relu activation function after layer 1 (convolution 1 and pool)\n", "        x = F.relu(self.pool(self.conv1(x)))\n", "\n", "        # Use a relu activation function after layer 2 (convolution 2 and pool)\n", "        x = F.relu(self.pool(self.conv2(x)))\n", "\n", "        # Select some features to drop after the 3rd convolution to prevent overfitting\n", "        x = F.relu(self.drop(self.conv3(x)))\n", "\n", "        # Only drop the features if this is a training pass\n", "        x = F.dropout(x, training=self.training)\n", "\n", "        # Flatten\n", "        x = x.view(-1, 32 * 32 * 24)\n", "        # Feed to fully-connected layer to predict class\n", "        x = self.fc(x)\n", "        # Return the log-probability of each class via a log_softmax function\n", "        return F.log_softmax(x, dim=1)\n", "\n", "print(\"CNN model class defined!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the model\n", "\n", "Now that we've defined a class for the network, we can train it using the image data.\n", "\n", "Training consists of an iterative series of forward passes, in which the training data is processed in batches by the layers in the network, each followed by a backward pass in which the optimizer adjusts the weights. We'll also use a separate set of test images to test the model at the end of each iteration (or *epoch*) so we can track the performance improvement as the training process progresses.\n", "\n", "In the example below, we use 5 epochs to train the model using the batches of images loaded by the data loaders, holding back the data in the test data loader for validation. 
After each batch, a loss function measures the error (*loss*) in the model's predictions, and the optimizer adjusts the weights (which were randomly initialized before the first iteration) to try to reduce it.\n", "\n", "> **Note**: We're only using 5 epochs to minimize the training time for this simple example. A real-world CNN is usually trained over more epochs than this. CNN model training is processor-intensive, involving a lot of matrix and vector-based operations; so it's recommended to perform this on a system that can leverage GPUs, which are optimized for these kinds of calculation. This will take a while to complete on a CPU-based system - status will be displayed as the training progresses." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def train(model, device, train_loader, optimizer, epoch):\n", "    # Set the model to training mode\n", "    model.train()\n", "    train_loss = 0\n", "    print(\"Epoch:\", epoch)\n", "    # Process the images in batches\n", "    for batch_idx, (data, target) in enumerate(train_loader):\n", "        # Use the CPU or GPU as appropriate\n", "        data, target = data.to(device), target.to(device)\n", "\n", "        # Reset the gradients accumulated by the optimizer\n", "        optimizer.zero_grad()\n", "\n", "        # Push the data forward through the model layers\n", "        output = model(data)\n", "\n", "        # Get the loss\n", "        loss = loss_criteria(output, target)\n", "\n", "        # Keep a running total\n", "        train_loss += loss.item()\n", "\n", "        # Backpropagate\n", "        loss.backward()\n", "        optimizer.step()\n", "\n", "        # Print metrics for every 10 batches so we see some progress\n", "        if batch_idx % 10 == 0:\n", "            print('Training set [{}/{} ({:.0f}%)] Loss: {:.6f}'.format(\n", "                batch_idx * len(data), len(train_loader.dataset),\n", "                100. 
* batch_idx / len(train_loader), loss.item()))\n", "\n", "    # return average loss for the epoch\n", "    avg_loss = train_loss / (batch_idx+1)\n", "    print('Training set: Average loss: {:.6f}'.format(avg_loss))\n", "    return avg_loss\n", "\n", "\n", "def test(model, device, test_loader):\n", "    # Switch the model to evaluation mode (so dropout is disabled)\n", "    model.eval()\n", "    test_loss = 0\n", "    correct = 0\n", "    # Disable gradient tracking, since we don't backpropagate during validation\n", "    with torch.no_grad():\n", "        batch_count = 0\n", "        for data, target in test_loader:\n", "            batch_count += 1\n", "            data, target = data.to(device), target.to(device)\n", "\n", "            # Get the predicted classes for this batch\n", "            output = model(data)\n", "\n", "            # Calculate the loss for this batch\n", "            test_loss += loss_criteria(output, target).item()\n", "\n", "            # Calculate the accuracy for this batch\n", "            _, predicted = torch.max(output.data, 1)\n", "            correct += torch.sum(target==predicted).item()\n", "\n", "    # Calculate the average loss and total accuracy for this epoch\n", "    avg_loss = test_loss/batch_count\n", "    print('Validation set: Average loss: {:.6f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n", "        avg_loss, correct, len(test_loader.dataset),\n", "        100. 
* correct / len(test_loader.dataset)))\n", "\n", "    # return average loss for the epoch\n", "    return avg_loss\n", "\n", "\n", "# Now use the train and test functions to train and test the model\n", "\n", "device = \"cpu\"\n", "if (torch.cuda.is_available()):\n", "    # if GPU available, use cuda (on a cpu, training will take a considerable length of time!)\n", "    device = \"cuda\"\n", "print('Training on', device)\n", "\n", "# Create an instance of the model class and allocate it to the device\n", "model = Net(num_classes=len(classes)).to(device)\n", "\n", "# Use an \"Adam\" optimizer to adjust weights\n", "# (see https://pytorch.org/docs/stable/optim.html#algorithms for details of supported algorithms)\n", "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", "\n", "# Specify the loss criteria\n", "loss_criteria = nn.CrossEntropyLoss()\n", "\n", "# Track metrics in these arrays\n", "epoch_nums = []\n", "training_loss = []\n", "validation_loss = []\n", "\n", "# Train over 5 epochs (in a real scenario, you'd likely use many more)\n", "epochs = 5\n", "for epoch in range(1, epochs + 1):\n", "    train_loss = train(model, device, train_loader, optimizer, epoch)\n", "    test_loss = test(model, device, test_loader)\n", "    epoch_nums.append(epoch)\n", "    training_loss.append(train_loss)\n", "    validation_loss.append(test_loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View the loss history\n", "\n", "We tracked average training and validation loss for each epoch. We can plot these to verify that loss reduced as the model was trained, and to detect *over-fitting* (which is indicated by a continued drop in training loss after validation loss has levelled out or started to increase)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "\n", "plt.plot(epoch_nums, training_loss)\n", "plt.plot(epoch_nums, validation_loss)\n", "plt.xlabel('epoch')\n", "plt.ylabel('loss')\n", "plt.legend(['training', 'validation'], loc='upper right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate model performance\n", "\n", "You can see the final accuracy based on the test data, but typically you'll want to explore performance metrics in a little more depth. Let's plot a confusion matrix to see how well the model is predicting each class." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false, "tags": [] }, "outputs": [], "source": [ "# PyTorch doesn't have a built-in confusion matrix metric, so we'll use scikit-learn\n", "from sklearn.metrics import confusion_matrix\n", "\n", "# Set the model to evaluate mode\n", "model.eval()\n", "\n", "# Get predictions for the test data and convert to numpy arrays for use with scikit-learn\n", "print(\"Getting predictions from test set...\")\n", "truelabels = []\n", "predictions = []\n", "for data, target in test_loader:\n", "    for label in target.cpu().data.numpy():\n", "        truelabels.append(label)\n", "    # The test loader yields CPU tensors, so run the model on the CPU too\n", "    for prediction in model.cpu()(data).data.numpy().argmax(1):\n", "        predictions.append(prediction)\n", "\n", "# Plot the confusion matrix\n", "cm = confusion_matrix(truelabels, predictions)\n", "plt.imshow(cm, interpolation=\"nearest\", cmap=plt.cm.Blues)\n", "plt.colorbar()\n", "tick_marks = np.arange(len(classes))\n", "plt.xticks(tick_marks, classes, rotation=45)\n", "plt.yticks(tick_marks, classes)\n", "plt.xlabel(\"Predicted Shape\")\n", "plt.ylabel(\"Actual Shape\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save the trained model\n", "\n", "Now that you've trained a working model, you can save it (including 
the trained weights) for use later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Save the model weights (creating the models folder first if it doesn't exist)\n", "model_file = 'models/shape_classifier.pt'\n", "os.makedirs(os.path.dirname(model_file), exist_ok=True)\n", "torch.save(model.state_dict(), model_file)\n", "del model\n", "print('model saved as', model_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the trained model\n", "\n", "Now that we've trained and evaluated our model, we can use it to predict classes for new images." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import os\n", "from random import randint\n", "%matplotlib inline\n", "\n", "\n", "# Function to predict the class of an image\n", "def predict_image(classifier, image):\n", "    # Set the classifier model to evaluation mode\n", "    classifier.eval()\n", "\n", "    # Apply the same transformations as we did for the training images\n", "    transformation = transforms.Compose([\n", "        transforms.ToTensor(),\n", "        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])\n", "    ])\n", "\n", "    # Preprocess the image\n", "    image_tensor = transformation(image).float()\n", "\n", "    # Add an extra batch dimension since pytorch treats all inputs as batches\n", "    image_tensor = image_tensor.unsqueeze_(0)\n", "\n", "    # Predict the class of the image (gradients aren't needed for inference)\n", "    with torch.no_grad():\n", "        output = classifier(image_tensor)\n", "    index = output.numpy().argmax()\n", "    return index\n", "\n", "\n", "# Function to create a random image (of a square, circle, or triangle)\n", "def create_image(size, shape):\n", "    from random import randint\n", "    import numpy as np\n", "    from PIL import Image, ImageDraw\n", "\n", "    xy1 = randint(10,40)\n", "    xy2 = randint(60,100)\n", "    col = (randint(0,200), randint(0,200), randint(0,200))\n", "\n", "    img = Image.new(\"RGB\", 
size, (255, 255, 255))\n", "    draw = ImageDraw.Draw(img)\n", "\n", "    if shape == 'circle':\n", "        draw.ellipse([(xy1,xy1), (xy2,xy2)], fill=col)\n", "    elif shape == 'triangle':\n", "        draw.polygon([(xy1,xy1), (xy2,xy2), (xy2,xy1)], fill=col)\n", "    else: # square\n", "        draw.rectangle([(xy1,xy1), (xy2,xy2)], fill=col)\n", "    del draw\n", "\n", "    return np.array(img)\n", "\n", "# Create a random test image\n", "classnames = os.listdir(os.path.join('data', 'shapes'))\n", "classnames.sort()\n", "shape = classnames[randint(0, len(classnames)-1)]\n", "img = create_image((128,128), shape)\n", "\n", "# Display the image\n", "plt.axis('off')\n", "plt.imshow(img)\n", "\n", "# Create a new instance of the model class and load the saved weights\n", "model = Net(num_classes=len(classnames))\n", "model.load_state_dict(torch.load(model_file))\n", "\n", "# Call the prediction function\n", "index = predict_image(model, img)\n", "print(classes[index])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "To learn more about training convolutional neural networks with PyTorch, see the [PyTorch documentation](https://pytorch.org/).\n", "\n", "## Challenge: Safari Image Classification\n", "\n", "Hopefully this notebook has shown you the main steps in training and evaluating a CNN. Why not put what you've learned into practice with our Safari image classification challenge in the [/challenges/05 - Safari CNN Challenge.ipynb](./challenges/05%20-%20Safari%20CNN%20Challenge.ipynb) notebook?\n", "\n", "> **Note**: The time to complete this optional challenge is not included in the estimated time for this exercise - you can spend as little or as much time on it as you like!" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3.6 - AzureML", "language": "python", "name": "python3-azureml" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }