- **saliency maps**
    - pixels with a stronger influence on the score can be used to segment the image; a simple thresholding value on the saliency map is enough to get a mask
    - saliency maps are a quick technique to visualize what the network is looking at
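The thresholding idea above can be sketched with a toy example. This is a hypothetical setup: in a real network the gradient of the class score w.r.t. the input pixels comes from backprop, but here the "network" is a fixed linear scorer so the gradient is just its weights.

```python
import numpy as np

# toy stand-in for a trained model: score = sum(weights * pixels),
# so d(score)/d(pixel) is exactly `weights` (no autograd needed here)
rng = np.random.default_rng(0)
H, W = 8, 8
weights = rng.normal(size=(H, W))
image = rng.uniform(size=(H, W))

score = float((weights * image).sum())   # linear "class score"
saliency = np.abs(weights)               # |gradient| of score w.r.t. each pixel

# segment the image with a simple threshold on the saliency map
threshold = saliency.mean()              # one simple choice of threshold
mask = saliency > threshold              # True where pixels influence the score most
print(mask.sum(), "of", H * W, "pixels kept")
```

In practice the gradient would come from one backward pass through the real CNN; everything else (threshold, mask) works the same way.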
- **occlusion sensitivity**
    - say we're classifying the same dog/cat image from before, but we cover some pixels with a grey square, then test how much impact the square has at different locations
    - wherever the output score drops the lowest is where the dog most likely is, i.e. where the most important pixels are
    - **important:** the square may also improve the probability. for example, if there's both a human and a dog in the picture and the square covers up the human, the model will most likely perform better
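The sliding-square procedure can be sketched as follows. The `score_fn` here is a toy assumption standing in for a trained classifier's probability for the target class; it just sums a bright "object" region so the effect is easy to see.

```python
import numpy as np

def score_fn(img):
    # toy classifier: the "dog" lives at rows/cols 2..4
    return img[2:5, 2:5].sum()

H, W, sq = 8, 8, 3                          # image size and grey-square size
image = np.zeros((H, W))
image[2:5, 2:5] = 1.0                       # bright object

# slide the grey square over every position and record the score
heatmap = np.zeros((H - sq + 1, W - sq + 1))
for i in range(H - sq + 1):
    for j in range(W - sq + 1):
        occluded = image.copy()
        occluded[i:i + sq, j:j + sq] = 0.5  # grey square
        heatmap[i, j] = score_fn(occluded)  # score with this patch covered

# the lowest score marks the most important region
i, j = np.unravel_index(np.argmin(heatmap), heatmap.shape)
print("most important patch at", (i, j))
```

The score is lowest exactly when the square fully covers the object, which is the intuition behind reading the occlusion heatmap.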
- **class activation maps**
    - classification networks have really good localization ability
    - off topic, but your typical CNN structure looks like this:
        - a bunch of conv, relu, max pool layers (in that order), then you flatten the output, pass it through a couple FC layers, apply softmax, and you get the output. the FC layers play the role of the classifier
    - building off that idea, to produce a class activation map we get rid of the flatten layer, since flattening everything into one vector destroys all the spatial information
    - instead we use a global average pooling layer: we take all the feature maps produced by the last conv layer and average each feature map down to a single number
        - for example, if the dimension was (4,4,6), corresponding to 6 feature maps of size 4x4, then the output after global average pooling would be (1,1,6)
    - after this, we pass it through a single FC layer + softmax, which outputs a probability for each class
    - the feature maps actually contain **visual patterns**
        - some parts will be lit up, which tells you the activations have found something in those spots; you can repeat this process for all the feature maps
        - this basically means there was a visual pattern in the input that activated that feature map

    ![image.png](attachment:1b15bb2a-96fc-48be-b318-91340da926ed:image.png)

    - looking at the image above, the score for dog is 91%; we can now reverse engineer how much of that score came from each of these feature maps
    - if we take a weighted sum of the feature maps (weighted by the FC weights for the "dog" class), we get another feature map: the **class activation map** for "dog"

    ![image.png](attachment:fc059496-6ef6-4834-ba93-40ec85917dc4:image.png)

    - you can see it was probably highly influenced by the 2nd feature map, as that one contains components of the dog
    - **dataset search**
        - take a feature map from the last conv layer and find examples in the dataset that strongly activate it; you'll likely find a common trend, and then you'll know what that feature map was looking for
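The whole CAM pipeline above (global average pooling, a single FC layer, then the weighted sum of feature maps) can be sketched end to end. All shapes and weights here are toy assumptions; in a real network the feature maps come from the last conv layer and the weights from the trained FC layer.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.uniform(size=(4, 4, 6))   # 6 feature maps of size 4x4

# global average pooling: (4, 4, 6) -> one number per map, i.e. (1, 1, 6)
gap = feature_maps.mean(axis=(0, 1))         # shape (6,)

# single FC layer + softmax over, say, 3 classes
fc_weights = rng.normal(size=(6, 3))
logits = gap @ fc_weights
probs = np.exp(logits) / np.exp(logits).sum()

# class activation map for class c: weighted sum of the feature maps,
# using that class's column of FC weights -- this highlights where in
# the 4x4 grid the evidence for class c came from
c = int(np.argmax(probs))                    # e.g. "dog"
cam = (feature_maps * fc_weights[:, c]).sum(axis=-1)   # shape (4, 4)
print("CAM shape:", cam.shape)
```

Note the reuse of the FC weights: because global average pooling is linear, each class score is a weighted average of the feature maps, which is exactly why the same weights can be pushed back onto the spatial maps to localize the class.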