Commit 1d6970c

updated notes on nn interpretability -- cs230
1 parent 5c547ca commit 1d6970c

Lines changed: 26 additions & 0 deletions


stanford_lectures/cs230/cs230.md

@@ -85,4 +85,30 @@
- **saliency maps**
- the pixels with the strongest influence on the score can be used to segment the image; a simple thresholding value is enough for this
- saliency maps are a quick technique for visualizing what the network is looking at
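The thresholding idea can be sketched with a toy model. Here a linear scoring model stands in for the network (so the gradient of the score with respect to each pixel is just the corresponding weight); all the arrays are made-up illustrations, not from the lecture:

```python
import numpy as np

# Toy stand-in for a network: a linear scoring model, score = sum(w * x).
# For a linear model, d(score)/d(pixel) is just the matching weight,
# so the saliency map is simply |w|.
rng = np.random.default_rng(0)
x = rng.random((4, 4))          # a 4x4 "image" (hypothetical)
w = rng.random((4, 4))          # model weights (hypothetical)

score = float((w * x).sum())
saliency = np.abs(w)            # |d score / d pixel|

# thresholding the saliency map gives a rough segmentation mask
mask = saliency > saliency.mean()
```

For a real CNN the gradient would come from backpropagation (e.g. autograd) rather than being read off the weights, but the thresholding step is the same.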
- **occlusion sensitivity**
- say we're classifying the same dog / cat image as before, but we cover some pixels with a grey square; we then test how much impact the square has at different locations
- wherever the output score is lowest, that's where the dog most likely is, i.e. where the most important pixels are
- **important:** the square may also improve the probability; for example, if there's a human and a dog in the picture and the square covers up the human, the model will most likely perform better
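A minimal NumPy sketch of this sliding-square procedure, with a toy scoring function standing in for the CNN (the function name and patch settings are made up for illustration):

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2, fill=0.5):
    """Slide a grey patch over the image and record the model's score
    at each position; low scores mark the important regions."""
    h, w = image.shape
    out = np.zeros((h - patch + 1, w - patch + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill  # grey square
            out[i, j] = score_fn(occluded)
    return out

# toy model: the "dog evidence" lives entirely in the top-left 2x2 corner
score = lambda img: img[:2, :2].sum()
img = np.ones((4, 4))
heat = occlusion_map(img, score)
# the score drops the most when the square covers the top-left corner
```

Here `heat.argmin()` points at the occlusion position that hurt the score most, which is exactly the "where the output is lowest" reading from the notes.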
- **class activation maps**
- classification networks have really good localization ability
- off topic, but your typical CNN structure looks like this
- a stack of conv, relu, max-pool layers (in that order), then you flatten the output, pass it through a couple of fully connected (FC) layers, and apply softmax to get the output; the FC layers play the role of the classifier
- building off that idea, to produce a class activation map we get rid of the flatten layer, since flattening everything into one vector throws away all the spatial information
- instead we use a global average pooling layer: we take all the feature maps produced by the last conv layer and reduce each one to the average of its values
- for example, if the dimension was (4, 4, 6), corresponding to 6 feature maps of size 4x4, the new output after global average pooling would be (1, 1, 6)
- after this, we pass it through a single FC layer + softmax, which outputs a probability for each class
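The (4, 4, 6) → (1, 1, 6) example can be checked directly in NumPy (the values are arbitrary, only the shapes matter):

```python
import numpy as np

# 6 feature maps of size 4x4, as in the example above
fmaps = np.arange(4 * 4 * 6, dtype=float).reshape(4, 4, 6)

# global average pooling: average each 4x4 map down to a single number,
# keeping the channel axis so the result is (1, 1, 6)
gap = fmaps.mean(axis=(0, 1), keepdims=True)
```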
- the feature maps actually contain some **visual patterns**
- some parts of a feature map will be lit up, which tells you the activations have found something in those spots; you can repeat this inspection for all the feature maps
- a lit-up region basically means there was a visual pattern in the input that activated that feature map
![Screenshot 2024-07-13 at 7.26.19 PM.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/c9aa599c-2115-4330-846e-652102e8621e/5fdf3b37-7592-4cef-b54f-dceec38518f1/Screenshot_2024-07-13_at_7.26.19_PM.png)
- looking at the image above, the score for dog is 91%; now we can reverse engineer how much of that score came from each of these feature maps
- if we weight each feature map by its FC weight for "dog" and sum them all up, we get another feature map, which is the **class activation map** for "dog"
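That weighted sum is a one-liner; here random tensors stand in for the real activations and FC weights:

```python
import numpy as np

rng = np.random.default_rng(1)
fmaps = rng.random((4, 4, 6))   # feature maps from the last conv layer (made up)
w_dog = rng.random(6)           # FC weights from the 6 pooled values to "dog"

# class activation map: weight each feature map by its FC weight and sum
# over the channel axis -> a single (4, 4) map
cam = (fmaps * w_dog).sum(axis=-1)
```

Broadcasting handles the per-channel weighting: `(4, 4, 6) * (6,)` scales each feature map by its weight before the sum.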
![Screenshot 2024-07-13 at 7.29.02 PM.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/c9aa599c-2115-4330-846e-652102e8621e/fb9f6ea2-190a-4d75-aa27-c1a67c393784/Screenshot_2024-07-13_at_7.29.02_PM.png)
- you can see the score was probably influenced most by the 2nd feature map, as it highlights components of the dog
- **dataset search**
- take a feature map from the last conv layer and find the examples in the dataset that activate it most strongly; you will likely find a common trend, and then you'll know what that feature map was looking for
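The search itself is just a ranking over the dataset; a sketch with hypothetical per-image activation scores (in practice these would be the mean activation of the chosen feature map on each image):

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical: mean activation of one chosen feature map over 100 images
activations = rng.random(100)

# the images that activate the map most strongly hint at what it detects;
# take the 5 highest-scoring indices, strongest first
top5 = np.argsort(activations)[-5:][::-1]
```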
