This report presents a comprehensive analysis of different approaches for deep fake image classification tasks. The project implements and compares multiple convolutional neural network architectures, including individual CNN models and ensemble methods. The main objective is to develop an effective classification system while documenting the complete experimental process, including hyperparameter tuning and model optimization strategies.
The training set consists of 12,500 images, the validation set of 1,250 images, and the test set of 6,500 images. Each image has a 100x100 resolution, and the CSV files (train and validation) follow this format:
image_id,label
532de967-c8fb-49a6-9a8c-3c32cfa93d3e,0
c0519e94-1422-405c-a847-ce726f4a13cf,2
13a99838-2919-4b79-b9fd-bce8f0e59e09,2
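A custom dataset class (as in dataset.py) that reads this CSV format might look like the following sketch; the class name, the `.jpg` extension, and the argument names are illustrative, not the project's actual code.

```python
import os

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class DeepfakeDataset(Dataset):
    """Reads (image_id, label) rows and loads the matching image file."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.df = pd.read_csv(csv_path)  # columns: image_id, label
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        path = os.path.join(self.image_dir, f"{row['image_id']}.jpg")
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["label"])
```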
The data preprocessing pipeline includes several stages designed to improve model generalization and performance:
The training data undergoes the following transformations:
-> resize to 100×100 pixels
-> random horizontal flip with probability 0.5
-> random rotation up to 15 degrees
-> normalization using ImageNet statistics (mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
For validation and test data, only essential preprocessing is applied:
-> resize to 100×100 pixels
-> normalization using the same ImageNet statistics
The DataLoader configuration is as follows:
-> batch size: 4 (Ensemble CNN), 8 (Skip Connections CNN), and 16 (Three Layers CNN)
-> number of workers: 4
-> pin memory: true (faster GPU transfer)
-> drop last: true (training only, to maintain consistent batch sizes)
Layer Type of layer Output Shape No. of Parameters
1 Conv2d [ -1, 128, 100, 100 ] 3,584
2 BatchNorm2d [ -1, 128, 100, 100 ] 256
3 ReLU [ -1, 128, 100, 100 ] 0
4 Conv2d [ -1, 256, 100, 100 ] 295,168
6 BatchNorm2d [ -1, 256, 100, 100 ] 512
6 ReLU [ -1, 256, 100, 100 ] 0
7 Max Pooling2d [ -1, 256, 50, 50 ] 0
Residual Block:
8 Conv2d [ -1, 512, 50, 50 ] 1,180,160
9 BatchNorm2d [ -1, 512, 50, 50 ] 1,024
10 ReLU [ -1, 512, 50, 50 ] 0
11 Conv2d [ -1, 512, 50, 50 ] 2,359,808
12 BatchNorm2d [ -1, 512, 50, 50 ] 1,024
13 Conv2d [ -1, 512, 50, 50 ] 131,584
14 BatchNorm2d [ -1, 512, 50, 50 ] 1,024
15 ReLU [ -1, 512, 50, 50 ] 0
16 ResidualBlock [ -1, 512, 50, 50 ] 0
17 Max Pooling2d [ -1, 512, 25, 25 ] 0
18 Conv2d [ -1, 512, 25, 25 ] 2,359,808
19 BatchNorm2d [ -1, 512, 25, 25 ] 1,024
20 ReLU [ -1, 512, 25, 25 ] 0
21 Dropout [ -1, 512, 25, 25 ] 0
Classifier:
22 AdaptiveAvgPool2d [ -1, 512, 6, 6 ] 0
23 Flatten [ -1, 512*6*6 ] 0
24 Linear-1 [ -1, 128 ] 2,359,424
25 ReLU [ -1, 128 ] 0
26 Dropout [ -1, 128 ] 0
27 Linear-2 [ -1, 5 ] 645
Total params 8,695,045
Trainable params 8,695,045
Non-trainable params 0
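Layers 8 to 15 of the table form a residual block whose parameter counts imply a 256-to-512 block with a 1x1 projection shortcut (131,584 params = 256*512 + 512). The sketch below is a reconstruction consistent with those counts, not the project's exact code.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convs with a 1x1 projection shortcut (256 -> 512 channels)."""

    def __init__(self, in_ch=256, out_ch=512):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the skip path matches the output channel count
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # the skip connection: input (projected) is added to the output
        return self.relu(out + self.shortcut(x))
```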
Layer Type of layer Output Dimension No. of Parameters
1 Conv2D [ -1, 64, 100, 100 ] 1,792
2 BatchNorm2D [ -1, 64, 100, 100 ] 128
3 ReLU [ -1, 64, 100, 100 ] 0
4 Max Pooling2D [ -1, 64, 50, 50 ] 0
5 Conv2D [ -1, 128, 50, 50 ] 73,856
6 BatchNorm2D [ -1, 128, 50, 50 ] 256
7 ReLU [ -1, 128, 50, 50 ] 0
8 Max Pooling2D [ -1, 128, 25, 25 ] 0
9 Conv2D [ -1, 192, 23, 23 ] 221,376
10 BatchNorm2D [ -1, 192, 23, 23 ] 384
11 ReLU [ -1, 192, 23, 23 ] 0
12 Max Pooling2D [ -1, 192, 11, 11 ] 0
13 Dropout2D [ -1, 192, 11, 11 ] 0
14 AdaptiveAvgPool2D [ -1, 192, 4, 4 ] 0
15 Flatten [ -1, 192*4*4 ] 0
16 Linear-1 [ -1, 128] 393,344
17 ReLU [ -1, 128 ] 0
18 Dropout [ -1, 128 ] 0
19 Linear-2 [ -1, 5 ] 645
Total params 691,781
Trainable params 691,781
Non-trainable params 0
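The table above corresponds to a compact three-layer CNN. The sketch below is a reconstruction whose parameter count matches the 691,781 total; the dropout rates are assumptions, as the table does not record them.

```python
import torch
import torch.nn as nn


class ThreeCNN(nn.Module):
    """Three conv blocks, adaptive pooling, and a small classifier head."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.ReLU(), nn.MaxPool2d(2),                   # 100 -> 50
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128),
            nn.ReLU(), nn.MaxPool2d(2),                   # 50 -> 25
            nn.Conv2d(128, 192, 3), nn.BatchNorm2d(192),  # no padding: 25 -> 23
            nn.ReLU(), nn.MaxPool2d(2),                   # 23 -> 11
            nn.Dropout2d(0.5),                            # rate assumed
            nn.AdaptiveAvgPool2d(4),                      # 11 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(192 * 4 * 4, 128), nn.ReLU(),
            nn.Dropout(0.5),                              # rate assumed
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```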
Component Type of layer Output Dimension No. of Parameters
Base Models
SkipConCNN Pre-trained CNN [ -1, 5 ] 8,695,045
ThreeCNN Pre-trained CNN [ -1, 5 ] 691,781
Weights Layer
1 Linear [ -1, 16 ] 176
2 ReLU [ -1, 16 ] 0
3 Linear [ -1, 2 ] 34
4 Softmax [ -1, 2 ] 0
Meta Classifier
1 Linear [ -1, 32 ] 352
2 ReLU [ -1, 32 ] 0
3 Dropout [ -1, 32 ] 0
4 Linear [ -1, 5 ] 165
Total params 9,387,553
Total frozen params 9,386,826
Total trainable params 727
Ensemble Strategy:
-> Dynamic Weight Assignment: Neural network learns optimal weights for
combining base model predictions
-> Meta Classification: Additional classifier processes concatenated
softmax outputs
-> Confidence-based Blending: Final prediction combines weighted and meta
predictions based on confidence scores
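The strategy above can be sketched as follows. The blending rule and the dropout rate are illustrative assumptions, and `model_a`/`model_b` stand in for the frozen SkipConCNN and ThreeCNN; only the weights layer and meta classifier (727 parameters) are trainable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnsembleCNN(nn.Module):
    def __init__(self, model_a, model_b, num_classes=5):
        super().__init__()
        self.model_a, self.model_b = model_a, model_b
        for p in self.model_a.parameters():
            p.requires_grad = False     # base models stay frozen
        for p in self.model_b.parameters():
            p.requires_grad = False
        # learns per-sample weights for the two base models
        self.weights_layer = nn.Sequential(
            nn.Linear(2 * num_classes, 16), nn.ReLU(),
            nn.Linear(16, 2), nn.Softmax(dim=1),
        )
        # meta classifier over the concatenated softmax outputs
        self.meta = nn.Sequential(
            nn.Linear(2 * num_classes, 32), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        pa = F.softmax(self.model_a(x), dim=1)
        pb = F.softmax(self.model_b(x), dim=1)
        both = torch.cat([pa, pb], dim=1)
        w = self.weights_layer(both)                  # per-sample model weights
        weighted = w[:, :1] * pa + w[:, 1:] * pb
        meta = F.softmax(self.meta(both), dim=1)
        # confidence-based blending: trust the weighted vote more when confident
        conf = weighted.max(dim=1, keepdim=True).values
        return conf * weighted + (1 - conf) * meta
```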
Parameter SkipConCNN ThreeCNN EnsembleCNN
Batch Size 8 16 4
Loss Function CrossEntropyLoss CrossEntropyLoss CrossEntropyLoss
Label Smoothing - - 0.1
Optimizer Adam Adam Adam
Learning Rate 0.0005 0.0005 0.0001
Weight Decay 0.0001 0.0001 0.0001
Scheduler ReduceLROnPlateau ReduceLROnPlateau ReduceLROnPlateau
Mode 'max' 'max' 'max'
Patience 5 15 5
Factor 0.5 0.8 0.5
Parameter SkipConCNN ThreeCNN EnsembleCNN
Epochs 200 200 80
Early Stop 25 15 5
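The settings from the tables translate directly to PyTorch; a sketch for the ThreeCNN configuration, where `model` is only a placeholder.

```python
import torch

model = torch.nn.Linear(10, 5)  # placeholder for the actual ThreeCNN
criterion = torch.nn.CrossEntropyLoss()  # the ensemble adds label_smoothing=0.1
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
# 'max' mode because the scheduler monitors validation accuracy
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", patience=15, factor=0.8)

# per epoch, after validation:
#     scheduler.step(val_accuracy)
```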
Confusion Matrix:
0 1 2 3 4 predicted
+------------------------+
0 | 236 2 3 0 9 |
1 | 4 230 2 1 13 |
2 | 3 0 242 0 5 |
3 | 0 0 1 248 1 |
4 | 24 14 10 0 202 |
actual +------------------------+
Classification Report:
precision recall f1-score support
0 0.88 0.94 0.91 250
1 0.93 0.92 0.93 250
2 0.94 0.97 0.95 250
3 1.00 0.99 0.99 250
4 0.88 0.81 0.84 250
accuracy 0.93 1250
macro avg 0.93 0.93 0.93 1250
weighted avg 0.93 0.93 0.93 1250
Confusion Matrix:
0 1 2 3 4 predicted
+------------------------+
0 | 228 1 4 0 17 |
1 | 2 228 1 0 19 |
2 | 7 0 235 0 8 |
3 | 0 0 0 250 0 |
4 | 14 26 11 0 199 |
actual +------------------------+
Classification Report:
precision recall f1-score support
0 0.91 0.91 0.91 250
1 0.89 0.91 0.90 250
2 0.94 0.94 0.94 250
3 1.00 1.00 1.00 250
4 0.82 0.80 0.81 250
accuracy 0.91 1250
macro avg 0.91 0.91 0.91 1250
weighted avg 0.91 0.91 0.91 1250
Confusion Matrix:
0 1 2 3 4 predicted
+------------------------+
0 | 234 2 3 0 11 |
1 | 0 238 1 0 11 |
2 | 3 0 242 0 5 |
3 | 0 0 1 248 1 |
4 | 11 23 10 0 206 |
actual +------------------------+
Classification Report:
precision recall f1-score support
0 0.94 0.94 0.94 250
1 0.90 0.95 0.93 250
2 0.94 0.97 0.95 250
3 1.00 0.99 1.00 250
4 0.88 0.82 0.85 250
accuracy 0.93 1250
macro avg 0.93 0.93 0.93 1250
weighted avg 0.93 0.93 0.93 1250
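The confusion matrices and classification reports above follow the format produced by scikit-learn; a sketch with toy stand-in labels (`y_true`/`y_pred` would be the real validation labels and predictions).

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]  # toy stand-ins for validation labels
y_pred = [0, 4, 1, 1, 2, 2, 3, 3, 4, 0]

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=2))
```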
Geometric augmentations significantly improved model performance, while photometric augmentations degraded accuracy; advanced augmentation techniques were not evaluated in this study. The gap between training with and without photometric augmentation was around 3%. My hypothesis is that, at 100x100 resolution, the images already carry limited visual information, so altering brightness and color removed the few discriminative details the model could learn from.
Residual connections are architectural components that create skip pathways, allowing input information to bypass one or more layers and be added directly to the output. This mechanism lets the network learn residual mappings rather than complete transformations. The SkipConCNN with residual connections achieved 92.64% accuracy compared to the 89.78% of ThreeCNN (the starting model). My hypothesis is that, at 100x100 resolution, the images contain limited visual information, and residual connections preserved critical details that would otherwise be lost through successive convolutions. The skip connections allowed the model to maintain both the local texture patterns and the global structural information necessary for effective deepfake classification.
Ensemble learning combines predictions from multiple models to achieve better performance than the individual components. The EnsembleCNN architecture merges SkipConCNN and ThreeCNN through a weighting mechanism that dynamically balances their contributions based on prediction confidence. The ensemble achieved 93.44% accuracy compared to SkipConCNN's 92.64% and ThreeCNN's 93.03% (after I adjusted the dilation of the last convolutional layer and made it wider). The ensemble uses a dual-pathway approach: a learned weights layer determines optimal model combination ratios, while a separate meta classifier processes the concatenated softmax outputs. The final output combines the weighted predictions with the meta classifier's results based on prediction confidence, allowing the model to rely on individual model expertise when confident and fall back to ensemble learning when uncertain. Combining models with different architectural strengths (residual learning vs. traditional convolution) captures complementary feature representations.
Several architectural experiments failed to improve performance beyond the baseline models. I implemented an EfficientNet-inspired architecture using SE blocks and multi-scale attention mechanisms, but it did not exceed 90% accuracy, significantly underperforming the simpler CNNs. A brute-force hyperparameter search on a two-convolutional CNN found the best parameters (out1=64, out2=128, kernel=5) yielding 76% accuracy, still below the three-layer baseline.
Additionally, a five-convolutional CNN suffered from overfitting despite regularization techniques, confirming that deeper architectures were counterproductive for this dataset. The conclusion was that 3-4 convolutional layers represent the optimal depth balance for this specific task and dataset constraints.
The ReduceLROnPlateau scheduler with patience=5 and factor=0.5 destabilized training of the three-convolutional CNN through overly aggressive learning rate reductions; I arrived at the final settings (patience=15, factor=0.8) through a rough binary search. The difference between the Adam and AdamW optimizers was negligible, with both achieving similar performance. Label smoothing (0.1) did not yield significant improvements, and in some cases it even degraded performance.
The primary limitation identified during this study was the poor classification performance for Class 4, which was discovered relatively late in the experimental process. This class demonstrated significantly lower prediction accuracy compared to other categories, indicating potential issues with feature distinguishability. Two mitigation strategies were attempted but proved unsuccessful due to time constraints and suboptimal implementation:
-> Specialized CNN for Class 4: a dedicated binary classifier was trained specifically to distinguish Class 4 from the other categories, but failed to achieve meaningful improvements
-> Weighted Loss Function: label smoothing was replaced with a class-weighted tensor approach, implemented by continuing training from the best checkpoint rather than retraining from scratch, resulting in degraded performance
Both approaches were implemented as quick fixes rather than systematic solutions, contributing to their failure.
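The class-weighted tensor approach can be sketched as follows; the weight values are assumptions for illustration, not the ones used in the experiment.

```python
import torch
import torch.nn as nn

# Assumed weights (not taken from the report): upweight Class 4, which all
# three models confused most often.
class_weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 2.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)  # replaces label smoothing

# Per-sample view: with reduction="none", a Class-4 error costs twice as much
# as the same error on Class 0.
per_sample = nn.CrossEntropyLoss(weight=class_weights, reduction="none")
losses = per_sample(torch.zeros(2, 5), torch.tensor([0, 4]))
```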
Future research should prioritize addressing the Class 4 classification challenge through:
-> Data Analysis: comprehensive investigation of Class 4 characteristics and potential mislabeling issues
-> Balanced Sampling: advanced sampling techniques (e.g., SMOTE) or loss reweighting (e.g., focal loss) to handle class imbalance
-> Feature Engineering: specialized feature extraction methods targeting Class 4's distinguishing characteristics
The project is organized as follows:
project/
|-- main.py: core logic for training and evaluation
|-- models.py: CNN architectures and ensemble implementation
|-- dataloader.py: data loading and preprocessing
|-- dataset.py: custom dataset class
|-- dataset/
| |-- train.csv
| |-- validation.csv
| |-- test.csv
| |-- train/
| | |-- image1.jpg
| | |-- image2.jpg
| | ...
| |-- validation/
| | |-- image1.jpg
| | |-- image2.jpg
| | ...
| |-- test/
| | |-- image1.jpg
| | |-- image2.jpg
| | ...
|-- env/
|-- models_pth/
|-- requirements.txt
GPU: NVIDIA GeForce GTX 1650 Driver: 576.02 CUDA: 12.9
Memory: 628MiB/4096MiB GPU Util: 4%
CPU: Intel Core i5-10300H @ 2.50GHz Cores: 4 Threads: 8
Cache: L1: 256KB L2: 1MB L3: 8MB Architecture: x86_64
OS: Windows with WSL (Windows Subsystem for Linux)