Machine Learning: Sentiment Analysis (NLP) & Image Classification (CV)

This repository contains three machine learning projects developed for the "Artificial Intelligence" course (2025-26) at AUEB. They apply Statistical Learning, Recurrent Neural Networks (RNNs), and Convolutional Neural Networks (CNNs).

🚀 Project Overview

The project is split into three parts:

Part A | Statistical NLP: Binary sentiment classification of IMDB reviews using Naive Bayes and Logistic Regression with Information Gain feature selection.
Part B | Deep Learning NLP: Sequence modeling for sentiment analysis using a Stacked Bidirectional LSTM with pre-trained GloVe embeddings.
Part C | Computer Vision: Image recognition on the FashionMNIST dataset using Transfer Learning with a modified ResNet18 architecture.

🛠️ Technologies & Libraries

Core: Python 3
Deep Learning: PyTorch, Torchvision
Machine Learning: Scikit-Learn, NumPy
Visualization: Matplotlib
Hardware Acceleration: Full support for NVIDIA CUDA, Apple Silicon (MPS), and CPU.
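
As a rough illustration of the hardware fallback described above, device selection in PyTorch could look like the sketch below. The helper name `pick_device` is hypothetical and is not taken from the project scripts; the order (CUDA, then MPS, then CPU) is an assumption.

```python
import torch


def pick_device() -> torch.device:
    """Return the best available device: CUDA, then Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 3, device=device)  # models are moved the same way: model.to(device)
```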

📂 Project Structure

load_data.py: Data ingestion, cleaning, and stratified splitting (Train/Dev/Test).
scikit-learn.py: Implementation of traditional ML models and feature engineering.
rnn_model.py: Deep learning pipeline for RNN-based text classification.
cnn_fashion_mnist.py: Transfer learning pipeline for image classification.
images/: Directory containing performance plots and visual evaluations.
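
The stratified Train/Dev/Test split mentioned for load_data.py can be sketched in plain Python as follows. This is a minimal illustration, not the script's actual code: the function name `stratified_split` and the split fractions are assumptions, since the README does not state the ratios used.

```python
import random
from collections import defaultdict


def stratified_split(labels, dev_frac=0.1, test_frac=0.1, seed=0):
    """Return (train, dev, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, dev, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_dev = round(len(indices) * dev_frac)
        n_test = round(len(indices) * test_frac)
        dev.extend(indices[:n_dev])
        test.extend(indices[n_dev:n_dev + n_test])
        train.extend(indices[n_dev + n_test:])
    return train, dev, test


# Toy labels (hypothetical): 50 negative and 50 positive examples.
labels = [0] * 50 + [1] * 50
train, dev, test = stratified_split(labels, dev_frac=0.2, test_frac=0.2)
```

Each split keeps the same positive/negative ratio as the full label list, which is the property "stratified" refers to.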

📈 Methodology & Results

Part A: Traditional Machine Learning (IMDB)

We utilized a Bag-of-Words representation with binary weighting. To optimize the feature space, we:

Removed the 50 most frequent words (noise reduction).
Pruned rare words (frequency < 5).
Selected the top 2,000 features using Information Gain (Mutual Information).
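
The three steps above can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the code in scikit-learn.py: "frequency" is taken to mean document frequency, labels are assumed to be 0/1, and the names `entropy` and `select_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(pos, total):
    """Binary entropy (in bits) of a set with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_features(docs, labels, n_stop=50, min_df=5, k=2000):
    """docs: list of token lists; labels: list of 0/1. Returns the kept vocabulary."""
    doc_sets = [set(d) for d in docs]             # binary weighting: presence only
    df = Counter(w for s in doc_sets for w in s)  # document frequency per word
    stop = {w for w, _ in df.most_common(n_stop)}  # the n_stop most frequent words
    vocab = [w for w, c in df.items() if w not in stop and c >= min_df]

    n, n_pos = len(labels), sum(labels)
    h_class = entropy(n_pos, n)
    pos_df = Counter(w for s, y in zip(doc_sets, labels) if y == 1 for w in s)

    def info_gain(w):
        # IG(w) = H(class) - H(class | w present or absent)
        present, present_pos = df[w], pos_df[w]
        absent, absent_pos = n - present, n_pos - present_pos
        cond = ((present / n) * entropy(present_pos, present)
                + (absent / n) * entropy(absent_pos, absent))
        return h_class - cond

    return sorted(vocab, key=info_gain, reverse=True)[:k]


# Toy corpus (hypothetical) with scaled-down thresholds.
docs = [["the", "great", "film"], ["the", "great", "plot"],
        ["the", "awful", "film"], ["the", "awful", "plot"]]
labels = [1, 1, 0, 0]
selected = select_features(docs, labels, n_stop=1, min_df=2, k=2)
```

In the toy corpus, "the" is dropped as the most frequent word, and "great" and "awful" are kept because they separate the classes perfectly, while "film" and "plot" carry no information about the label.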

Model	Accuracy	Macro F1
Naive Bayes (Bernoulli)	85.35%	0.8535
Logistic Regression (SGD)	87.85%	0.8785
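
For reference, the Macro F1 column is the unweighted mean of the per-class F1 scores. The sketch below computes it in plain Python; the function name `macro_f1` and the toy predictions are hypothetical and are not the project's outputs.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels): balanced classes, one error in each direction.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

When the classes are balanced and the errors are spread evenly across them, as in this toy case, accuracy and Macro F1 take the same value.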

Learning Curves (Bias vs. Variance Analysis)

[Learning-curve plots: Naive Bayes and Logistic Regression]

Part B: Recurrent Neural Networks (IMDB)

A more complex approach using a Stacked Bi-LSTM to capture sequential dependencies:

Architecture: 2 Layers, 64 Hidden Units, Global Max Pooling.
Embeddings: Pre-trained GloVe (100d) with fine-tuning.
Regularization: Dropout (0.5) to mitigate overfitting.
Performance: Achieved 84.95% Accuracy. The checkpoint from the 2nd epoch was kept, because the model overfit significantly in later epochs.
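
A minimal PyTorch sketch of the architecture listed above is given below. It is not the code in rnn_model.py: the class name `BiLSTMClassifier`, the padding index, the single-logit output, and the exact placement of dropout are assumptions. Random embeddings stand in for GloVe here; a real GloVe matrix could be passed through the hypothetical `pretrained` argument.

```python
import torch
from torch import nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden=64, num_layers=2,
                 dropout=0.5, pad_idx=0, pretrained=None):
        super().__init__()
        if pretrained is not None:
            # freeze=False keeps the pre-trained vectors trainable (fine-tuning).
            self.embedding = nn.Embedding.from_pretrained(
                pretrained, freeze=False, padding_idx=pad_idx)
            embed_dim = pretrained.size(1)
        else:
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.pad_idx = pad_idx
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=num_layers,
                            bidirectional=True, batch_first=True, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden, 1)  # one logit for binary sentiment

    def forward(self, token_ids):
        out, _ = self.lstm(self.embedding(token_ids))  # (batch, time, 2 * hidden)
        # Mask padded positions so they cannot win the max.
        # Assumes every sequence has at least one non-padding token.
        pad = (token_ids == self.pad_idx).unsqueeze(-1)
        out = out.masked_fill(pad, float("-inf"))
        pooled = out.max(dim=1).values                 # global max pooling over time
        return self.fc(self.dropout(pooled)).squeeze(-1)


model = BiLSTMClassifier(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 20))  # 4 sequences of 20 token ids, no padding
logits = model(batch)                    # shape (4,); pair with nn.BCEWithLogitsLoss
```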

Training Progress (Loss Curve)

Part C: Computer Vision (FashionMNIST)

Leveraging Transfer Learning to classify fashion items:

Base Model: ResNet18 (Pre-trained on ImageNet).
Adaptations: Modified input layer for grayscale images and a custom MLP head (256 ReLU units).
Optimization: Adam optimizer with Cross-Entropy Loss and Data Augmentation.
Results: Achieved 90% Accuracy on the test set.

Training Progress & Visual Evaluation

💡 Key Takeaways

Feature Engineering: In traditional ML, selecting features via Information Gain proved crucial for performance. With it, Logistic Regression (87.85%) outperformed the Bi-LSTM (84.95%) on this dataset.
Model Complexity: The Bi-LSTM showed rapid convergence but was prone to overfitting, highlighting the importance of Early Stopping and validation monitoring.
Transfer Learning Efficiency: Using a pre-trained ResNet18 allowed us to reach high accuracy levels significantly faster than training from scratch, even when the source (ImageNet) and target (FashionMNIST) domains differed.
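
The early-stopping idea from the Model Complexity takeaway can be sketched as a small helper that tracks the best validation loss and stops after a few epochs without improvement. The function name, the patience value, and the loss values below are made up for illustration; they are not the project's training logs.

```python
def best_epoch_with_patience(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # In a real training loop, this is where the checkpoint would be saved.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


# Hypothetical validation losses: improvement until epoch 2, then overfitting.
val_losses = [0.45, 0.36, 0.41, 0.52, 0.60]
best = best_epoch_with_patience(val_losses, patience=2)
```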

🔧 Installation & Usage

Clone & Install:

git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
pip install numpy scikit-learn matplotlib torch torchvision tqdm

Data Setup:
- Place the aclImdb dataset and glove.6B.100d.txt in the root directory.

Execution:

python load_data.py        # Prepare data
python scikit-learn.py     # Run Part A
python rnn_model.py        # Run Part B
python cnn_fashion_mnist.py # Run Part C

Developed as part of the "Artificial Intelligence" course at the Department of Informatics, Athens University of Economics and Business (AUEB).
