🧠 Machine Learning Projects

A Machine Learning portfolio of supervised and unsupervised learning projects, built in Python on real-world datasets, with an emphasis on visual model interpretation.

🌟 Featured Projects

🌸 Iris Classification vs Clustering

This project compares supervised classification and unsupervised clustering using the classic Iris dataset.

The goal is to understand whether algorithms such as KMeans can discover natural flower groups without using the real species labels.

🛒 Market Basket Analysis with Association Rules

This project applies Association Rule Learning techniques using the Apriori algorithm to discover hidden purchasing patterns in supermarket transaction data.

The objective is to identify:

frequently purchased products
product relationships
strong association rules
customer purchasing patterns

This type of analysis is commonly used in:

Retail
E-commerce
Recommendation systems
Cross-selling strategies

📈 Confidence vs Lift

🚀 Top Rules by Lift

🚀 About This Repository

This repository documents my hands-on Machine Learning journey through experimentation with:

Regression
Binary Classification
Multiclass Classification
Clustering
Association Rule Learning
Market Basket Analysis
Feature Engineering
Exploratory Data Analysis
Model Comparison
PCA Dimensionality Reduction
Data Visualization
Interactive Visualizations

The main goal is not only to train models, but also to understand how they behave, how they generalize and how their results can be interpreted visually.

🛠 Technologies Used

Python
Pandas
NumPy
Scikit-Learn
XGBoost
Mlxtend
Matplotlib
Seaborn
Plotly
Jupyter Notebook
Google Colab

📂 Repository Structure

Machine Learning/
├── association_rules/
│   ├── data/
│   ├── images/
│   ├── notebooks/
│   └── reports/
├── regression/
├── binary_classification/
├── multiclass_classification/
│   ├── data/
│   ├── images/
│   ├── models/
│   └── notebooks/
├── unsupervised/
│   ├── images/
│   ├── models/
│   ├── reports/
│   └── wine/
├── docs/
├── README.md
└── LICENSE

🛒 Association Rules & Market Basket Analysis

Association Rule Learning project focused on discovering hidden purchasing patterns within supermarket transaction data.

📌 Objective

Apply the Apriori algorithm to:

identify frequent product combinations
generate association rules
evaluate support, confidence and lift
extract business insights from customer baskets
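
The three metrics named above can be illustrated with a minimal, standard-library sketch. The baskets and the helper names `support` and `rule_metrics` are hypothetical and are not taken from the project notebook, which uses Mlxtend rather than hand-written functions.

```python
# Hypothetical toy baskets, not the supermarket dataset used in the project.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    # Confidence: how often the consequent appears when the antecedent does.
    confidence = both / support(antecedent, transactions)
    # Lift: confidence relative to the consequent's baseline frequency.
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift


rule_support, rule_confidence, rule_lift = rule_metrics(
    {"bread"}, {"butter"}, transactions
)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 means the two itemsets co-occur more often than they would if they were independent; a lift near 1 suggests no meaningful association.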

⚙️ Techniques Used

Transaction Encoding
Apriori Algorithm
Association Rules
Support Analysis
Confidence Analysis
Lift Analysis
Market Basket Analysis

📊 Main Visualizations

Top Purchased Products

Itemset Length Distribution

Support Distribution

Support vs Confidence

Confidence vs Lift

Top Association Rules by Lift

📌 Key Findings

Some products appear together significantly more often than would be expected if purchases were independent (lift above 1).
Lift analysis helps identify strong product relationships.
Market Basket Analysis can support recommendation systems and cross-selling strategies.
Certain food combinations reveal customer purchasing habits.

📓 Main Notebook

association_rules/notebooks/01_market_basket_analysis.ipynb

📈 Regression Projects

Projects focused on predicting continuous numerical values such as house prices, concrete strength and used car prices.

Models Explored

Linear Regression
Polynomial Regression
Decision Tree Regressor
Random Forest Regressor
XGBoost Regressor

Projects

Housing Price Prediction
Concrete Strength Prediction
Used Cars Price Prediction

Main concepts explored

Exploratory Data Analysis
Preprocessing
Scaling
Feature Engineering
Ensemble Learning
Model Comparison
Generalization Analysis

🧩 Binary Classification Projects

Projects focused on predicting binary categories and evaluating classification performance.

❤️ Heart Disease Classification

Classification project using medical and lifestyle variables to predict cardiovascular disease risk.

Models explored

Logistic Regression
KNN
Decision Tree
Random Forest
Naive Bayes
XGBoost

Main concepts explored

Confusion Matrices
Precision, Recall and F1-score
Overfitting and Underfitting
Feature Importance
Model Comparison

🍄 Mushroom Classification

Classification of mushrooms as edible or poisonous.

Main concepts explored

Categorical Encoding
Preprocessing
Classification Modeling
Confusion Matrices
Model Evaluation

🧠 Multiclass Classification Projects

Projects focused on multiclass classification using supervised Machine Learning algorithms.

🌸 Iris Classification vs Clustering with PCA

Objective

Compare supervised classification and unsupervised clustering using the Iris dataset.

This project explores whether clustering algorithms can identify natural groups similar to the real flower species without using the target labels.

Workflow

Load and inspect the Iris dataset
Perform Exploratory Data Analysis
Analyze feature correlations
Train a KNN classifier
Evaluate classification performance
Apply KMeans clustering
Compare real species with clusters
Apply PCA for dimensionality reduction
Visualize real species and clusters in 2D
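
The "compare real species with clusters" step needs some care, because KMeans cluster ids are arbitrary integers that do not line up with species names. One common approach, sketched here under the assumption that a majority vote per cluster is acceptable, is to map each cluster to its most frequent true label and measure the agreement. The function names and the toy data are hypothetical, not the notebook's code.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, true_labels):
    """Map each cluster id to the most common true label among its members."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    return {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in members.items()
    }


def purity(cluster_ids, true_labels):
    """Share of samples whose cluster's majority label equals their own label."""
    mapping = map_clusters_to_labels(cluster_ids, true_labels)
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
    return hits / len(true_labels)


# Hypothetical toy assignment: one virginica sample lands in the versicolor cluster.
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
clusters = [1, 1, 0, 0, 2, 0]

mapping = map_clusters_to_labels(clusters, species)
score = purity(clusters, species)
print(mapping, score)
```

Note that a majority vote can map two clusters to the same label, so purity is only a rough agreement measure rather than a full clustering evaluation.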

Models and techniques explored

K-Nearest Neighbors
KMeans
StandardScaler
PCA
Confusion Matrix
Classification Report
Correlation Heatmap

Main visualizations

Correlation Matrix

Real Species Distribution

KMeans Clusters

Key findings

Iris-setosa is clearly separated from the other species.
Petal measurements are highly useful for separating classes.
KNN performs very well because the classes are largely well separated in feature space.
KMeans discovers groups that are quite similar to the real species.
PCA provides a clear 2D projection of the dataset structure.

Main notebook

multiclass_classification/notebooks/iris/01_iris_classification_vs_clustering.ipynb

⚙️ IoT Agriculture Classification

Multiclass classification project using agricultural IoT sensor data collected from smart greenhouse environments.

Main concepts explored

Multiclass Classification
Correlation Analysis
Random Forest Classification
Heatmaps
Confusion Matrices
Feature Importance
Model Comparison

🐚 Abalone Multiclass Classification

Classification project focused on predicting abalone sex categories using physical measurements.

Models explored

Logistic Regression
KNN
Decision Tree
Random Forest

Main concepts explored

Label Encoding
Feature Scaling
Feature Selection
Correlation Analysis
Classification Reports
Confusion Matrices
Model Interpretation

🔍 Unsupervised Learning Projects

Projects focused on discovering hidden patterns in data without using target labels during model training.

🛍 Mall Customers Clustering

Customer segmentation project using unsupervised learning techniques.

Algorithms explored

KMeans
Hierarchical Clustering
DBSCAN
Mean Shift

Main concepts explored

Customer Segmentation
Distance-based Clustering
Density-based Clustering
Model Comparison
Cluster Visualization

🍷 Wine Clustering Project

Clustering project using the Wine Dataset.

Objective

Remove the original classification column and evaluate whether clustering algorithms can discover natural groups using only the chemical characteristics of the wines.

Workflow

Load and inspect the dataset
Remove Customer_Segment
Scale numerical variables
Apply PCA for dimensionality reduction
Create 2D and 3D visualizations
Compare real classification with clustering results
Evaluate clustering quality using Silhouette Score
Use the Elbow Method to estimate the optimal number of clusters
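
The scaling step matters because PCA and KMeans both depend on distances and variances, so a feature measured in large units would otherwise dominate. The following is a minimal sketch of z-score standardization using only the standard library; the column names and values are hypothetical, and the project itself may rely on Scikit-Learn's `StandardScaler` instead.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical values on very different scales.
alcohol = [12.0, 13.0, 14.0]
proline = [500.0, 1000.0, 1500.0]

z_alcohol = standardize(alcohol)
z_proline = standardize(proline)
print(z_alcohol)
print(z_proline)
```

After scaling, both columns share the same spread, so neither one dominates the distance computations.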

Algorithms explored

KMeans
Agglomerative Clustering
DBSCAN

Main findings

KMeans was the algorithm that most closely reproduced the original classification.
Agglomerative Clustering produced very similar results.
DBSCAN performed poorly, likely because the wine groups are not separated by the low-density gaps that DBSCAN relies on.
The Elbow Method suggested 3 clusters.
Silhouette Score showed moderate cluster separation.

Interactive visualizations

unsupervised/wine/images/comparison/real_classification_3d.html
unsupervised/wine/images/kmeans/kmeans_3d.html

Main notebook

unsupervised/wine/notebooks/wine_clustering.ipynb

📊 Highlighted Results

🏡 Kaggle House Prices

Model	Approximate R²
Linear Regression	~0.68
Decision Tree	~0.76
Random Forest	~0.89
XGBoost	~0.91

🚗 Used Cars Price Prediction

Model	Approximate R²
Linear Regression	~0.86
Decision Tree	~0.88
Random Forest	~0.915
XGBoost	~0.914

🐚 Abalone Multiclass Classification

Model	Approximate Accuracy
Logistic Regression	~56%

🌸 Iris Classification and Clustering

Technique	Result
KNN Classification	Strong supervised classification performance
KMeans Clustering	Good natural group detection
PCA	Clear 2D separation of the main groups

🍷 Wine Clustering

Algorithm	Result
KMeans	Best match
Agglomerative Clustering	Very similar
DBSCAN	Less suitable

📏 Metrics Used

Regression Metrics

MAE
MSE
RMSE
R² Score
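
For reference, the four regression metrics can be written out in a few lines of plain Python. This is an illustrative sketch with a hypothetical function name and made-up values; the notebooks may compute the same quantities through Scikit-Learn.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    # R² compares the residual error with the error of always predicting the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and RMSE are expressed in the units of the target, while R² is unitless, which makes it convenient for comparing models across datasets.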

Classification Metrics

Accuracy
Precision
Recall
F1-score
Confusion Matrix
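
The classification metrics above all derive from the four cells of a binary confusion matrix. The sketch below shows that relationship with a hypothetical helper and made-up labels; it assumes a binary problem with `1` as the positive class.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        # Rows are true classes, columns are predicted classes (negative first).
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

report = binary_classification_metrics(y_true, y_pred)
print(report)
```

In a setting such as disease or poisonous-mushroom detection, recall on the positive class is often the metric to watch, since a false negative is usually the costlier mistake.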

Clustering Metrics

Inertia
Silhouette Score
Elbow Method
PCA Cluster Comparison
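
To make inertia and the Silhouette Score concrete, here is a small standard-library sketch that computes both for a given labeling. The function names and the four toy points are hypothetical; this is not the repository's implementation, and it assumes Euclidean distance and at least two clusters.

```python
from math import dist


def inertia(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for label in set(labels):
        cluster = [p for p, l in zip(points, labels) if l == label]
        centroid = [sum(coords) / len(cluster) for coords in zip(*cluster)]
        total += sum(dist(p, centroid) ** 2 for p in cluster)
    return total


def mean_distance(point, others):
    """Average Euclidean distance from `point` to each point in `others`."""
    return sum(dist(point, q) for q in others) / len(others)


def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two distinct clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            q for j, (q, l) in enumerate(zip(points, labels)) if l == own and j != i
        ]
        if not same:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = mean_distance(p, same)  # cohesion within the point's own cluster
        b = min(  # separation from the nearest other cluster
            mean_distance(p, [q for q, l in zip(points, labels) if l == other])
            for other in set(labels)
            if other != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1D points forming two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
good_inertia = inertia(points, [0, 0, 1, 1])
print(good, bad, good_inertia)
```

The sensible labeling scores close to 1 and the scrambled one scores below 0, which is the behavior the Silhouette Score is meant to capture. The Elbow Method plots inertia against the number of clusters and looks for the point where further increases stop paying off.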

Association Rule Metrics

Support
Confidence
Lift

📸 Visualizations Included

The repository includes:

Correlation Heatmaps
Confusion Matrices
Model Comparison Charts
Feature Importance Plots
Real vs Predicted Plots
PCA Scatterplots
3D Visualizations
Interactive Plotly Visualizations
Clustering Comparison Plots
Market Basket Analysis Charts
Association Rule Visualizations

🧪 Concepts Explored

Exploratory Data Analysis
Data Preprocessing
Feature Engineering
Feature Scaling
Regression Modeling
Binary Classification
Multiclass Classification
Unsupervised Learning
Clustering
PCA
Association Rule Learning
Market Basket Analysis
Ensemble Learning
Boosting
Feature Importance
Overfitting and Underfitting
Generalization
Model Comparison
Model Interpretation

🚀 Next Steps

Cross Validation
GridSearchCV
Advanced Pipelines
Hyperparameter Optimization
Model Persistence with Joblib
Additional Kaggle Competitions
More Unsupervised Learning Projects
Deep Learning Projects

👩‍💻 Author

Bea Lamiquiz

Machine Learning portfolio focused on practical experimentation, model comparison and applied analysis using real-world datasets.
