beatriangu/Machine-Learning-Projects

🧠 Machine Learning Projects

Machine Learning portfolio featuring supervised and unsupervised learning projects developed with Python, real-world datasets and visual model interpretation.


🌟 Featured Projects

🌸 Iris Classification vs Clustering

[Figure: Iris PCA, real species]

This project compares supervised classification and unsupervised clustering using the classic Iris dataset.

The goal is to understand whether algorithms such as KMeans can discover natural flower groups without using the real species labels.

[Figure: Iris PCA, KMeans clusters]


🛒 Market Basket Analysis with Association Rules

[Figure: Top products]

This project applies Association Rule Learning techniques using the Apriori algorithm to discover hidden purchasing patterns in supermarket transaction data.

The objective is to identify:

  • frequently purchased products
  • product relationships
  • strong association rules
  • customer purchasing patterns

This type of analysis is commonly used in:

  • Retail
  • E-commerce
  • Recommendation systems
  • Cross-selling strategies

📈 Confidence vs Lift

[Figure: Confidence vs lift scatter plot]


🚀 Top Rules by Lift

[Figure: Top rules ranked by lift]


🚀 About This Repository

This repository documents my hands-on Machine Learning journey through experiments with:

  • Regression
  • Binary Classification
  • Multiclass Classification
  • Clustering
  • Association Rule Learning
  • Market Basket Analysis
  • Feature Engineering
  • Exploratory Data Analysis
  • Model Comparison
  • PCA Dimensionality Reduction
  • Data Visualization
  • Interactive Visualizations

The main goal is not only to train models, but also to understand how they behave, how they generalize and how their results can be interpreted visually.


🛠 Technologies Used

  • Python
  • Pandas
  • NumPy
  • Scikit-Learn
  • XGBoost
  • Mlxtend
  • Matplotlib
  • Seaborn
  • Plotly
  • Jupyter Notebook
  • Google Colab

📂 Repository Structure

Machine Learning/
├── association_rules/
│   ├── data/
│   ├── images/
│   ├── notebooks/
│   └── reports/
├── regression/
├── binary_classification/
├── multiclass_classification/
│   ├── data/
│   ├── images/
│   ├── models/
│   └── notebooks/
├── unsupervised/
│   ├── images/
│   ├── models/
│   ├── reports/
│   └── wine/
├── docs/
├── README.md
└── LICENSE

🛒 Association Rules & Market Basket Analysis

Association Rule Learning project focused on discovering hidden purchasing patterns within supermarket transaction data.


📌 Objective

Apply the Apriori algorithm to:

  • identify frequent product combinations
  • generate association rules
  • evaluate support, confidence and lift
  • extract business insights from customer baskets

⚙️ Techniques Used

  • Transaction Encoding
  • Apriori Algorithm
  • Association Rules
  • Support Analysis
  • Confidence Analysis
  • Lift Analysis
  • Market Basket Analysis
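
As a rough illustration of the Apriori idea listed above, here is a minimal pure-Python sketch of the level-wise search for frequent itemsets. It is not the notebook's code (the project uses Mlxtend); the function name `frequent_itemsets`, the toy baskets and the `min_support` value are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical toy baskets, not taken from the supermarket dataset.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "butter", "eggs"],
]
itemsets = frequent_itemsets(transactions, min_support=0.4)
for itemset, sup in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

The prune step is what makes the search Apriori-style: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.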

📊 Main Visualizations

  • Top Purchased Products
  • Itemset Length Distribution
  • Support Distribution
  • Support vs Confidence
  • Confidence vs Lift
  • Top Association Rules by Lift


📌 Key Findings

  • Some products appear together significantly more often than expected.
  • Lift analysis helps identify strong product relationships.
  • Market Basket Analysis can support recommendation systems and cross-selling strategies.
  • Certain food combinations reveal customer purchasing habits.

📓 Main Notebook

association_rules/notebooks/01_market_basket_analysis.ipynb

📈 Regression Projects

Projects focused on predicting continuous numerical values such as house prices, concrete strength and used car prices.

Models Explored

  • Linear Regression
  • Polynomial Regression
  • Decision Tree Regressor
  • Random Forest Regressor
  • XGBoost Regressor

Projects

  • Housing Price Prediction
  • Concrete Strength Prediction
  • Used Cars Price Prediction

Main concepts explored

  • Exploratory Data Analysis
  • Preprocessing
  • Scaling
  • Feature Engineering
  • Ensemble Learning
  • Model Comparison
  • Generalization Analysis

🧩 Binary Classification Projects

Projects focused on predicting binary categories and evaluating classification performance.


❤️ Heart Disease Classification

Classification project using medical and lifestyle variables to predict cardiovascular disease risk.

Models explored

  • Logistic Regression
  • KNN
  • Decision Tree
  • Random Forest
  • Naive Bayes
  • XGBoost

Main concepts explored

  • Confusion Matrices
  • Precision, Recall and F1-score
  • Overfitting and Underfitting
  • Feature Importance
  • Model Comparison

🍄 Mushroom Classification

Classification of mushrooms as edible or poisonous.

Main concepts explored

  • Categorical Encoding
  • Preprocessing
  • Classification Modeling
  • Confusion Matrices
  • Model Evaluation
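
Categorical encoding is the step that turns text-valued attributes into numbers a classifier can use. The sketch below shows one-hot encoding in plain Python; the column names and values (`cap_color`, `odor`) are hypothetical and are not taken from the project's dataset, and the notebook may well use a library encoder instead.

```python
def one_hot_encode(rows):
    """One-hot encode a list of dicts mapping column name -> category value."""
    # Every distinct (column, value) pair becomes one binary output column.
    columns = sorted({(col, val) for row in rows for col, val in row.items()})
    names = [f"{col}_{val}" for col, val in columns]
    encoded = [
        [1 if row.get(col) == val else 0 for col, val in columns]
        for row in rows
    ]
    return names, encoded


# Hypothetical mushroom-like records for illustration only.
rows = [
    {"cap_color": "brown", "odor": "none"},
    {"cap_color": "white", "odor": "foul"},
    {"cap_color": "brown", "odor": "foul"},
]
names, encoded = one_hot_encode(rows)
print(names)
for vector in encoded:
    print(vector)
```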

🧠 Multiclass Classification Projects

Projects focused on multiclass classification using supervised Machine Learning algorithms.


🌸 Iris Classification vs Clustering with PCA

[Figure: Iris pairplot]

Objective

Compare supervised classification and unsupervised clustering using the Iris dataset.

This project explores whether clustering algorithms can identify natural groups similar to the real flower species without using the target labels.


Workflow

  • Load and inspect the Iris dataset
  • Perform Exploratory Data Analysis
  • Analyze feature correlations
  • Train a KNN classifier
  • Evaluate classification performance
  • Apply KMeans clustering
  • Compare real species with clusters
  • Apply PCA for dimensionality reduction
  • Visualize real species and clusters in 2D

Models and techniques explored

  • K-Nearest Neighbors
  • KMeans
  • StandardScaler
  • PCA
  • Confusion Matrix
  • Classification Report
  • Correlation Heatmap

Main visualizations

  • Correlation Matrix
  • Real Species Distribution
  • KMeans Clusters


Key findings

  • Iris-setosa is clearly separated from the other species.
  • Petal measurements are highly useful for separating classes.
  • KNN performs very well because the dataset has clear class boundaries.
  • KMeans discovers groups that are quite similar to the real species.
  • PCA provides a clear 2D projection of the dataset structure.

Main notebook

multiclass_classification/notebooks/iris/01_iris_classification_vs_clustering.ipynb

⚙️ IoT Agriculture Classification

Multiclass classification project using agricultural IoT sensor data collected from smart greenhouse environments.

Main concepts explored

  • Multiclass Classification
  • Correlation Analysis
  • Random Forest Classification
  • Heatmaps
  • Confusion Matrices
  • Feature Importance
  • Model Comparison

🐚 Abalone Multiclass Classification

Classification project focused on predicting abalone sex categories using physical measurements.

Models explored

  • Logistic Regression
  • KNN
  • Decision Tree
  • Random Forest

Main concepts explored

  • Label Encoding
  • Feature Scaling
  • Feature Selection
  • Correlation Analysis
  • Classification Reports
  • Confusion Matrices
  • Model Interpretation

🔍 Unsupervised Learning Projects

Projects focused on discovering hidden patterns in data without using target labels during model training.


🛍 Mall Customers Clustering

Customer segmentation project using unsupervised learning techniques.

Algorithms explored

  • KMeans
  • Hierarchical Clustering
  • DBSCAN
  • Mean Shift

Main concepts explored

  • Customer Segmentation
  • Distance-based Clustering
  • Density-based Clustering
  • Model Comparison
  • Cluster Visualization

🍷 Wine Clustering Project

Clustering project using the Wine Dataset.

Objective

Remove the original classification column and evaluate whether clustering algorithms can discover natural groups using only the chemical characteristics of the wines.


Workflow

  • Load and inspect the dataset
  • Remove Customer_Segment
  • Scale numerical variables
  • Apply PCA for dimensionality reduction
  • Create 2D and 3D visualizations
  • Compare real classification with clustering results
  • Evaluate clustering quality using Silhouette Score
  • Use the Elbow Method to estimate the optimal number of clusters
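
A compact sketch of this workflow is shown below. It assumes Scikit-Learn's built-in `load_wine` data as a stand-in for the project's CSV, with its target playing the role of the `Customer_Segment` column, and it assumes clustering is run on the scaled features with PCA used only for visualization; the notebook may differ on both points.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# y stands in for Customer_Segment: it is held back and never passed to fit().
X, y = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record inertia for a range of k and look for the bend.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

# Cluster with the k suggested by the elbow, then score the result.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
silhouette = silhouette_score(X_scaled, labels)
ari = adjusted_rand_score(y, labels)  # agreement with the held-back classes

# PCA coordinates for 3D plots (the first two columns serve for 2D plots).
X_3d = PCA(n_components=3).fit_transform(X_scaled)

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
print(f"Silhouette (k=3): {silhouette:.3f}")
print(f"Adjusted Rand index vs original classes: {ari:.3f}")
```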

Algorithms explored

  • KMeans
  • Agglomerative Clustering
  • DBSCAN

Main findings

  • KMeans reproduced the original classification most closely.
  • Agglomerative Clustering produced very similar results.
  • DBSCAN fit poorly because the wine groups are not separated by clear differences in density.
  • The Elbow Method suggested 3 clusters.
  • Silhouette Score showed moderate cluster separation.

Interactive visualizations

unsupervised/wine/images/comparison/real_classification_3d.html
unsupervised/wine/images/kmeans/kmeans_3d.html

Main notebook

unsupervised/wine/notebooks/wine_clustering.ipynb

📊 Highlighted Results

🏡 Kaggle House Prices

| Model | Approximate R² |
|---|---|
| Linear Regression | ~0.68 |
| Decision Tree | ~0.76 |
| Random Forest | ~0.89 |
| XGBoost | ~0.91 |

🚗 Used Cars Price Prediction

| Model | Approximate R² |
|---|---|
| Linear Regression | ~0.86 |
| Decision Tree | ~0.88 |
| Random Forest | ~0.915 |
| XGBoost | ~0.914 |

🐚 Abalone Multiclass Classification

| Model | Approximate Accuracy |
|---|---|
| Logistic Regression | ~56% |

🌸 Iris Classification and Clustering

| Technique | Result |
|---|---|
| KNN Classification | Strong supervised classification performance |
| KMeans Clustering | Good natural group detection |
| PCA | Clear 2D separation of the main groups |

🍷 Wine Clustering

| Algorithm | Result |
|---|---|
| KMeans | Best match |
| Agglomerative Clustering | Very similar |
| DBSCAN | Less suitable |

📏 Metrics Used

Regression Metrics

  • MAE
  • MSE
  • RMSE
  • R² Score
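
For reference, the four regression metrics can be computed by hand as in the sketch below. The function name and the sample values are made up for illustration; in the notebooks these numbers would normally come from `sklearn.metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical values, not results from any project in this repository.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

MAE and RMSE are in the same units as the target, while R² is unitless: it compares the model's squared error with that of always predicting the mean.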

Classification Metrics

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Confusion Matrix
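
The sketch below shows how these metrics follow from the four cells of a binary confusion matrix. The function name and the labels are illustrative assumptions; multiclass reports average the same quantities per class.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
report = binary_classification_metrics(y_true, y_pred)
print(report)
```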

Clustering Metrics

  • Inertia
  • Silhouette Score
  • Elbow Method
  • PCA Cluster Comparison
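
To make the first two clustering metrics concrete, here is a small pure-Python sketch of inertia and the mean silhouette score. The helper names and the four sample points are assumptions for this example; the projects themselves would typically read `KMeans.inertia_` and call `sklearn.metrics.silhouette_score`.

```python
import math


def inertia(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(point, centroid)) for point in members
        )
    return total


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points; needs at least 2 clusters."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Group distances from p to every other point by that point's cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if own not in dists:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])  # mean distance within own cluster
        b = min(sum(d) / len(d) for lbl, d in dists.items() if lbl != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2D points forming two obvious groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]
print(inertia(points, good), mean_silhouette(points, good))
print(inertia(points, bad), mean_silhouette(points, bad))
```

Inertia always shrinks as clusters are added, which is why the Elbow Method looks for the bend rather than the minimum. The silhouette score ranges from -1 to 1 and can be compared directly across different values of k.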

Association Rule Metrics

  • Support
  • Confidence
  • Lift
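
The three rule metrics can be computed directly from basket counts, as the following sketch shows for a single rule. The function name and the toy baskets are assumptions for illustration and do not come from the supermarket dataset.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)

    support_a = sum(a <= b for b in baskets) / n
    support_c = sum(c <= b for b in baskets) / n
    support_ac = sum((a | c) <= b for b in baskets) / n  # both sides together
    if support_a == 0 or support_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    confidence = support_ac / support_a  # estimate of P(consequent | antecedent)
    lift = confidence / support_c        # > 1: together more often than if independent
    return {"support": support_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "butter", "eggs"],
]
rule = rule_metrics(transactions, ["butter"], ["bread"])
print(rule)
```

A lift of exactly 1 means the two sides occur together as often as independence would predict. That is why rules are usually ranked by lift rather than by confidence alone, since confidence is inflated whenever the consequent is simply a popular product.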

📸 Visualizations Included

The repository includes:

  • Correlation Heatmaps
  • Confusion Matrices
  • Model Comparison Charts
  • Feature Importance Plots
  • Real vs Predicted Plots
  • PCA Scatterplots
  • 3D Visualizations
  • Interactive Plotly Visualizations
  • Clustering Comparison Plots
  • Market Basket Analysis Charts
  • Association Rule Visualizations

🧪 Concepts Explored

  • Exploratory Data Analysis
  • Data Preprocessing
  • Feature Engineering
  • Feature Scaling
  • Regression Modeling
  • Binary Classification
  • Multiclass Classification
  • Unsupervised Learning
  • Clustering
  • PCA
  • Association Rule Learning
  • Market Basket Analysis
  • Ensemble Learning
  • Boosting
  • Feature Importance
  • Overfitting and Underfitting
  • Generalization
  • Model Comparison
  • Model Interpretation

🚀 Next Steps

  • Cross Validation
  • GridSearchCV
  • Advanced Pipelines
  • Hyperparameter Optimization
  • Model Persistence with Joblib
  • Additional Kaggle Competitions
  • More Unsupervised Learning Projects
  • Deep Learning Projects

👩‍💻 Author

Bea Lamiquiz

Machine Learning portfolio focused on practical experimentation, model comparison and applied analysis using real-world datasets.
