Machine Learning portfolio featuring supervised and unsupervised learning projects developed with Python, real-world datasets and visual model interpretation.
This project compares supervised classification and unsupervised clustering using the classic Iris dataset.
The goal is to understand whether algorithms such as KMeans can discover natural flower groups without using the real species labels.
This project applies Association Rule Learning techniques using the Apriori algorithm to discover hidden purchasing patterns in supermarket transaction data.
The objective is to identify:
- frequently purchased products
- product relationships
- strong association rules
- customer purchasing patterns
This type of analysis is commonly used in:
- Retail
- E-commerce
- Recommendation systems
- Cross-selling strategies
This repository documents my hands-on Machine Learning journey through experiments with:
- Regression
- Binary Classification
- Multiclass Classification
- Clustering
- Association Rule Learning
- Market Basket Analysis
- Feature Engineering
- Exploratory Data Analysis
- Model Comparison
- PCA Dimensionality Reduction
- Data Visualization
- Interactive Visualizations
The main goal is not only to train models, but also to understand how they behave, how they generalize and how their results can be interpreted visually.
- Python
- Pandas
- NumPy
- Scikit-Learn
- XGBoost
- Mlxtend
- Matplotlib
- Seaborn
- Plotly
- Jupyter Notebook
- Google Colab
Machine Learning/
├── association_rules/
│ ├── data/
│ ├── images/
│ ├── notebooks/
│ └── reports/
├── regression/
├── binary_classification/
├── multiclass_classification/
│ ├── data/
│ ├── images/
│ ├── models/
│ └── notebooks/
├── unsupervised/
│ ├── images/
│ ├── models/
│ ├── reports/
│ └── wine/
├── docs/
├── README.md
└── LICENSE
Association Rule Learning project focused on discovering hidden purchasing patterns within supermarket transaction data.
Apply the Apriori algorithm to:
- identify frequent product combinations
- generate association rules
- evaluate support, confidence and lift
- extract business insights from customer baskets
- Transaction Encoding
- Apriori Algorithm
- Association Rules
- Support Analysis
- Confidence Analysis
- Lift Analysis
- Market Basket Analysis
- Some products appear together significantly more often than expected.
- Lift analysis helps identify strong product relationships.
- Market Basket Analysis can support recommendation systems and cross-selling strategies.
- Certain food combinations reveal customer purchasing habits.
association_rules/notebooks/01_market_basket_analysis.ipynb
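
To make the three rule metrics concrete, here is a minimal sketch that computes support, confidence and lift by hand for a single rule. The baskets are hypothetical toy data, not the supermarket dataset, and the code counts item pairs directly rather than running the full Apriori candidate pruning; the notebook itself may rely on Mlxtend instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (assumption): the real project uses supermarket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)

# Count how many baskets contain each item and each unordered item pair.
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    support = pair_counts[pair] / n                       # P(A and B)
    confidence = pair_counts[pair] / item_counts[antecedent]  # P(B | A)
    lift = confidence / (item_counts[consequent] / n)     # P(B | A) / P(B)
    return support, confidence, lift


support, confidence, lift = rule_metrics("bread", "milk")
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two items co-occur more often than they would if purchased independently; a lift below 1, as in this toy example, means they co-occur slightly less often than independence would predict.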
Projects focused on predicting continuous numerical values such as house prices, concrete strength and used car prices.
- Linear Regression
- Polynomial Regression
- Decision Tree Regressor
- Random Forest Regressor
- XGBoost Regressor
- Housing Price Prediction
- Concrete Strength Prediction
- Used Cars Price Prediction
- Exploratory Data Analysis
- Preprocessing
- Scaling
- Feature Engineering
- Ensemble Learning
- Model Comparison
- Generalization Analysis
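
The model comparison and generalization analysis listed above can be sketched as follows. This is an illustrative example on synthetic data from `make_regression`, not one of the repository's datasets, and the hyperparameters are assumptions chosen for brevity. Comparing train and test R² side by side is one simple way to spot overfitting.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): the projects use housing, concrete and used-car data.
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and record R² on both splits; a large gap suggests overfitting.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }
    print(f"{name}: train R²={scores[name]['train_r2']:.3f}, "
          f"test R²={scores[name]['test_r2']:.3f}")
```

An unpruned decision tree typically reaches a near-perfect train R² while scoring lower on the test split, which is the kind of gap the generalization analysis looks for.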
Projects focused on predicting binary categories and evaluating classification performance.
Classification project using medical and lifestyle variables to predict cardiovascular disease risk.
- Logistic Regression
- KNN
- Decision Tree
- Random Forest
- Naive Bayes
- XGBoost
- Confusion Matrices
- Precision, Recall and F1-score
- Overfitting and Underfitting
- Feature Importance
- Model Comparison
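
As a small illustration of how the confusion matrix relates to precision, recall and F1-score, the sketch below uses hand-written labels (hypothetical values, not predictions from the cardiovascular models) so each number can be checked by hand.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels (assumption): 1 = disease, 0 = no disease.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0]

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In a medical setting, recall is often the metric to watch, because a false negative (a missed case) is usually costlier than a false positive.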
Classification of mushrooms as edible or poisonous.
- Categorical Encoding
- Preprocessing
- Classification Modeling
- Confusion Matrices
- Model Evaluation
Projects focused on multiclass classification using supervised Machine Learning algorithms.
Compare supervised classification and unsupervised clustering using the Iris dataset.
This project explores whether clustering algorithms can identify natural groups similar to the real flower species without using the target labels.
- Load and inspect the Iris dataset
- Perform Exploratory Data Analysis
- Analyze feature correlations
- Train a KNN classifier
- Evaluate classification performance
- Apply KMeans clustering
- Compare real species with clusters
- Apply PCA for dimensionality reduction
- Visualize real species and clusters in 2D
- K-Nearest Neighbors
- KMeans
- StandardScaler
- PCA
- Confusion Matrix
- Classification Report
- Correlation Heatmap
- Iris-setosa is clearly separated from the other species.
- Petal measurements are highly useful for separating classes.
- KNN performs very well because the dataset has clear class boundaries.
- KMeans discovers groups that are quite similar to the real species.
- PCA provides a clear 2D projection of the dataset structure.
multiclass_classification/notebooks/iris/01_iris_classification_vs_clustering.ipynb
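
The workflow above can be condensed into the following sketch, which uses the copy of Iris bundled with scikit-learn. The split ratio, `n_neighbors=5` and the use of the adjusted Rand index to compare clusters with species are assumptions made for illustration; the notebook may compare them differently, for example with a cross-tabulation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Supervised: scale using training statistics only, then fit and evaluate KNN.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(scaler.transform(X_test)))

# Unsupervised: KMeans never sees y; labels are used only to score the result.
X_scaled = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)
ari = adjusted_rand_score(y, clusters)  # 1.0 = identical grouping, ~0 = random

# PCA projects the four scaled features onto two components for plotting.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

print(f"KNN accuracy: {knn_accuracy:.3f}")
print(f"KMeans vs species (adjusted Rand index): {ari:.3f}")
print(f"PCA output shape: {X_2d.shape}")
```

The adjusted Rand index is used here because cluster IDs are arbitrary: cluster 0 need not correspond to species 0, so a plain accuracy score would be misleading.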
Multiclass classification project using agricultural IoT sensor data collected from smart greenhouse environments.
- Multiclass Classification
- Correlation Analysis
- Random Forest Classification
- Heatmaps
- Confusion Matrices
- Feature Importance
- Model Comparison
Classification project focused on predicting abalone sex categories using physical measurements.
- Logistic Regression
- KNN
- Decision Tree
- Random Forest
- Label Encoding
- Feature Scaling
- Feature Selection
- Correlation Analysis
- Classification Reports
- Confusion Matrices
- Model Interpretation
Projects focused on discovering hidden patterns in data without using target labels during model training.
Customer segmentation project using unsupervised learning techniques.
- KMeans
- Hierarchical Clustering
- DBSCAN
- Mean Shift
- Customer Segmentation
- Distance-based Clustering
- Density-based Clustering
- Model Comparison
- Cluster Visualization
Clustering project using the Wine Dataset.
Remove the original classification column and evaluate whether clustering algorithms can discover natural groups using only the chemical characteristics of the wines.
- Load and inspect the dataset
- Remove `Customer_Segment`
- Scale numerical variables
- Apply PCA for dimensionality reduction
- Create 2D and 3D visualizations
- Compare real classification with clustering results
- Evaluate clustering quality using Silhouette Score
- Use the Elbow Method to estimate the optimal number of clusters
- KMeans
- Agglomerative Clustering
- DBSCAN
- KMeans was the algorithm that most closely reproduced the original classification.
- Agglomerative Clustering produced very similar results.
- DBSCAN was a poor fit because the wine groups are not separated by clear differences in density.
- The Elbow Method suggested 3 clusters.
- Silhouette Score showed moderate cluster separation.
unsupervised/wine/images/comparison/real_classification_3d.html
unsupervised/wine/images/kmeans/kmeans_3d.html
unsupervised/wine/notebooks/wine_clustering.ipynb
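
The Elbow Method and Silhouette Score steps can be sketched as below. This example loads the wine data bundled with scikit-learn as a stand-in (an assumption: the project reads a file with a `Customer_Segment` column, which may differ in detail), and the range of `k` values is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Discard the class labels so clustering uses only the chemical features.
X, _ = load_wine(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

inertias = {}
silhouettes = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = km.inertia_  # within-cluster sum of squares (elbow curve)
    silhouettes[k] = silhouette_score(X_scaled, km.labels_)  # in [-1, 1]
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

# The k with the highest silhouette is one candidate; the elbow is read from the inertia curve.
best_k = max(silhouettes, key=silhouettes.get)
print(f"Highest silhouette at k={best_k}")
```

Inertia keeps falling as `k` grows, so it cannot be minimized directly; the elbow is the point where further clusters stop paying off, and the silhouette score offers a second opinion.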
| Model | Approximate R² |
|---|---|
| Linear Regression | ~0.68 |
| Decision Tree | ~0.76 |
| Random Forest | ~0.89 |
| XGBoost | ~0.91 |
| Model | Approximate R² |
|---|---|
| Linear Regression | ~0.86 |
| Decision Tree | ~0.88 |
| Random Forest | ~0.915 |
| XGBoost | ~0.914 |
| Model | Approximate Accuracy |
|---|---|
| Logistic Regression | ~56% |
| Technique | Result |
|---|---|
| KNN Classification | Strong supervised classification performance |
| KMeans Clustering | Good natural group detection |
| PCA | Clear 2D separation of the main groups |
| Algorithm | Result |
|---|---|
| KMeans | Best match |
| Agglomerative Clustering | Very similar |
| DBSCAN | Less suitable |
- MAE
- MSE
- RMSE
- R² Score
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- Inertia
- Silhouette Score
- Elbow Method
- PCA Cluster Comparison
- Support
- Confidence
- Lift
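
For reference, the regression metrics in this list (MAE, MSE, RMSE and R²) can be computed from their definitions in a few lines of NumPy. The values below are made up for illustration and do not come from any model in this repository.

```python
import numpy as np

# Hypothetical targets and predictions (assumption), small enough to verify by hand.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred - y_true

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # same units as the target

# R² compares the model's squared error with that of always predicting the mean.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R²={r2:.3f}")
```

RMSE penalizes large errors more heavily than MAE, so a noticeable gap between the two hints at a few large misses rather than uniformly small ones.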
The repository includes:
- Correlation Heatmaps
- Confusion Matrices
- Model Comparison Charts
- Feature Importance Plots
- Real vs Predicted Plots
- PCA Scatterplots
- 3D Visualizations
- Interactive Plotly Visualizations
- Clustering Comparison Plots
- Market Basket Analysis Charts
- Association Rule Visualizations
- Exploratory Data Analysis
- Data Preprocessing
- Feature Engineering
- Feature Scaling
- Regression Modeling
- Binary Classification
- Multiclass Classification
- Unsupervised Learning
- Clustering
- PCA
- Association Rule Learning
- Market Basket Analysis
- Ensemble Learning
- Boosting
- Feature Importance
- Overfitting and Underfitting
- Generalization
- Model Comparison
- Model Interpretation
- Cross Validation
- GridSearchCV
- Advanced Pipelines
- Hyperparameter Optimization
- Model Persistence with Joblib
- Additional Kaggle Competitions
- More Unsupervised Learning Projects
- Deep Learning Projects
Bea Lamiquiz
Machine Learning portfolio focused on practical experimentation, model comparison and applied analysis using real-world datasets.











