A machine learning project that predicts passenger survival on the Titanic using Logistic Regression and Random Forest classifiers.
titanic-survival-prediction/
├── data/
│ └── Titanic-Dataset.csv
├── src/
│ ├── data_loader.py # Loads CSV data
│ ├── feature_engineering.py # Preprocessing and feature creation
│ └── model.py # Model training and evaluation
├── images/
│ └── roc_curve.png # ROC curve output
├── main.py
└── requirements.txt
- Load Data: reads the Titanic CSV dataset
- Feature Engineering: handles missing values, encodes categorical columns, scales numerical features, and creates `FamilySize` from `SibSp` and `Parch`
- Split Data: 80% training / 20% testing
- Train & Evaluate: trains two models and compares them
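
The split and train steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/model.py`: synthetic data from `make_classification` stands in for the engineered Titanic features, and the choice of ROC AUC as the comparison metric, along with `random_state`, `stratify`, and `max_iter`, are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the six engineered features.
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

# 80% training / 20% testing, as described in the steps above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used for ROC-based evaluation.
    proba = model.predict_proba(X_test)[:, 1]
    # Assumption: ROC AUC is used here as the comparison metric.
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

Fixing `random_state` keeps the split and the forest reproducible between runs, which makes the comparison between the two models easier to interpret.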
| Feature | Description |
|---|---|
| Pclass | Passenger class (1, 2, 3) |
| Sex | Gender (encoded: male=1, female=0) |
| Age | Age (scaled) |
| Fare | Ticket fare (scaled) |
| Embarked | Port of embarkation (encoded) |
| FamilySize | SibSp + Parch + 1 |
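
One way the features in the table could be derived with pandas and scikit-learn is sketched below. The `Sex` mapping and the `FamilySize` formula come from the table; everything else (the function name `engineer_features`, median and mode imputation, the integer codes for `Embarked`, and `StandardScaler`) is an assumption made for illustration and may differ from `src/feature_engineering.py`.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that builds the six model features."""
    out = df.copy()

    # Assumption: fill missing numeric values with the median,
    # and missing ports with the most frequent port.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Fare"] = out["Fare"].fillna(out["Fare"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])

    # Encoding from the table: male=1, female=0.
    out["Sex"] = out["Sex"].map({"male": 1, "female": 0})
    # Assumption: arbitrary integer codes for the three ports.
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})

    # Formula from the table: siblings/spouses + parents/children + self.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Assumption: standardize Age and Fare to zero mean and unit variance.
    out[["Age", "Fare"]] = StandardScaler().fit_transform(out[["Age", "Fare"]])

    return out[["Pclass", "Sex", "Age", "Fare", "Embarked", "FamilySize"]]


# Tiny made-up sample using the Titanic column names.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 54.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 0, 2],
    "Fare": [7.25, 71.28, 7.92, 51.86],
    "Embarked": ["S", "C", None, "S"],
})
features = engineer_features(sample)
print(features)
```

For brevity the sketch fits the scaler on the whole frame. In a real pipeline it is usually safer to fit the scaler on the training split only and reuse it on the test split.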
- Logistic Regression: linear classifier; predictions use the best decision threshold selected from its ROC curve
- Random Forest: ensemble of 100 decision trees; predictions use the best decision threshold selected from its ROC curve
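
The README does not say how the "best" threshold is chosen. A common option, assumed here, is Youden's J statistic, which picks the point on the ROC curve that maximizes TPR minus FPR. The helper name `best_threshold` and the toy scores are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve


def best_threshold(y_true, y_score) -> float:
    """Return the ROC threshold that maximizes Youden's J (TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    j_scores = tpr - fpr
    return float(thresholds[np.argmax(j_scores)])


# Toy labels and predicted probabilities for the positive class.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.3, 0.6, 0.8, 0.7, 0.4])

threshold = best_threshold(y_true, y_score)
# Classify as "survived" when the score reaches the chosen threshold.
y_pred = (y_score >= threshold).astype(int)
print(threshold, y_pred)
```

With a fitted model, `y_score` would typically be `model.predict_proba(X_test)[:, 1]`. Selecting the threshold on a validation split rather than the test split avoids an optimistic bias in the reported metrics.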
python3 -m venv ai-env
source ai-env/bin/activate
pip install -r requirements.txt
python3 main.py