🚀 Spaceship Titanic: TensorFlow Decision Forests

A robust baseline solution using TensorFlow Decision Forests (TF-DF) for the Kaggle Spaceship Titanic competition.

📌 Overview

This project predicts whether a passenger on the Spaceship Titanic was transported to an alternate dimension. We use TensorFlow Decision Forests (TF-DF), a library for training tree-based models such as Random Forests through the familiar Keras API.

Decision forests suit tabular data well and need little preprocessing: unlike neural networks, they consume numerical and categorical features directly, without normalization or one-hot encoding.

📊 Dataset

The dataset is sourced from the Kaggle competition: Spaceship Titanic.

Total Training Data: 8,693 rows and 14 columns, including the Transported target.
Target: Transported (Boolean) - Whether the passenger was transported.
Key Features: HomePlanet, CryoSleep, Cabin, Destination, Age, VIP, and spending on luxury amenities (RoomService, FoodCourt, ShoppingMall, Spa, VRDeck).

🛠️ Methodology & Workflow

The code in this repository covers the following end-to-end steps:

1. Exploratory Data Analysis (EDA)

We analyzed the distribution of both numerical and categorical data to understand passenger characteristics and identify patterns.
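A minimal sketch of this kind of summary with pandas, assuming the data has been loaded with pd.read_csv("train.csv"). The helper name summarize_columns is hypothetical, and the notebook's actual plots (seaborn/matplotlib) are not reproduced here.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> dict:
    """Summarize numeric and non-numeric columns separately (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    # Booleans and strings (HomePlanet, Destination, VIP, ...) land here
    categorical = df.select_dtypes(exclude="number")
    return {
        # count, mean, std, min, quartiles and max of each numeric column
        "numeric": numeric.describe(),
        # frequency of each category, counting missing values as their own bucket
        "categorical": {
            col: categorical[col].value_counts(dropna=False)
            for col in categorical.columns
        },
        # number of missing values per column
        "missing": df.isna().sum(),
    }
```

For example, summarize_columns(pd.read_csv("train.csv"))["missing"] shows which columns need imputation before training.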

2. Data Preprocessing

While TF-DF handles many data types natively, some adjustments were necessary:

Handling Missing Values: Null values in numerical and boolean columns were imputed with 0.
Boolean Conversion: Because TF-DF does not currently support boolean data types directly, boolean columns such as Transported, VIP, and CryoSleep were converted to integers (0 or 1).
Dropping Columns: Removed PassengerId and Name as they are irrelevant for training.
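The three preprocessing steps can be sketched in pandas as follows. The column lists are taken from the descriptions in this README and may not match the notebook exactly (assumption); the function name preprocess is hypothetical.

```python
import pandas as pd

# Column groups inferred from this README (assumption: the notebook may differ)
NUMERIC_COLS = ["Age", "RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]
BOOL_COLS = ["VIP", "CryoSleep"]
DROP_COLS = ["PassengerId", "Name"]


def preprocess(df: pd.DataFrame, label: str = "Transported") -> pd.DataFrame:
    """Impute with 0, cast booleans to 0/1 and drop identifier columns."""
    out = df.copy()
    # Impute missing numerical and boolean values with 0
    cols = NUMERIC_COLS + BOOL_COLS
    out[cols] = out[cols].fillna(value=0)
    # Cast booleans (and the label, when present) to integers 0/1
    cast_cols = BOOL_COLS + ([label] if label in out.columns else [])
    out[cast_cols] = out[cast_cols].astype(int)
    # Drop columns that are not used for training
    return out.drop(columns=DROP_COLS)
```

The same function works on test.csv, which has no Transported column; the label is only cast when it is present.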

3. Feature Engineering

The Cabin feature, which contains data in the format Deck/Num/Side, was split into three more informative features:

Deck
Cabin_num
Side
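A minimal sketch of this split with pandas' str.split. Dropping the original Cabin column afterwards is an assumption, and split_cabin is a hypothetical name.

```python
import pandas as pd


def split_cabin(df: pd.DataFrame) -> pd.DataFrame:
    """Split 'Deck/Num/Side' into Deck, Cabin_num and Side (hypothetical helper)."""
    out = df.copy()
    # expand=True yields one column per '/'-separated part;
    # rows with a missing Cabin get NaN in all three new columns
    out[["Deck", "Cabin_num", "Side"]] = out["Cabin"].str.split("/", expand=True)
    # Assumption: the original combined column is no longer needed
    return out.drop(columns=["Cabin"])
```

Note that Cabin_num stays a string here, so TF-DF would treat it as a categorical feature unless it is converted to a number first.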

4. Modeling

We utilized the standard Random Forest algorithm from TF-DF.

Model: tfdf.keras.RandomForestModel()
Data Split: 80% Training, 20% Validation.
Data Format: The pandas DataFrames were converted to tf.data.Dataset objects, the input format TF-DF's Keras models train on.
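The 80/20 split can be sketched with pandas alone; the helper name split_dataset and the fixed seed are assumptions, not necessarily what the notebook uses. TF-DF is deliberately left out so the block runs without TensorFlow installed.

```python
import pandas as pd


def split_dataset(df: pd.DataFrame, valid_ratio: float = 0.2, seed: int = 0):
    """Randomly split rows into (train, valid) DataFrames (hypothetical helper)."""
    # Sample the training share without replacement; a fixed seed makes it reproducible
    train = df.sample(frac=1.0 - valid_ratio, random_state=seed)
    # Every row not sampled goes to validation (assumes a unique index)
    valid = df.drop(index=train.index)
    return train, valid
```

With TF-DF installed, each frame would then be converted with tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="Transported") and the training dataset passed to tfdf.keras.RandomForestModel().fit(...); the exact arguments used in the notebook are not shown here.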

📈 Evaluation & Performance

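Accuracy was measured on the held-out validation set and on out-of-bag (OOB) examples, i.e. training examples left out of a given tree's bootstrap sample, which gives a validation-like estimate without extra data.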

Training Time: ~54 seconds.
OOB Accuracy: ~79.73%
Validation Accuracy: 80.25%

Feature Importance

Based on the NUM_AS_ROOT metric (the number of trees whose root node splits on a given feature), the most influential features are:

CryoSleep (Highly dominant)
RoomService
Spa
VRDeck
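To make the metric concrete, NUM_AS_ROOT is just a count over trees of which feature each root node splits on. The sketch below uses a made-up list of root features for illustration; in TF-DF the real values are available from model.make_inspector().variable_importances()["NUM_AS_ROOT"].

```python
from collections import Counter


def num_as_root(root_features: list[str]) -> list[tuple[str, int]]:
    """Rank features by how many trees use them as the root split."""
    # most_common() sorts by count, highest first
    return Counter(root_features).most_common()
```

For a hypothetical five-tree forest with roots ["CryoSleep", "CryoSleep", "Spa", "CryoSleep", "RoomService"], CryoSleep ranks first with a count of 3.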

💻 How to Run

Install Dependencies:

pip install tensorflow tensorflow_decision_forests pandas numpy seaborn matplotlib

Run the Notebook: Open spaceship-titanic-with-tfdf.ipynb in Jupyter Notebook, Google Colab, or a Kaggle Kernel.
Output: The notebook writes a submission.csv file ready for upload to Kaggle.
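A minimal sketch of building the submission file from model output. It assumes the model's predict returns one probability of Transported per passenger (shape (n, 1), as TF-DF binary classifiers do) and that PassengerId was saved before the column was dropped; make_submission and the 0.5 threshold are illustrative choices.

```python
import numpy as np
import pandas as pd


def make_submission(passenger_ids: pd.Series, probs: np.ndarray,
                    threshold: float = 0.5) -> pd.DataFrame:
    """Turn predicted probabilities into a PassengerId/Transported frame."""
    # Flatten (n, 1) predictions to a 1-D array
    probs = np.asarray(probs).reshape(-1)
    return pd.DataFrame({
        "PassengerId": passenger_ids.to_numpy(),
        # The competition expects boolean True/False labels
        "Transported": probs > threshold,
    })
```

A typical call would be make_submission(ids, model.predict(test_ds)).to_csv("submission.csv", index=False).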

🤝 Contribution

This repository is designed as a learning baseline. Feel free to fork and experiment with:

Using GradientBoostedTreesModel instead of Random Forest.
Conducting deeper hyperparameter tuning.
Implementing more advanced missing data imputation techniques.

Built with TensorFlow Decision Forests v1.2.0.

faisalsuryasaputra/Spaceship-Titanic-Dataset-with-TensorFlow-Decision-Forests

Folders and files

Latest commit

History

Repository files navigation

🚀 Spaceship Titanic: TensorFlow Decision Forests

📌 Overview

📊 Dataset

🛠️ Methodology & Workflow

1. Exploratory Data Analysis (EDA)

2. Data Preprocessing

3. Feature Engineering

4. Modeling

📈 Evaluation & Performance

Feature Importance

💻 How to Run

🤝 Contribution

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages