Skip to content

AnastasiaRassi/Forecasting-Flight-Delays-Monthly

Repository files navigation

Monthly Flight Delay Forecasting

Forecasting Flight Delays Monthly is a machine learning project that applies exploratory data analysis, statistical analysis, feature engineering, and machine learning regression to predict monthly airline delay volumes.

This repository walks through a complete forecasting workflow, from raw data through model evaluation, and delivers actionable insights for operational planning and airline decision support.


Project Overview

Flight delays are a persistent challenge for airlines, airports, and passengers. Accurately forecasting delay volumes at a monthly level can help with:

  • Staff planning
  • Resource allocation
  • Customer service readiness
  • Budgeting and risk planning

This project uses historical delay data and regression modeling to estimate future delay counts with substantial predictive power.


📂 Repository Structure

.
├── Airline_Delay_Cause.csv          # Original monthly delay dataset
├── cleaned_Airline_Delay_Data.csv   # Cleaned and preprocessed data
├── notebooks/                       # Jupyter notebooks with analysis & modeling
├── requirements.txt                # Python dependencies
└── README.md                       # Project overview

Data Description

The dataset includes monthly summaries of flights and delays from all U.S. carriers and airports, with features such as:

  • Year and month
  • Total flights
  • Delay counts
  • Delay types and causes

Preprocessing steps include cleaning, feature extraction, and cyclical encoding for time features.


Exploratory Data Analysis

We investigate:

  • Monthly and yearly trends in delay volumes
  • Seasonal patterns using cyclic encodings (sin/cos)
  • Correlation between features and delay outcomes

Visualizations are provided via notebooks to illustrate patterns and support feature engineering decisions.


Feature Engineering

Time features are encoded cyclically (e.g., month via sine/cosine) to capture seasonality. Additional engineered variables improve model compatibility with temporal patterns.


Modeling & Evaluation

Both bagging and boosting regression models (including Random Forest and CatBoost) with optimized hyperparameters, using Optuna, are trained and validated.

Final Test Results (Random Forest):

Metric Value
R² (Test) 0.801
MAE 25.46 flights
RMSE 80.27 flights
Median AE 5.50 flights

These results show the model explains ~80% of variance and delivers interpretable error margins, making it a practical tool for planning and forecasting.


Visualization Highlights

The repository includes visualizations for:

  • Actual vs. predicted delay timelines
  • Train vs test comparisons
  • Feature importance plots

These visuals help stakeholders quickly grasp performance and trends.


💼 Business Impact

This model can be used to support:

  • Operational staffing forecasts
  • Seasonal capacity planning
  • Budget forecasting for delay‑related costs
  • Early warning for high‑demand or high‑delay months

By forecasting delays with ~80% explained variance, airlines and airports gain a reliable strategic planning tool.


Next Steps

Potential enhancements include:

  • Integrating weather forecast data
  • Extending to weekly or daily forecasts
  • Developing flight‑level prediction models
  • Deploying as a production service or API

Requirements

Install the required Python libraries:

pip install -r requirements.txt

🧾 Licensing & Credits

Data is provided in the repository. Ensure compliance with any original data usage terms.
Thanks to all data contributors and open‑source tooling used throughout this project.

🙌 Acknowledgments

This project was developed as part of a comprehensive time‑series forecasting and machine learning workflow — showcasing real‑world modeling, validation, and business reasoning.


Happy forecasting! 

About

Monthly flight delay forecasting using EDA, statistical analysis, feature engineering and regression modelling.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors