shruti-sivakumar/Bitcoin-Forecasting


Bitcoin Price Forecasting — Comparative Study (2019–2024)

A rigorous time-series analysis comparing 6 forecasting models across linear, nonlinear, and deep learning paradigms on 6 years of daily Bitcoin data.

Project Overview

This project performs a comprehensive comparative study of Bitcoin (BTC-USD) price forecasting using 2,191 daily observations from January 2019 to December 2024. Six distinct models are rigorously evaluated under a consistent rolling one-step-ahead forecasting framework, spanning traditional econometric approaches through modern deep learning.

| Category | Models |
|---|---|
| Linear | ARIMA(2,1,2), Holt's Linear Trend |
| Nonlinear | GARCH(1,1), Markov-Switching AR (MS-AR), TAR/SETAR |
| Deep Learning | LSTM (2-layer, 100 units) |

Repository Structure

bitcoin-forecasting/
├── data/
│   └── bitcoin_raw_data.csv              # Raw BTC-USD OHLCV from Yahoo Finance
├── models/
│   └── best_lstm_model.keras             # Saved best LSTM weights
├── notebooks/
│   ├── 01_data_collection_eda.ipynb      # Data download & exploratory analysis
│   ├── 02_statistical_tests.ipynb        # Stationarity, autocorrelation, ARCH tests
│   ├── 03_linear_models.ipynb            # ARIMA & Holt Linear Trend
│   ├── 04_nonlinear_models.ipynb         # GARCH, MS-AR, TAR/SETAR
│   ├── 05_deeplearning_models.ipynb      # LSTM + full model comparison
│   └── 06_comparitive_analysis.ipynb     # Final synthesis & visualizations
├── results/
│   ├── figures/                          # All generated plots
│   ├── all_models_final_results.csv      # Consolidated metrics table
│   ├── final_model_comparison.csv        # Ranked comparison
│   ├── linear_models_predictions.csv
│   ├── nonlinear_models_predictions.csv
│   ├── lstm_predictions.csv
│   └── project_summary.json
├── requirements.txt
└── README.md

Dataset

  • Source: Yahoo Finance via yfinance
  • Ticker: BTC-USD
  • Period: 2019-01-01 to 2024-12-30
  • Observations: 2,191 daily records, no missing values
  • Features: Open, High, Low, Close, Volume

| Statistic | Value |
|---|---|
| Min Price | $3,399.47 |
| Max Price | $106,140.60 |
| Mean Price | $31,473.14 |
| Std Dev | $22,079.72 |

Methodology

Train / Test Split

  • Train: 2019-01-01 → 2023-10-18 (80% — 1,752 observations)
  • Test: 2023-10-18 → 2024-12-30 (20% — 439 observations)
  • All models use rolling one-step-ahead forecasting — the model is retrained/updated at each step with the newest observation before predicting the next day.
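
The split and rolling scheme described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the notebooks' code: `chronological_split`, `rolling_one_step`, and `naive_forecaster` are hypothetical names, and the naive "tomorrow equals today" forecaster merely stands in for whichever model is refitted at each step.

```python
from typing import Callable, List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_frac: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split in time order (no shuffling), so the test set is strictly later."""
    cut = int(len(series) * train_frac)
    return list(series[:cut]), list(series[cut:])


def rolling_one_step(
    train: Sequence[float],
    test: Sequence[float],
    forecaster: Callable[[List[float]], float],
) -> List[float]:
    """Predict each test point from all data observed before it."""
    history = list(train)
    predictions = []
    for actual in test:
        # Forecast tomorrow using only what is known today.
        predictions.append(forecaster(history))
        # Then reveal the true value so the next forecast can use it.
        history.append(actual)
    return predictions


def naive_forecaster(history: List[float]) -> float:
    """Placeholder model (an assumption): repeat the last observed value."""
    return history[-1]
```

With 2,191 observations, `chronological_split` yields 1,752 training and 439 test points, matching the counts listed above. Any model can be plugged in as `forecaster` as long as it only reads `history`.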

Evaluation Metrics

| Metric | Description |
|---|---|
| RMSE | Root Mean Squared Error (USD) |
| MAE | Mean Absolute Error (USD) |
| MAPE | Mean Absolute Percentage Error (%) |
| R² | Coefficient of Determination |
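
For reference, the four metrics can be computed as follows. This is a minimal sketch using their standard definitions in plain Python; the function names are illustrative and the notebooks may well use scikit-learn equivalents instead.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series (USD here)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot, relative to the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Because R² is measured against the variance of the evaluation window, R² values are only directly comparable between models scored on the same set of dates.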

Notebooks

01 — Data Collection & EDA

Downloads BTC-USD data from Yahoo Finance, validates quality (zero missing values), computes basic statistics, and produces the full price time-series visualization.

Figure: Bitcoin Price Time Series. Daily closing prices, 2019–2024. Key events visible: COVID crash (Mar 2020), first ATH ~$65K (Nov 2021), bear market bottom ~$16K (Nov 2022), new ATH $106K (Dec 2024).


02 — Statistical Tests

Rigorous pre-modelling analysis to justify model choices:

| Test | Series | Statistic | p-value | Result |
|---|---|---|---|---|
| ADF | Prices | -0.556 | 0.881 | ❌ Non-Stationary |
| ADF | Returns | -22.197 | 0.000 | ✅ Stationary |
| KPSS | Prices | 3.990 | 0.010 | ❌ Non-Stationary |
| Ljung-Box | Prices | | 0.000 | ✅ Significant autocorrelation (20/20 lags) |
| McLeod-Li | Sq. Returns | | <0.05 | ✅ ARCH effects (14/20 lags) |
| Jarque-Bera | Returns | 30,907 | 0.000 | ❌ Non-Normal (fat tails, skew=-1.10, kurt=18.32) |

Conclusion: Prices are I(1), returns are stationary. Strong autocorrelation and ARCH effects justify time-series and GARCH modelling. Fat tails justify nonlinear approaches.
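
To make the fat-tails result concrete, the Jarque-Bera statistic can be rebuilt from sample skewness and excess kurtosis. The sketch below uses the textbook formula with population (biased) moments in plain Python; `jarque_bera` is a hypothetical helper, and the notebooks presumably call a library routine whose moment estimators may differ slightly.

```python
from typing import Sequence, Tuple


def jarque_bera(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (JB statistic, skewness, excess kurtosis) from biased moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, skew, excess_kurt
```

Plugging the reported skew of -1.10 and kurtosis of 18.32 into the same formula with about 2,190 returns gives a statistic of roughly 31,000, in line with the reported 30,907, provided the kurtosis figure is excess kurtosis (an assumption).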

Figure: ACF and PACF. ACF/PACF of prices (top) and returns (bottom). Prices show slowly decaying autocorrelation, a hallmark of non-stationarity. Returns are near white noise after differencing.

Figure: Volatility Clustering. Returns (left) and ACF of squared returns (right). The significant spikes in the ACF of squared returns confirm ARCH effects: large price swings cluster together in time.
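
The check behind the volatility-clustering plot is the autocorrelation of squared returns. Here is a minimal sketch in plain Python with hypothetical helper names; the ±1.96/√n band is the usual large-sample approximation for white noise, not necessarily the exact band drawn in the figure.

```python
import math
from typing import List, Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom


def squared_return_acf(returns: Sequence[float], max_lag: int = 20) -> List[float]:
    """ACF of squared returns for lags 1..max_lag."""
    squared = [r * r for r in returns]
    return [autocorrelation(squared, lag) for lag in range(1, max_lag + 1)]


def white_noise_band(n: int) -> float:
    """Approximate 95% band; values outside +/- this suggest ARCH effects."""
    return 1.96 / math.sqrt(n)
```

A series that alternates calm and turbulent stretches produces squared-return autocorrelations far outside the band, even when the raw returns themselves look uncorrelated.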


03 — Linear Models

ARIMA(2,1,2)

Best order selected via AIC grid search over p ∈ {0,1,2,3}, q ∈ {0,1,2,3}, d=1:

| Order | AIC | BIC |
|---|---|---|
| (2,1,2) | 29,297.11 | 29,329.92 |
| (2,1,3) | 29,299.08 | 29,337.36 |
| (3,1,3) | 29,301.07 | 29,344.81 |

Model parameters (all significant at p<0.001): AR1=0.821, AR2=-0.954, MA1=-0.847, MA2=0.963.
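
Written out, the fitted model and the criteria used to select it look roughly as follows. This assumes the common convention in which MA terms enter with a plus sign, and it ignores any constant or trend term, since the README does not say whether one was included; here k is the number of estimated parameters, n the number of observations, and L-hat the maximised likelihood.

```latex
% ARIMA(2,1,2) on the price series y_t, with \Delta y_t = y_t - y_{t-1}
\Delta y_t = \phi_1 \Delta y_{t-1} + \phi_2 \Delta y_{t-2}
           + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}

% One-step-ahead forecast with the reported estimates
\hat{y}_{t+1 \mid t} = y_t + 0.821\,\Delta y_t - 0.954\,\Delta y_{t-1}
                     - 0.847\,\hat{\varepsilon}_t + 0.963\,\hat{\varepsilon}_{t-1}

% Information criteria used in the grid search
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```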

Figure: ARIMA ACF/PACF. ACF/PACF of the differenced training data, used to inform ARIMA order selection.

Holt's Linear Trend

Exponential smoothing with adaptive trend, parameters optimised automatically via MLE each rolling step.

Figure: Holt Predictions. Holt's Linear Trend rolling forecast tracks the 2024 bull market closely, with slight lag at sharp inflection points.
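
The recursion behind Holt's method is short enough to write by hand. The sketch below uses fixed smoothing parameters `alpha` and `beta` and a simple initialisation, both of which are assumptions for illustration; the project instead re-optimises the parameters at every rolling step.

```python
from typing import Sequence


def holt_one_step(history: Sequence[float], alpha: float, beta: float) -> float:
    """One-step-ahead forecast from Holt's additive linear trend method."""
    # Simple initialisation (an assumption): first value and first difference.
    level = history[0]
    trend = history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        # Level: blend the new observation with the previous projection.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend
```

On a perfectly linear series the forecast is exact for any `alpha` and `beta`. The lag at sharp turns arises because the trend term only adapts gradually to a change in slope.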

Linear Model Results:

| Model | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| ARIMA(2,1,2) | $1,745.35 | $1,225.38 | 1.97% | 0.9894 |
| Holt Linear Trend | $1,740.74 | $1,216.17 | 1.96% | 0.9894 |

04 — Nonlinear Models

GARCH(1,1)

Captures time-varying conditional volatility in log-returns with an AR(2) mean equation.

Estimated parameters:

| Parameter | Value | Interpretation |
|---|---|---|
| ω (omega) | 0.9504 | Baseline variance |
| α (alpha) | 0.1270 | ARCH effect: sensitivity to recent shocks |
| β (beta) | 0.8146 | GARCH effect: volatility persistence |
| α + β | 0.9416 | Very high persistence: shocks decay slowly |
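
The estimates above translate directly into the GARCH(1,1) variance recursion, a long-run variance, and a shock half-life. This is a minimal sketch with hypothetical function names; it assumes the model was fitted to returns expressed in percent, which is the usual scaling with the `arch` package but is not stated in the README.

```python
import math

# Reported estimates (returns assumed to be in percent).
OMEGA, ALPHA, BETA = 0.9504, 0.1270, 0.8146


def next_variance(prev_shock: float, prev_variance: float) -> float:
    """sigma^2_t = omega + alpha * eps^2_{t-1} + beta * sigma^2_{t-1}."""
    return OMEGA + ALPHA * prev_shock ** 2 + BETA * prev_variance


def unconditional_variance() -> float:
    """Long-run variance omega / (1 - alpha - beta); needs alpha + beta < 1."""
    return OMEGA / (1.0 - ALPHA - BETA)


def shock_half_life() -> float:
    """Days for a variance shock to decay by half at persistence alpha + beta."""
    return math.log(0.5) / math.log(ALPHA + BETA)
```

Under the percent-returns assumption, the long-run variance is about 16.27, or roughly 4.03% daily volatility, and a volatility shock takes about 11.5 days to fade by half.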

Figure: GARCH Diagnostics. GARCH(1,1) diagnostics: standardised residuals (top-left), conditional volatility σₜ (top-right), ACF of residuals (bottom-left), ACF of squared residuals (bottom-right). Residual autocorrelation is effectively removed.

Figure: GARCH Predictions. GARCH(1,1) rolling price predictions vs actual. The purple dashed line tracks the actual price with high fidelity (RMSE $1,735).

Markov-Switching AR (MS-AR)

Two-regime model with regime-dependent AR(4) coefficients and variance — explicitly captures Bitcoin's bull/bear market dynamics.

| | Regime 1 (Low Volatility) | Regime 2 (High Volatility) |
|---|---|---|
| Mean Return | 0.068% | 0.080% |
| Std Dev | 1.49% | 5.44% |

Figure: MS-AR Regimes. Top: returns with shaded regime classifications. Bottom: filtered regime probabilities over time. The model cleanly separates calm accumulation phases (Regime 1, red) from turbulent rally/crash periods (Regime 2, green).
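
The filtered regime probabilities come from a Hamilton-style filter. The sketch below is a deliberately simplified version: it uses the regime means and standard deviations reported above, drops the AR(4) terms, and relies on hypothetical regime-staying probabilities (`P_STAY`), since the estimated transition matrix is not given in the README.

```python
import math
from typing import List, Sequence, Tuple

# Regime means and standard deviations in percent, from the table above.
MEANS = (0.068, 0.080)
STDS = (1.49, 5.44)
# HYPOTHETICAL probabilities of staying in each regime; not from the README.
P_STAY = (0.95, 0.90)


def normal_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filtered_probabilities(returns: Sequence[float]) -> List[Tuple[float, float]]:
    """P(regime | returns up to t) for a two-regime model without AR terms."""
    prob_low = 0.5  # flat prior over the two regimes
    out = []
    for r in returns:
        # Prediction step: propagate yesterday's belief through the chain.
        pred_low = prob_low * P_STAY[0] + (1.0 - prob_low) * (1.0 - P_STAY[1])
        pred_high = 1.0 - pred_low
        # Update step: weight each regime by how well it explains today's return.
        joint_low = pred_low * normal_pdf(r, MEANS[0], STDS[0])
        joint_high = pred_high * normal_pdf(r, MEANS[1], STDS[1])
        prob_low = joint_low / (joint_low + joint_high)
        out.append((prob_low, 1.0 - prob_low))
    return out
```

Small daily moves keep the probability mass on the low-volatility regime, while a single move of several percent shifts it almost entirely to the high-volatility regime.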

TAR/SETAR

Threshold AR model using the median return (0.063%) as the switching threshold, with separate AR(4) models fitted above and below it.

  • Regime 1 (below threshold): 876 observations
  • Regime 2 (above threshold): 875 observations

Figure: TAR Predictions. TAR/SETAR rolling forecast. Performance is nearly identical to GARCH: both capture local dynamics well through their nonlinear regime structures.

Nonlinear Model Results:

| Model | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| GARCH(1,1) | $1,735.73 | $1,215.08 | 1.96% | 0.9894 |
| MS-AR | $1,032.95 | $715.90 | 1.79% | 0.9331 |
| TAR/SETAR | $1,735.65 | $1,213.20 | 1.95% | 0.9894 |

05 — Deep Learning (LSTM)

Architecture:

Input: (60 timesteps, 1 feature)    ← 60-day look-back window
  └─ LSTM(100, return_sequences=True)
  └─ Dropout(0.2)
  └─ LSTM(100, return_sequences=False)
  └─ Dropout(0.2)
  └─ Dense(50, activation='relu')
  └─ Dense(1)
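
For a sense of scale, the number of trainable parameters in this stack can be counted by hand. The sketch below uses the standard formulas for an LSTM layer with biases (four gates, each with input weights, recurrent weights, and a bias) and for a dense layer; the helper names are illustrative.

```python
def lstm_params(units: int, input_dim: int) -> int:
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs: int, outputs: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return inputs * outputs + outputs


layer_params = {
    "lstm_1": lstm_params(units=100, input_dim=1),    # 40,800
    "lstm_2": lstm_params(units=100, input_dim=100),  # 80,400
    "dense_50": dense_params(100, 50),                # 5,050
    "dense_1": dense_params(50, 1),                   # 51
}
total_params = sum(layer_params.values())  # 126,301 (Dropout adds none)
```

The network therefore has about 126,000 trainable weights, far more than the number of 60-day training windows available from roughly six years of daily data.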

Training configuration:

  • Normalisation: MinMaxScaler [0, 1]
  • Split: 70% train / 10% validation / 20% test
  • Optimiser: Adam | Loss: MSE
  • Callbacks: EarlyStopping (patience=10), ModelCheckpoint
  • Best epoch: 93 / 100
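
The data preparation implied by this configuration can be sketched in plain Python. It assumes the scaler is fitted on the training portion only, which the README does not state, and the helper names are hypothetical; the notebook itself uses scikit-learn's `MinMaxScaler`.

```python
from typing import List, Sequence, Tuple


def split_70_10_20(values: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Chronological train / validation / test split."""
    n = len(values)
    train_end = int(n * 0.7)
    val_end = int(n * 0.8)
    return list(values[:train_end]), list(values[train_end:val_end]), list(values[val_end:])


def fit_min_max(train: Sequence[float]) -> Tuple[float, float]:
    """Learn the scaling range from training data only (an assumption)."""
    return min(train), max(train)


def scale(values: Sequence[float], low: float, high: float) -> List[float]:
    """Map [low, high] to [0, 1]; values outside that range fall outside [0, 1]."""
    return [(v - low) / (high - low) for v in values]


def make_windows(scaled: Sequence[float], look_back: int = 60) -> Tuple[List[List[float]], List[float]]:
    """Each sample is the previous `look_back` values; the target is the next one."""
    inputs, targets = [], []
    for end in range(look_back, len(scaled)):
        inputs.append(list(scaled[end - look_back:end]))
        targets.append(scaled[end])
    return inputs, targets
```

With 2,191 observations this gives 1,533 training points and therefore 1,473 training windows. Under the train-only scaling assumption, test prices above the training maximum map to values greater than 1, a range the network never saw during training.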

Figure: LSTM Training History. Training and validation loss curves. The model converges within ~30 epochs; early stopping restores weights from epoch 93.

Figure: LSTM Full Predictions. LSTM predictions across train, validation, and test splits.

Figure: LSTM Test Predictions. LSTM test-set predictions vs actual prices (Oct 2023 – Dec 2024). While the general trend is captured, the model consistently undershoots during the aggressive 2024 bull run.

LSTM Results:

| Split | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| Train | $1,235.64 | $838.16 | 5.10% | 0.9947 |
| Validation | $590.74 | $393.96 | 1.39% | 0.9075 |
| Test | $4,250.87 | $3,338.50 | 4.88% | 0.9331 |

The large gap between training and test error indicates overfitting: roughly 2,000 training sequences are well below what an LSTM reliably requires.


06 — Comparative Analysis

Final Leaderboard

| Rank | Model | RMSE ($) | MAE ($) | MAPE (%) | R² |
|---|---|---|---|---|---|
| 🥇 | MS-AR | 1,032.95 | 715.90 | 1.79% | 0.9331 |
| 🥈 | TAR/SETAR | 1,735.65 | 1,213.20 | 1.95% | 0.9894 |
| 🥉 | GARCH(1,1) | 1,735.73 | 1,215.08 | 1.96% | 0.9894 |
| 4 | Holt Linear Trend | 1,740.74 | 1,216.17 | 1.96% | 0.9894 |
| 5 | ARIMA(2,1,2) | 1,745.35 | 1,225.38 | 1.97% | 0.9894 |
| 6 | LSTM | 4,250.87 | 3,338.50 | 4.88% | 0.9331 |

Figure: Metrics Comparison. Four-panel bar chart comparing all models on RMSE, MAE, MAPE, and R². Gold-bordered green bar = best in each metric. LSTM is a clear outlier on all error metrics.

Figure: All Predictions Comparison. Overlay of all model predictions against actual prices across the shared test window.

Figure: Model Rankings. Per-metric rankings (lower bar = better for RMSE/MAE/MAPE). MS-AR ranks #1 on three of four metrics. LSTM consistently ranks last.

Traditional vs Deep Learning

| Category | Avg RMSE | Avg MAPE | Avg R² |
|---|---|---|---|
| Traditional (5 models) | $1,598 | 1.93% | 0.978 |
| LSTM | $4,251 | 4.88% | 0.933 |
| Δ | +$2,653 worse | +2.95% worse | |
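
The category averages can be reproduced from the leaderboard figures with a few lines of plain Python. The values are copied from the leaderboard; the variable names are illustrative.

```python
# (RMSE in USD, MAPE in %, R^2) for the five traditional models.
traditional = {
    "MS-AR": (1032.95, 1.79, 0.9331),
    "TAR/SETAR": (1735.65, 1.95, 0.9894),
    "GARCH(1,1)": (1735.73, 1.96, 0.9894),
    "Holt Linear Trend": (1740.74, 1.96, 0.9894),
    "ARIMA(2,1,2)": (1745.35, 1.97, 0.9894),
}
lstm = (4250.87, 4.88, 0.9331)

count = len(traditional)
avg_rmse = sum(m[0] for m in traditional.values()) / count
avg_mape = sum(m[1] for m in traditional.values()) / count
avg_r2 = sum(m[2] for m in traditional.values()) / count

# Gaps between the LSTM and the traditional average, rounded as in the table.
rmse_gap = round(lstm[0]) - round(avg_rmse)
mape_gap = round(lstm[1] - round(avg_mape, 2), 2)
```

The traditional average is pulled down noticeably by MS-AR; without it, the remaining four models average close to $1,739 RMSE.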

Key Findings

1. MS-AR is the best overall model. By explicitly modelling two volatility regimes (σ₁=1.49% vs σ₂=5.44%), Markov-Switching AR captures Bitcoin's boom-bust dynamics that simpler models miss, yielding the lowest RMSE ($1,033) and MAPE (1.79%).

2. Rolling forecasts level the playing field. Updating ARIMA daily with new data produces a MAPE of 1.97%, nearly matching GARCH's 1.96%. The forecasting framework matters as much as model complexity.

3. More complexity ≠ better results with limited data. LSTM underperforms every traditional model by a wide margin. ~2,000 sequences are well below what an LSTM reliably requires; the literature suggests >10,000 samples for robust performance.

4. Volatility modelling adds value beyond price prediction. GARCH's conditional volatility estimates (persistence α+β=0.94) provide actionable risk signals, making it the go-to choice for risk management applications even where point accuracy matches linear models.

5. Recommended model hierarchy

Best Overall:        MS-AR      → Lowest error, regime-aware
Best for Risk Mgmt:  GARCH(1,1) → Conditional volatility + price forecast
Best Interpretable:  ARIMA      → Fast, simple, fully explainable
Avoid (<5K samples): LSTM       → Overfits, insufficient training data

Limitations & Future Work

Current Limitations:

  • Price/volume data only — no sentiment, on-chain metrics, or macro indicators
  • One-step-ahead forecasting only (multi-step is significantly harder)
  • Test period (2023–2024) coincides with historically unusual volatility and a new ATH
  • LSTM constrained by dataset size (~2,000 observations)

Future Work:

  1. Add Twitter/Reddit sentiment and on-chain data (hash rate, active addresses, exchange flows)
  2. Ensemble methods combining GARCH + MS-AR + ARIMA predictions
  3. Transformer architectures (Temporal Fusion Transformer, Informer)
  4. Extend to multi-asset comparison: ETH, BNB, SOL
  5. Real-time prediction pipeline with automated backtesting

Requirements

pandas>=2.0
numpy>=1.24
matplotlib
seaborn
yfinance
statsmodels
scipy
scikit-learn
arch
tensorflow>=2.0

Install the dependencies:

pip install -r requirements.txt

Run notebooks sequentially:

jupyter notebook notebooks/01_data_collection_eda.ipynb

Data sourced from Yahoo Finance. Analysis performed with Python 3.12, TensorFlow 2.20, statsmodels 0.14, arch 6.x.

About

Designed and compared linear, nonlinear, and deep learning time-series models for highly volatile financial data; conducted stationarity testing, residual diagnostics, and volatility analysis to guide model selection.
