Bitcoin Price Forecasting — Comparative Study (2019–2024)

A rigorous time-series analysis comparing 6 forecasting models across linear, nonlinear, and deep learning paradigms on 6 years of daily Bitcoin data.

Project Overview

This project performs a comprehensive comparative study of Bitcoin (BTC-USD) price forecasting using 2,191 daily observations from January 2019 to December 2024. Six distinct models are rigorously evaluated under a consistent rolling one-step-ahead forecasting framework, spanning traditional econometric approaches through modern deep learning.

Category	Models
Linear	ARIMA(2,1,2), Holt's Linear Trend
Nonlinear	GARCH(1,1), Markov-Switching AR (MS-AR), TAR/SETAR
Deep Learning	LSTM (2-layer, 100 units)

Repository Structure

bitcoin-forecasting/
├── data/
│   └── bitcoin_raw_data.csv              # Raw BTC-USD OHLCV from Yahoo Finance
├── models/
│   └── best_lstm_model.keras             # Saved best LSTM weights
├── notebooks/
│   ├── 01_data_collection_eda.ipynb      # Data download & exploratory analysis
│   ├── 02_statistical_tests.ipynb        # Stationarity, autocorrelation, ARCH tests
│   ├── 03_linear_models.ipynb            # ARIMA & Holt Linear Trend
│   ├── 04_nonlinear_models.ipynb         # GARCH, MS-AR, TAR/SETAR
│   ├── 05_deeplearning_models.ipynb      # LSTM + full model comparison
│   └── 06_comparitive_analysis.ipynb     # Final synthesis & visualizations
├── results/
│   ├── figures/                          # All generated plots
│   ├── all_models_final_results.csv      # Consolidated metrics table
│   ├── final_model_comparison.csv        # Ranked comparison
│   ├── linear_models_predictions.csv
│   ├── nonlinear_models_predictions.csv
│   ├── lstm_predictions.csv
│   └── project_summary.json
├── requirements.txt
└── README.md

Dataset

Source: Yahoo Finance via yfinance
Ticker: BTC-USD
Period: 2019-01-01 to 2024-12-30
Observations: 2,191 daily records, no missing values
Features: Open, High, Low, Close, Volume

Statistic	Value
Min Price	$3,399.47
Max Price	$106,140.60
Mean Price	$31,473.14
Std Dev	$22,079.72

Methodology

Train / Test Split

Train: 2019-01-01 → 2023-10-18 (80%, 1,752 observations)
Test: 2023-10-19 → 2024-12-30 (20%, 439 observations)
All models use rolling one-step-ahead forecasting — the model is retrained/updated at each step with the newest observation before predicting the next day.
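
The split and the rolling scheme above can be sketched as follows. This is a minimal illustration rather than the notebooks' code: `chronological_split`, `rolling_one_step` and the `drift_forecast` placeholder are hypothetical names, and the placeholder is a simple random walk with drift that only marks where a real model would be refit.

```python
import numpy as np

def chronological_split(n_obs, train_frac=0.8):
    """Return (n_train, n_test) for a time-ordered split with no shuffling."""
    n_train = int(n_obs * train_frac)
    return n_train, n_obs - n_train

def drift_forecast(history):
    """Placeholder model (assumption): last value plus the mean daily change."""
    return history[-1] + np.mean(np.diff(history))

def rolling_one_step(series, n_train, forecast_fn):
    """Refit on all data up to day t-1, predict day t, then move forward one day."""
    series = np.asarray(series, dtype=float)
    preds = [forecast_fn(series[:t]) for t in range(n_train, len(series))]
    return np.array(preds)

# 2,191 is the observation count from the Dataset section.
n_train, n_test = chronological_split(2191)
```

Any of the six models could be plugged in as `forecast_fn`, provided it only sees `series[:t]` when predicting day `t`.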

Evaluation Metrics

Metric	Description
RMSE	Root Mean Squared Error (USD)
MAE	Mean Absolute Error (USD)
MAPE	Mean Absolute Percentage Error (%)
R²	Coefficient of Determination
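
For reference, the four metrics can be computed as in the sketch below. The function name `forecast_metrics` is an assumption for illustration; the notebooks may rely on scikit-learn instead.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Compute RMSE, MAE, MAPE (%) and R² for aligned actual/predicted arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    # MAPE divides by the actual value, so it assumes no zero prices.
    mape = 100.0 * np.mean(np.abs(errors / actual))
    # R² compares squared error against the variance around the mean of actuals.
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

metrics = forecast_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

RMSE and MAE are in USD and scale with the price level of the evaluation window, while MAPE and R² are unit-free.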

Notebooks

01 — Data Collection & EDA

Downloads BTC-USD data from Yahoo Finance, validates quality (zero missing values), computes basic statistics, and produces the full price time-series visualization.

Bitcoin daily closing prices 2019–2024. Key events visible: COVID crash (Mar 2020), first ATH ~$65K (Nov 2021), bear market bottom ~$16K (Nov 2022), new ATH $106K (Dec 2024).

02 — Statistical Tests

Rigorous pre-modelling analysis to justify model choices:

Test	Series	Statistic	p-value	Result
ADF	Prices	-0.556	0.881	❌ Non-Stationary
ADF	Returns	-22.197	0.000	✅ Stationary
KPSS	Prices	3.990	0.010	❌ Non-Stationary
Ljung-Box	Prices	—	0.000	✅ Significant autocorrelation (20/20 lags)
McLeod-Li	Sq. Returns	—	<0.05	✅ ARCH effects (14/20 lags)
Jarque-Bera	Returns	30,907	0.000	❌ Non-Normal (fat tails, skew=-1.10, kurt=18.32)

Conclusion: Prices are I(1), returns are stationary. Strong autocorrelation and ARCH effects justify time-series and GARCH modelling. Fat tails justify nonlinear approaches.
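
As a rough consistency check on the Jarque-Bera row, the statistic can be rebuilt from the reported moments with JB = n/6 · (S² + K²/4). The sketch assumes n = 2,190 returns and that the reported kurtosis of 18.32 is excess kurtosis; neither is stated explicitly above.

```python
def jarque_bera(n, skew, excess_kurtosis):
    """Jarque-Bera statistic from sample size, skewness and excess kurtosis."""
    return n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)

# Reported moments of daily returns; n = 2,191 prices minus one is an assumption.
jb = jarque_bera(n=2190, skew=-1.10, excess_kurtosis=18.32)
```

The result is about 31,000, close to the reported 30,907. A small gap is expected because the printed moments are rounded.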

ACF/PACF of prices (top) and returns (bottom). Prices show slowly decaying autocorrelation — a hallmark of non-stationarity. Returns are near white noise after differencing.

Returns (left) and ACF of squared returns (right). The significant spikes in the ACF of squared returns confirm ARCH effects — large price swings cluster together in time.
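
The idea behind the squared-returns plot can be sketched with a plain sample autocorrelation. This is a per-lag check against the usual ±1.96/√n band, not the McLeod-Li portmanteau test itself, and `sample_acf` and `significant_lags` are hypothetical helper names.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 1..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom
                     for k in range(1, nlags + 1)])

def significant_lags(returns, nlags=20):
    """Count lags where the ACF of squared returns leaves the ±1.96/sqrt(n) band."""
    r = np.asarray(returns, dtype=float)
    acf_sq = sample_acf(r ** 2, nlags)
    band = 1.96 / np.sqrt(len(r))
    return int(np.sum(np.abs(acf_sq) > band))

# Synthetic series with a calm half and a turbulent half (illustrative data only).
rng = np.random.default_rng(0)
clustered = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 5, 500)])
n_significant = significant_lags(clustered)
```

A series with volatility clustering produces many significant lags, whereas independent returns would produce only the occasional false positive.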

03 — Linear Models

ARIMA(2,1,2)

Best order selected via AIC grid search over p ∈ {0,1,2,3}, q ∈ {0,1,2,3}, d=1:

Order	AIC	BIC
(2,1,2) ✅	29,297.11	29,329.92
(2,1,3)	29,299.08	29,337.36
(3,1,3)	29,301.07	29,344.81

Model parameters (all significant at p<0.001): AR1=0.821, AR2=-0.954, MA1=-0.847, MA2=0.963.
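
One quick sanity check on these coefficients is whether the AR part is stationary and the MA part invertible, which holds when all roots of the lag polynomials lie outside the unit circle. The sketch assumes the statsmodels sign convention for both polynomials.

```python
import numpy as np

ar = [0.821, -0.954]   # reported AR1, AR2
ma = [-0.847, 0.963]   # reported MA1, MA2

# Lag polynomials under the statsmodels convention (an assumption):
#   AR: 1 - ar1*z - ar2*z^2      MA: 1 + ma1*z + ma2*z^2
# np.roots expects coefficients ordered from the highest power down.
ar_roots = np.roots([-ar[1], -ar[0], 1.0])
ma_roots = np.roots([ma[1], ma[0], 1.0])

ar_stationary = bool(np.all(np.abs(ar_roots) > 1.0))
ma_invertible = bool(np.all(np.abs(ma_roots) > 1.0))
```

Both conditions hold, but the roots are complex with moduli of roughly 1.02, so they sit only slightly outside the unit circle.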

ACF/PACF of differenced training data — used to inform ARIMA order selection.

Holt's Linear Trend

Exponential smoothing with adaptive trend, parameters optimised automatically via MLE each rolling step.
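
The recursion behind Holt's method is short enough to write out. The sketch below uses fixed smoothing parameters (`alpha=0.8`, `beta=0.2` are hypothetical values), whereas the notebook re-optimises them at each rolling step.

```python
def holt_one_step(y, alpha=0.8, beta=0.2):
    """One-step-ahead forecast from Holt's linear trend recursion."""
    level, trend = y[0], y[1] - y[0]   # simple initialisation (assumption)
    for obs in y[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest level change with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

forecast = holt_one_step([10.0, 12.0, 14.0, 16.0])
```

On a perfectly linear series the forecast simply continues the line, which is why the method tracks steady trends well and lags at sharp turning points.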

Holt's Linear Trend rolling forecast tracks the 2024 bull market closely, with slight lag at sharp inflection points.

Linear Model Results:

Model	RMSE	MAE	MAPE	R²
ARIMA(2,1,2)	$1,745.35	$1,225.38	1.97%	0.9894
Holt Linear Trend	$1,740.74	$1,216.17	1.96%	0.9894

04 — Nonlinear Models

GARCH(1,1)

Captures time-varying conditional volatility in log-returns with an AR(2) mean equation.

Estimated parameters:

Parameter	Value	Interpretation
ω (omega)	0.9504	Baseline variance
α (alpha)	0.1270	ARCH effect — sensitivity to recent shocks
β (beta)	0.8146	GARCH effect — volatility persistence
α + β	0.9416	Very high persistence — shocks decay slowly
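
To make the persistence figure concrete, the sketch below plugs the reported estimates into the GARCH(1,1) variance recursion and derives the shock half-life and the long-run variance. It assumes returns were scaled to percent before fitting, which the size of ω suggests but the text does not state.

```python
import math

OMEGA, ALPHA, BETA = 0.9504, 0.1270, 0.8146  # reported estimates

def next_variance(prev_resid, prev_variance):
    """GARCH(1,1): sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}."""
    return OMEGA + ALPHA * prev_resid ** 2 + BETA * prev_variance

persistence = ALPHA + BETA
# Days until a volatility shock has decayed to half its initial effect.
half_life_days = math.log(0.5) / math.log(persistence)
# Unconditional variance implied by the estimates.
long_run_variance = OMEGA / (1.0 - persistence)
# Percent per day, if returns were scaled by 100 (assumption).
long_run_vol_pct = math.sqrt(long_run_variance)
```

With these numbers a shock loses half its effect in roughly 11 to 12 days, and the implied long-run daily volatility is about 4% under the percent-scaling assumption.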

GARCH(1,1) diagnostics: standardised residuals (top-left), conditional volatility σₜ (top-right), ACF of residuals (bottom-left), ACF of squared residuals (bottom-right). Residual autocorrelation is effectively removed.

GARCH(1,1) rolling price predictions vs actual. The purple dashed line tracks the actual price with high fidelity — RMSE $1,735.
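
Because the model is fitted on log-returns, a price forecast needs a conversion step. The notebook's exact conversion is not shown here; one common approach, sketched under the assumption that returns are log-returns in percent, is to compound the last observed price by the predicted return.

```python
import math

def price_from_log_return(last_price, predicted_log_return_pct):
    """Map a one-step log-return forecast (in percent) to a price forecast."""
    return last_price * math.exp(predicted_log_return_pct / 100.0)

# Illustrative numbers only: a +1% predicted log-return on a $50,000 price.
example_price = price_from_log_return(50_000.0, 1.0)
```

Since daily return forecasts are small, such a forecast stays close to the previous day's price.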

Markov-Switching AR (MS-AR)

Two-regime model with regime-dependent AR(4) coefficients and variance. The regimes differ mainly in volatility rather than in mean return, so the model captures Bitcoin's alternation between calm and turbulent market phases.

	Regime 1 (Low Volatility)	Regime 2 (High Volatility)
Mean Return	0.068%	0.080%
Std Dev	1.49%	5.44%

Top: Returns with shaded regime classifications. Bottom: Filtered regime probabilities over time. The model cleanly separates calm accumulation phases (Regime 1, red) from turbulent rally/crash periods (Regime 2, green).
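
Filtered regime probabilities come from a predict-then-update recursion (the Hamilton filter). The sketch below is deliberately simplified: it drops the AR(4) terms, uses the reported regime means and standard deviations in percent, and relies on hypothetical stay probabilities in `P_STAY`, since the fitted transition matrix is not reported above.

```python
import math

MEANS = (0.068, 0.080)   # reported mean return per regime, in percent
STDS = (1.49, 5.44)      # reported std dev per regime, in percent
P_STAY = (0.95, 0.90)    # hypothetical probabilities of staying in each regime

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def filter_regimes(returns_pct, prior=(0.5, 0.5)):
    """Return filtered (P(regime 1), P(regime 2)) for each observation."""
    probs, p = [], list(prior)
    for r in returns_pct:
        # Predict: push yesterday's probabilities through the transition matrix.
        pred0 = p[0] * P_STAY[0] + p[1] * (1.0 - P_STAY[1])
        pred1 = 1.0 - pred0
        # Update: weight each regime by how well it explains today's return.
        w0 = pred0 * normal_pdf(r, MEANS[0], STDS[0])
        w1 = pred1 * normal_pdf(r, MEANS[1], STDS[1])
        p = [w0 / (w0 + w1), w1 / (w0 + w1)]
        probs.append(tuple(p))
    return probs

probs = filter_regimes([0.1, -0.3, 9.5, -8.0])
```

Small returns keep the probability mass on the low-volatility regime, while a single large move shifts it almost entirely to the high-volatility regime.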

TAR/SETAR

Threshold AR model using the median return (0.063%) as the switching threshold, with separate AR(4) models fitted above and below it.

Regime 1 (below threshold): 876 observations
Regime 2 (above threshold): 875 observations
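
A two-regime threshold AR model can be sketched with ordinary least squares: split the observations by whether the previous return is above or below the threshold, then fit one AR(4) per group. Using the previous day's return as the switching variable (delay 1) is an assumption, and `fit_setar` and `predict_setar` are hypothetical names.

```python
import numpy as np

def fit_setar(returns, threshold, order=4):
    """Fit separate AR(order) models below and above the threshold."""
    r = np.asarray(returns, dtype=float)
    # Row for time t holds [1, r[t-1], ..., r[t-order]].
    X = np.array([np.concatenate(([1.0], r[t - order:t][::-1]))
                  for t in range(order, len(r))])
    y = r[order:]
    low = X[:, 1] <= threshold   # regime decided by the previous return
    coefs = {}
    for name, mask in (("low", low), ("high", ~low)):
        coefs[name] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    return coefs

def predict_setar(coefs, recent, threshold):
    """One-step forecast from the last `order` returns (oldest first)."""
    recent = np.asarray(recent, dtype=float)
    x = np.concatenate(([1.0], recent[::-1]))
    regime = "low" if recent[-1] <= threshold else "high"
    return float(x @ coefs[regime])

# Synthetic returns for illustration only.
rng = np.random.default_rng(0)
sim = rng.normal(0.0, 2.0, 500)
thr = float(np.median(sim))
coefs = fit_setar(sim, thr)
next_return = predict_setar(coefs, sim[-4:], thr)
```

Choosing the median as threshold splits the sample almost evenly, which matches the 876 / 875 regime counts listed above.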

TAR/SETAR rolling forecast. Performance is nearly identical to GARCH (RMSE $1,735.65 vs $1,735.73); both track short-term price dynamics closely.

Nonlinear Model Results:

Model	RMSE	MAE	MAPE	R²
GARCH(1,1)	$1,735.73	$1,215.08	1.96%	0.9894
MS-AR	$1,032.95	$715.90	1.79%	0.9331
TAR/SETAR	$1,735.65	$1,213.20	1.95%	0.9894

05 — Deep Learning (LSTM)

Architecture:

Input: (60 timesteps, 1 feature)    ← 60-day look-back window
  └─ LSTM(100, return_sequences=True)
  └─ Dropout(0.2)
  └─ LSTM(100, return_sequences=False)
  └─ Dropout(0.2)
  └─ Dense(50, activation='relu')
  └─ Dense(1)

Training configuration:

Normalisation: MinMaxScaler [0, 1]
Split: 70% train / 10% validation / 20% test
Optimiser: Adam | Loss: MSE
Callbacks: EarlyStopping (patience=10), ModelCheckpoint
Best epoch: 93 / 100
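
The 60-day look-back and the [0, 1] normalisation can be sketched in NumPy as below. Fitting the scaler on the training portion only is an assumption about the notebook, and `make_windows` and `minmax_fit_transform` are hypothetical helpers standing in for the scikit-learn `MinMaxScaler` workflow.

```python
import numpy as np

def make_windows(values, lookback=60):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-day targets."""
    values = np.asarray(values, dtype=float)
    X = np.array([values[i - lookback:i] for i in range(lookback, len(values))])
    y = values[lookback:]
    return X[..., np.newaxis], y

def minmax_fit_transform(train, other):
    """Scale both arrays with the min and max of the training data only."""
    train = np.asarray(train, dtype=float)
    other = np.asarray(other, dtype=float)
    lo, hi = train.min(), train.max()
    return (train - lo) / (hi - lo), (other - lo) / (hi - lo)

X, y = make_windows(np.arange(100.0))
scaled_train, scaled_other = minmax_fit_transform([0.0, 10.0], [20.0])
```

Each sample pairs 60 consecutive scaled prices with the price on the following day. With a train-only scaler, values outside the training range map outside [0, 1].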

Training and validation loss curves. The loss flattens within ~30 epochs, but the best epoch is 93 of 100, and the weights from that epoch are the ones kept.

LSTM predictions across train, validation, and test splits.

LSTM test-set predictions vs actual prices (Oct 2023 – Dec 2024). While the general trend is captured, the model consistently undershoots during the aggressive 2024 bull run.

LSTM Results:

Split	RMSE	MAE	MAPE	R²
Train	$1,235.64	$838.16	5.10%	0.9947
Validation	$590.74	$393.96	1.39%	0.9075
Test	$4,250.87	$3,338.50	4.88%	0.9331

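The large train-test performance gap (test RMSE $4,250.87 vs train RMSE $1,235.64) indicates overfitting: the dataset holds only ~2,000 observations, about 1,500 of them in the 70% training split, which is well below what LSTM reliably requires.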

06 — Comparative Analysis

Final Leaderboard

Rank	Model	RMSE ($)	MAE ($)	MAPE (%)	R²
🥇	MS-AR	1,032.95	715.90	1.79%	0.9331
🥈	TAR/SETAR	1,735.65	1,213.20	1.95%	0.9894
🥉	GARCH(1,1)	1,735.73	1,215.08	1.96%	0.9894
4	Holt Linear Trend	1,740.74	1,216.17	1.96%	0.9894
5	ARIMA(2,1,2)	1,745.35	1,225.38	1.97%	0.9894
6	LSTM	4,250.87	3,338.50	4.88%	0.9331

Four-panel bar chart comparing all models on RMSE, MAE, MAPE, and R². Gold-bordered green bar = best in each metric. LSTM is a clear outlier on all error metrics.

Overlay of all model predictions against actual prices across the shared test window.

Per-metric rankings (lower bar = better for RMSE/MAE/MAPE). MS-AR ranks #1 on three of four metrics. LSTM consistently ranks last.

Traditional vs Deep Learning

Category	Avg RMSE	Avg MAPE	Avg R²
Traditional (5 models)	$1,598	1.93%	0.978
LSTM	$4,251	4.88%	0.933
Δ	+$2,653 worse	+2.95 pp worse	0.045 lower
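
The averages in this table follow directly from the leaderboard. The short sketch below recomputes them, with the values copied from the Final Leaderboard.

```python
# (RMSE in USD, MAPE in %, R²) per model, copied from the Final Leaderboard.
leaderboard = {
    "MS-AR": (1032.95, 1.79, 0.9331),
    "TAR/SETAR": (1735.65, 1.95, 0.9894),
    "GARCH(1,1)": (1735.73, 1.96, 0.9894),
    "Holt Linear Trend": (1740.74, 1.96, 0.9894),
    "ARIMA(2,1,2)": (1745.35, 1.97, 0.9894),
    "LSTM": (4250.87, 4.88, 0.9331),
}

traditional = [v for name, v in leaderboard.items() if name != "LSTM"]
# Column-wise mean over the five traditional models.
avg_rmse, avg_mape, avg_r2 = (sum(col) / len(col) for col in zip(*traditional))

lstm_rmse, lstm_mape, lstm_r2 = leaderboard["LSTM"]
rmse_gap = lstm_rmse - avg_rmse   # USD
mape_gap = lstm_mape - avg_mape   # percentage points
```

The MAPE gap is a difference between two percentages, so it is expressed in percentage points rather than percent.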

Key Findings

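1. MS-AR is the best overall model. By explicitly modelling two volatility regimes (σ₁=1.49% vs σ₂=5.44%), Markov-Switching AR captures Bitcoin's boom-bust dynamics that simpler models miss, yielding the lowest RMSE ($1,033), MAE ($716) and MAPE (1.79%). Its R² (0.9331) is nonetheless the lowest among the traditional models, so the ranking rests on the error metrics.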

2. Rolling forecasts level the playing field. Updating ARIMA daily with new data produces a MAPE of 1.97%, nearly matching GARCH's 1.96%. The forecasting framework matters as much as model complexity.

3. More complexity ≠ better results with limited data. LSTM underperforms every traditional model by a wide margin. A dataset of ~2,000 observations is well below what LSTM reliably requires; the literature suggests >10,000 samples for robust performance.

4. Volatility modelling adds value beyond price prediction. GARCH's conditional volatility estimates (persistence α+β=0.94) provide actionable risk signals, making it the go-to choice for risk management applications even where point accuracy matches linear models.

5. Recommended model hierarchy

Best Overall:        MS-AR      → Lowest error, regime-aware
Best for Risk Mgmt:  GARCH(1,1) → Conditional volatility + price forecast
Best Interpretable:  ARIMA      → Fast, simple, fully explainable
Avoid (<5K samples): LSTM       → Overfits, insufficient training data

Limitations & Future Work

Current Limitations:

Price/volume data only — no sentiment, on-chain metrics, or macro indicators
One-step-ahead forecasting only (multi-step is significantly harder)
Test period (2023–2024) coincides with historically unusual volatility and a new ATH
LSTM constrained by dataset size (~2,000 observations)

Future Work:

Add Twitter/Reddit sentiment and on-chain data (hash rate, active addresses, exchange flows)
Ensemble methods combining GARCH + MS-AR + ARIMA predictions
Transformer architectures (Temporal Fusion Transformer, Informer)
Extend to multi-asset comparison: ETH, BNB, SOL
Real-time prediction pipeline with automated backtesting

Requirements

pandas>=2.0
numpy>=1.24
matplotlib
seaborn
yfinance
statsmodels
scipy
scikit-learn
arch
tensorflow>=2.0

pip install -r requirements.txt

Run notebooks sequentially:

jupyter notebook notebooks/01_data_collection_eda.ipynb

Data sourced from Yahoo Finance. Analysis performed with Python 3.12, TensorFlow 2.20, statsmodels 0.14, arch 6.x.
