A rigorous time-series analysis comparing 6 forecasting models across linear, nonlinear, and deep learning paradigms on 6 years of daily Bitcoin data.
- Project Overview
- Repository Structure
- Dataset
- Methodology
- Notebooks
- Results
- Key Findings
- Limitations & Future Work
- Requirements
This project performs a comprehensive comparative study of Bitcoin (BTC-USD) price forecasting using 2,191 daily observations from January 2019 to December 2024. Six distinct models are rigorously evaluated under a consistent rolling one-step-ahead forecasting framework, spanning traditional econometric approaches through modern deep learning.
| Category | Models |
|---|---|
| Linear | ARIMA(2,1,2), Holt's Linear Trend |
| Nonlinear | GARCH(1,1), Markov-Switching AR (MS-AR), TAR/SETAR |
| Deep Learning | LSTM (2-layer, 100 units) |
bitcoin-forecasting/
├── data/
│ └── bitcoin_raw_data.csv # Raw BTC-USD OHLCV from Yahoo Finance
├── models/
│ └── best_lstm_model.keras # Saved best LSTM weights
├── notebooks/
│ ├── 01_data_collection_eda.ipynb # Data download & exploratory analysis
│ ├── 02_statistical_tests.ipynb # Stationarity, autocorrelation, ARCH tests
│ ├── 03_linear_models.ipynb # ARIMA & Holt Linear Trend
│ ├── 04_nonlinear_models.ipynb # GARCH, MS-AR, TAR/SETAR
│ ├── 05_deeplearning_models.ipynb # LSTM + full model comparison
│ └── 06_comparitive_analysis.ipynb # Final synthesis & visualizations
├── results/
│ ├── figures/ # All generated plots
│ ├── all_models_final_results.csv # Consolidated metrics table
│ ├── final_model_comparison.csv # Ranked comparison
│ ├── linear_models_predictions.csv
│ ├── nonlinear_models_predictions.csv
│ ├── lstm_predictions.csv
│ └── project_summary.json
├── requirements.txt
└── README.md
- Source: Yahoo Finance via yfinance
- Ticker: BTC-USD
- Period: 2019-01-01 to 2024-12-30
- Observations: 2,191 daily records, no missing values
- Features: Open, High, Low, Close, Volume
| Statistic | Value |
|---|---|
| Min Price | $3,399.47 |
| Max Price | $106,140.60 |
| Mean Price | $31,473.14 |
| Std Dev | $22,079.72 |
- Train: 2019-01-01 → 2023-10-18 (80%, 1,752 observations)
- Test: 2023-10-19 → 2024-12-30 (20%, 439 observations)
- All models use rolling one-step-ahead forecasting — the model is retrained/updated at each step with the newest observation before predicting the next day.
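The split and the rolling scheme can be sketched as follows. This is a minimal, model-agnostic illustration, not the notebooks' code: `fit_and_predict` is a hypothetical callback standing in for whichever model is re-fitted at each step, and `naive_forecaster` is a placeholder random-walk model used only to exercise the loop.

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """Split a time series in time order (no shuffling)."""
    n_train = int(len(series) * train_frac)
    return series[:n_train], series[n_train:]

def rolling_one_step_forecast(train, test, fit_and_predict):
    """Walk forward through the test set, forecasting one day at a time.

    fit_and_predict(history) returns a forecast for the next value given
    every observation seen so far (it may re-fit a model internally).
    """
    history = list(train)
    preds = []
    for actual in test:
        preds.append(fit_and_predict(np.asarray(history)))
        history.append(actual)  # reveal the true value before the next step
    return np.asarray(preds)

def naive_forecaster(history):
    # Hypothetical stand-in model: random walk, tomorrow = today.
    return history[-1]

prices = np.arange(2191, dtype=float)          # placeholder for 2,191 daily closes
train, test = chronological_split(prices)      # 1,752 / 439 observations
preds = rolling_one_step_forecast(train, test, naive_forecaster)
```

Passing the model as a callback keeps the evaluation loop identical for every model, which is what makes the comparison consistent.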
| Metric | Description |
|---|---|
| RMSE | Root Mean Squared Error (USD) |
| MAE | Mean Absolute Error (USD) |
| MAPE | Mean Absolute Percentage Error (%) |
| R² | Coefficient of Determination |
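For reference, here is a minimal NumPy sketch of the four metrics using their standard definitions. The notebooks may compute them differently, for example with scikit-learn. MAPE is returned in percent and assumes no actual value is zero.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (%) and R² for a set of point forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / actual)) * 100)  # percent; assumes no zero actuals
    sse = np.sum(err ** 2)
    sst = np.sum((actual - actual.mean()) ** 2)
    r2 = float(1 - sse / sst)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

m = forecast_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```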
01_data_collection_eda.ipynb downloads BTC-USD data from Yahoo Finance, validates data quality (zero missing values), computes basic statistics, and produces the full price time-series visualization.
Bitcoin daily closing prices 2019–2024. Key events visible: COVID crash (Mar 2020), first ATH ~$65K (Nov 2021), bear market bottom ~$16K (Nov 2022), new ATH $106K (Dec 2024).
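Below is a minimal pandas sketch of the validation and summary step. It assumes a DataFrame with flat `Open`/`High`/`Low`/`Close`/`Volume` columns. The function name and the CSV layout in the usage comment are illustrative and not taken from the notebook. It also derives the daily log-returns that the later tests operate on.

```python
import numpy as np
import pandas as pd

OHLCV = ["Open", "High", "Low", "Close", "Volume"]

def validate_and_summarise(df: pd.DataFrame):
    """Count missing values, summarise closing prices, and derive log-returns."""
    missing = int(df[OHLCV].isna().sum().sum())
    close = df["Close"]
    stats = {
        "missing_values": missing,
        "min_price": float(close.min()),
        "max_price": float(close.max()),
        "mean_price": float(close.mean()),
        "std_price": float(close.std()),  # pandas default: sample std (ddof=1)
    }
    log_returns = np.log(close).diff().dropna()  # r_t = ln(P_t / P_{t-1})
    return stats, log_returns

# Usage with the repository's raw CSV (column layout assumed):
# df = pd.read_csv("data/bitcoin_raw_data.csv", index_col=0, parse_dates=True)
# stats, returns = validate_and_summarise(df)
```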
02_statistical_tests.ipynb runs rigorous pre-modelling analysis to justify the model choices:
| Test | Series | Statistic | p-value | Result |
|---|---|---|---|---|
| ADF | Prices | -0.556 | 0.881 | ❌ Non-Stationary |
| ADF | Returns | -22.197 | 0.000 | ✅ Stationary |
| KPSS | Prices | 3.990 | <0.01 | ❌ Non-Stationary |
| Ljung-Box | Prices | — | 0.000 | ✅ Significant autocorrelation (20/20 lags) |
| McLeod-Li | Sq. Returns | — | <0.05 | ✅ ARCH effects (14/20 lags) |
| Jarque-Bera | Returns | 30,907 | 0.000 | ❌ Non-Normal (fat tails, skew=-1.10, excess kurtosis=18.32) |
Conclusion: Prices are I(1), returns are stationary. Strong autocorrelation and ARCH effects justify time-series and GARCH modelling. Fat tails justify nonlinear approaches.
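As a sanity check on the Jarque-Bera row, the statistic can be recomputed from the reported moments with JB = n/6 · (S² + K²/4), where K is the excess kurtosis. The sketch below assumes n = 2,190 returns (one fewer than the 2,191 prices). If 18.32 is treated as excess kurtosis, the result matches the reported 30,907 to within rounding of S and K. If it is treated as raw kurtosis, the result is far off. This suggests the table reports excess (Fisher) kurtosis, which is the default of `scipy.stats.kurtosis`.

```python
def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic: JB = n/6 * (S**2 + K**2 / 4), K = excess kurtosis."""
    return n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)

n_returns = 2191 - 1  # one return per pair of consecutive daily prices

jb_excess = jarque_bera(n_returns, -1.10, 18.32)      # 18.32 read as excess kurtosis: ≈ 31,067
jb_raw = jarque_bera(n_returns, -1.10, 18.32 - 3)     # 18.32 read as raw kurtosis: ≈ 21,858
```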
ACF/PACF of prices (top) and returns (bottom). Prices show slowly decaying autocorrelation — a hallmark of non-stationarity. Returns are near white noise after differencing.
Returns (left) and ACF of squared returns (right). The significant spikes in the ACF of squared returns confirm ARCH effects — large price swings cluster together in time.
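The McLeod-Li test is the Ljung-Box Q test applied to squared returns. Below is a minimal NumPy sketch of both statistics. In practice a library routine such as statsmodels' `acorr_ljungbox` would be used. With 20 lags, Q is compared against a χ² distribution with 20 degrees of freedom, whose 5% critical value is about 31.41.

```python
import numpy as np

def ljung_box_q(x, lags=20):
    """Ljung-Box Q = n(n+2) * sum_{k=1..lags} rho_k^2 / (n-k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    q = 0.0
    for k in range(1, lags + 1):
        rho_k = np.sum(d[k:] * d[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

def mcleod_li_q(returns, lags=20):
    """McLeod-Li: Ljung-Box applied to squared demeaned returns (tests for ARCH effects)."""
    r = np.asarray(returns, dtype=float)
    return ljung_box_q((r - r.mean()) ** 2, lags)

# Synthetic volatility clustering: calm and turbulent blocks of 100 days.
calm = np.tile([0.01, -0.01], 50)
wild = np.tile([0.05, -0.05], 50)
clustered = np.concatenate([calm, wild] * 5)
q_sq = mcleod_li_q(clustered)  # far above 31.41, so ARCH effects are detected
```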
The ARIMA order was selected via an AIC grid search over p ∈ {0,1,2,3} and q ∈ {0,1,2,3}, with d=1:
| Order | AIC | BIC |
|---|---|---|
| (2,1,2) ✅ | 29,297.11 | 29,329.92 |
| (2,1,3) | 29,299.08 | 29,337.36 |
| (3,1,3) | 29,301.07 | 29,344.81 |
Model parameters (all significant at p<0.001): AR1=0.821, AR2=-0.954, MA1=-0.847, MA2=0.963.
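With these estimates, the one-step-ahead ARIMA(2,1,2) point forecast is a recursion on the differenced series: ΔŶ(T+1) = φ₁ΔY(T) + φ₂ΔY(T−1) + θ₁ε(T) + θ₂ε(T−1), then Ŷ(T+1) = Y(T) + ΔŶ(T+1). The sketch below makes three simplifications. It assumes the statsmodels sign convention, in which MA terms enter with a plus sign. It omits any constant term. It takes the last two residuals as given, whereas a real rolling forecast obtains them from the fitted model's filter.

```python
def arima212_one_step(prices, residuals, ar=(0.821, -0.954), ma=(-0.847, 0.963)):
    """One-step-ahead ARIMA(2,1,2) point forecast (no constant term).

    prices: the last three (or more) price levels, oldest first.
    residuals: the last two one-step errors on the differenced series, oldest first.
    """
    dy_t = prices[-1] - prices[-2]       # latest first difference
    dy_tm1 = prices[-2] - prices[-3]     # previous first difference
    eps_tm1, eps_t = residuals[-2], residuals[-1]
    d_forecast = ar[0] * dy_t + ar[1] * dy_tm1 + ma[0] * eps_t + ma[1] * eps_tm1
    return prices[-1] + d_forecast       # undo the differencing

forecast = arima212_one_step([100.0, 102.0, 101.0], [0.0, 0.0])  # ≈ 98.271
```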
ACF/PACF of differenced training data — used to inform ARIMA order selection.
Holt's Linear Trend applies exponential smoothing with an adaptive trend component. Its parameters are re-optimised automatically via MLE at each rolling step.
Holt's Linear Trend rolling forecast tracks the 2024 bull market closely, with slight lag at sharp inflection points.
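Holt's method reduces to two recursive updates, one for the level and one for the trend. The sketch below uses fixed, hypothetical smoothing parameters and a simple initialisation (the first value as the level, the first difference as the trend). In the project, the parameters are re-optimised at every rolling step instead.

```python
def holt_one_step(y, alpha, beta):
    """Holt's additive linear trend: one-step-ahead forecast after processing y.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast_{t+1} = level_t + trend_t
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = y[0], y[1] - y[0]  # simple initialisation (an assumption)
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

# On a perfectly linear series the forecast continues the line exactly.
linear = [10 + 3 * t for t in range(20)]
next_value = holt_one_step(linear, alpha=0.5, beta=0.3)  # ≈ 70
```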
Linear Model Results:
| Model | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| ARIMA(2,1,2) | $1,745.35 | $1,225.38 | 1.97% | 0.9894 |
| Holt Linear Trend | $1,740.74 | $1,216.17 | 1.96% | 0.9894 |
GARCH(1,1) captures time-varying conditional volatility in log-returns, with an AR(2) mean equation.
Estimated parameters:
| Parameter | Value | Interpretation |
|---|---|---|
| ω (omega) | 0.9504 | Constant term of the variance equation; sets the long-run variance ω/(1 − α − β) |
| α (alpha) | 0.1270 | ARCH effect — sensitivity to recent shocks |
| β (beta) | 0.8146 | GARCH effect — volatility persistence |
| α + β | 0.9416 | Very high persistence — shocks decay slowly |
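The table's parameters imply two derived quantities. The first is the unconditional (long-run) variance, ω/(1 − α − β). The second is the half-life of a volatility shock, ln(0.5)/ln(α + β). The scale of ω depends on how returns were scaled before fitting. The comments below assume percentage returns, a common choice with the `arch` package, but this README does not state the scaling.

```python
import math

# Estimated GARCH(1,1) parameters from the table above.
OMEGA, ALPHA, BETA = 0.9504, 0.1270, 0.8146

def garch_next_variance(eps_t, sigma2_t, omega=OMEGA, alpha=ALPHA, beta=BETA):
    """GARCH(1,1) update: sigma2_{t+1} = omega + alpha * eps_t**2 + beta * sigma2_t."""
    return omega + alpha * eps_t ** 2 + beta * sigma2_t

persistence = ALPHA + BETA                          # 0.9416
long_run_variance = OMEGA / (1 - persistence)       # ≈ 16.27, requires persistence < 1
long_run_vol = math.sqrt(long_run_variance)         # ≈ 4.03 (% per day, if returns are in %)
half_life = math.log(0.5) / math.log(persistence)   # ≈ 11.5 days for a variance shock to halve
```

A half-life of roughly 11 to 12 trading days puts a concrete number on "shocks decay slowly".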
GARCH(1,1) diagnostics: standardised residuals (top-left), conditional volatility σₜ (top-right), ACF of residuals (bottom-left), ACF of squared residuals (bottom-right). Residual autocorrelation is effectively removed.
GARCH(1,1) rolling price predictions vs actual. The purple dashed line tracks the actual price with high fidelity — RMSE $1,735.
The Markov-Switching AR (MS-AR) model has two regimes, each with its own AR(4) coefficients and variance. This lets it explicitly capture Bitcoin's bull/bear market dynamics.
| Statistic | Regime 1 (Low Volatility) | Regime 2 (High Volatility) |
|---|---|---|
| Mean Return | 0.068% | 0.080% |
| Std Dev | 1.49% | 5.44% |
Top: Returns with shaded regime classifications. Bottom: Filtered regime probabilities over time. The model cleanly separates calm accumulation phases (Regime 1, red) from turbulent rally/crash periods (Regime 2, green).
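The filtered probabilities in the bottom panel come from a recursive Bayes update known as the Hamilton filter. The minimal sketch below uses the regime means and standard deviations from the table. It ignores the AR(4) terms and uses a hypothetical transition matrix, because the estimated transition probabilities are not reported here. Library implementations such as statsmodels' `MarkovAutoregression` handle the full model.

```python
import math
import numpy as np

# Regime means and standard deviations of daily returns (in %), from the table above.
MU = np.array([0.068, 0.080])
SIGMA = np.array([1.49, 5.44])
# Hypothetical transition matrix: TRANS[i, j] = Pr(s_t = j | s_{t-1} = i).
TRANS = np.array([[0.98, 0.02],
                  [0.05, 0.95]])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def hamilton_filter(returns, mu=MU, sigma=SIGMA, trans=TRANS):
    """Filtered Pr(s_t = k | r_1..r_t) for a 2-regime switching mean/variance model."""
    p12, p21 = trans[0, 1], trans[1, 0]
    prob = np.array([p21, p12]) / (p12 + p21)  # start from the chain's stationary distribution
    filtered = []
    for r in returns:
        predicted = trans.T @ prob              # Pr(s_t | r_1..r_{t-1})
        joint = predicted * normal_pdf(r, mu, sigma)
        prob = joint / joint.sum()              # Bayes update with today's return
        filtered.append(prob)
    return np.array(filtered)

calm = hamilton_filter([0.1] * 30)   # small moves: probability mass shifts to Regime 1
wild = hamilton_filter([8.0] * 10)   # 8% daily moves: probability mass shifts to Regime 2
```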
The TAR/SETAR model is a threshold AR model that uses the median return (0.063%) as the switching threshold, with separate AR(4) models fitted above and below it.
- Regime 1 (below threshold): 876 observations
- Regime 2 (above threshold): 875 observations
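Here is a minimal two-regime SETAR sketch, fitted by ordinary least squares within each regime. It makes two assumptions that the description above does not specify: the threshold variable is the previous day's return (delay d = 1), and each regime includes an intercept. The function names are illustrative.

```python
import numpy as np

def lag_matrix(r, p):
    """Design matrix with rows [1, r_{t-1}, ..., r_{t-p}] and targets r_t."""
    n = len(r)
    cols = [np.ones(n - p)] + [r[p - k:n - k] for k in range(1, p + 1)]
    return np.column_stack(cols), r[p:]

def fit_setar(r, p=4, threshold=None):
    """Two-regime SETAR(p) fitted by OLS, switching on r_{t-1} (delay d=1 assumed)."""
    r = np.asarray(r, dtype=float)
    if threshold is None:
        threshold = float(np.median(r))  # median return as the threshold
    X, y = lag_matrix(r, p)
    low = X[:, 1] <= threshold           # column 1 holds r_{t-1}
    coefs = {}
    for name, mask in (("low", low), ("high", ~low)):
        coefs[name] = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    return threshold, coefs

def setar_forecast(r, threshold, coefs, p=4):
    """One-step-ahead forecast of r_{T+1} using the regime implied by r_T."""
    r = np.asarray(r, dtype=float)
    x = np.concatenate(([1.0], r[::-1][:p]))  # [1, r_T, ..., r_{T-p+1}]
    regime = "low" if r[-1] <= threshold else "high"
    return float(x @ coefs[regime])
```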
TAR/SETAR rolling forecast. Performance is nearly identical to GARCH(1,1); both capture local dynamics well.
Nonlinear Model Results:
| Model | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| GARCH(1,1) | $1,735.73 | $1,215.08 | 1.96% | 0.9894 |
| MS-AR | $1,032.95 | $715.90 | 1.79% | 0.9331 |
| TAR/SETAR | $1,735.65 | $1,213.20 | 1.95% | 0.9894 |
LSTM architecture:
Input: (60 timesteps, 1 feature) ← 60-day look-back window
└─ LSTM(100, return_sequences=True)
└─ Dropout(0.2)
└─ LSTM(100, return_sequences=False)
└─ Dropout(0.2)
└─ Dense(50, activation='relu')
└─ Dense(1)
Training configuration:
- Normalisation: MinMaxScaler [0, 1]
- Split: 70% train / 10% validation / 20% test
- Optimiser: Adam | Loss: MSE
- Callbacks: EarlyStopping (patience=10), ModelCheckpoint
- Best epoch: 93 / 100
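The windowing and scaling steps can be sketched in NumPy as follows. The scaler mirrors what scikit-learn's `MinMaxScaler` does and is fitted on the training portion only, which is an assumption about the notebook. Under that choice, any price above the training maximum maps to a value greater than 1. The price series below is placeholder data.

```python
import numpy as np

def minmax_fit(train):
    """Return (min, max) of the training data, as MinMaxScaler would learn them."""
    train = np.asarray(train, dtype=float)
    return float(train.min()), float(train.max())

def minmax_transform(x, lo, hi):
    """Scale with the training min/max; values outside that range fall outside [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def make_windows(series, lookback=60):
    """Build (samples, lookback, 1) inputs and next-day targets from a 1-D series."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + lookback] for i in range(len(s) - lookback)])
    y = s[lookback:]
    return X[..., np.newaxis], y

# 70/10/20 chronological split of 2,191 daily prices (placeholder data).
prices = np.linspace(3_400.0, 106_000.0, 2191)
n_train, n_val_end = int(0.7 * len(prices)), int(0.8 * len(prices))
lo, hi = minmax_fit(prices[:n_train])
scaled = minmax_transform(prices, lo, hi)
X_train, y_train = make_windows(scaled[:n_train])
print(X_train.shape)  # (1473, 60, 1)
```

With 1,533 training rows and a 60-day look-back, the LSTM sees 1,473 training sequences.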
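Training and validation loss curves. Most of the loss reduction happens within the first ~30 epochs. Validation loss then keeps improving marginally until epoch 93, the best epoch, whose weights are the ones saved as the best model.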
LSTM predictions across train, validation, and test splits.
LSTM test-set predictions vs actual prices (Oct 2023 – Dec 2024). While the general trend is captured, the model consistently undershoots during the aggressive 2024 bull run.
LSTM Results:
| Split | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| Train | $1,235.64 | $838.16 | 5.10% | 0.9947 |
| Validation | $590.74 | $393.96 | 1.39% | 0.9075 |
| Test | $4,250.87 | $3,338.50 | 4.88% | 0.9331 |
The sharp deterioration from validation to test (RMSE $591 → $4,251, MAPE 1.39% → 4.88%) suggests overfitting. Roughly 1,500 training sequences (from ~2,200 daily observations) is well below what an LSTM reliably requires.
| Rank | Model | RMSE ($) | MAE ($) | MAPE (%) | R² |
|---|---|---|---|---|---|
| 🥇 | MS-AR | 1,032.95 | 715.90 | 1.79% | 0.9331 |
| 🥈 | TAR/SETAR | 1,735.65 | 1,213.20 | 1.95% | 0.9894 |
| 🥉 | GARCH(1,1) | 1,735.73 | 1,215.08 | 1.96% | 0.9894 |
| 4 | Holt Linear Trend | 1,740.74 | 1,216.17 | 1.96% | 0.9894 |
| 5 | ARIMA(2,1,2) | 1,745.35 | 1,225.38 | 1.97% | 0.9894 |
| 6 | LSTM | 4,250.87 | 3,338.50 | 4.88% | 0.9331 |
Four-panel bar chart comparing all models on RMSE, MAE, MAPE, and R². Gold-bordered green bar = best in each metric. LSTM is a clear outlier on all error metrics.
Overlay of all model predictions against actual prices across the shared test window.
Per-metric rankings (lower bar = better for RMSE/MAE/MAPE). MS-AR ranks #1 on three of four metrics. LSTM consistently ranks last.
| Category | Avg RMSE | Avg MAPE | Avg R² |
|---|---|---|---|
| Traditional (5 models) | $1,598 | 1.93% | 0.978 |
| LSTM | $4,251 | 4.88% | 0.933 |
| Δ (LSTM − Traditional) | +$2,653 worse | +2.95 pp worse | −0.045 |
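The category averages and deltas can be reproduced directly from the ranking table. Note that the MAPE difference is in percentage points.

```python
from statistics import mean

# Test-set metrics of the five traditional models, copied from the ranking table.
traditional = {
    "MS-AR":             {"RMSE": 1032.95, "MAPE": 1.79, "R2": 0.9331},
    "TAR/SETAR":         {"RMSE": 1735.65, "MAPE": 1.95, "R2": 0.9894},
    "GARCH(1,1)":        {"RMSE": 1735.73, "MAPE": 1.96, "R2": 0.9894},
    "Holt Linear Trend": {"RMSE": 1740.74, "MAPE": 1.96, "R2": 0.9894},
    "ARIMA(2,1,2)":      {"RMSE": 1745.35, "MAPE": 1.97, "R2": 0.9894},
}
lstm = {"RMSE": 4250.87, "MAPE": 4.88, "R2": 0.9331}

averages = {m: mean(model[m] for model in traditional.values()) for m in ("RMSE", "MAPE", "R2")}
deltas = {m: lstm[m] - averages[m] for m in averages}  # MAPE delta is in percentage points
# averages ≈ {'RMSE': 1598.08, 'MAPE': 1.926, 'R2': 0.978}
# deltas   ≈ {'RMSE': 2652.79, 'MAPE': 2.954, 'R2': -0.045}
```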
1. MS-AR is the best overall model. By explicitly modelling two volatility regimes (σ₁=1.49% vs σ₂=5.44%), Markov-Switching AR captures Bitcoin's boom-bust dynamics that simpler models miss, yielding the lowest RMSE ($1,033) and MAPE (1.79%).
2. Rolling forecasts level the playing field. Updating ARIMA daily with new data produces a MAPE of 1.97%, nearly matching GARCH's 1.96%. The forecasting framework matters as much as model complexity.
3. More complexity ≠ better results with limited data. LSTM underperforms every traditional model by a wide margin. Roughly 1,500 training sequences (from ~2,200 daily observations) is well below what an LSTM reliably requires; the literature suggests >10,000 samples for robust performance.
4. Volatility modelling adds value beyond price prediction. GARCH's conditional volatility estimates (persistence α+β=0.94) provide actionable risk signals. This makes it the go-to choice for risk-management applications, even where its point accuracy only matches the linear models.
5. Recommended model hierarchy:
Best Overall: MS-AR → Lowest error, regime-aware
Best for Risk Mgmt: GARCH(1,1) → Conditional volatility + price forecast
Best Interpretable: ARIMA → Fast, simple, fully explainable
Avoid (<5K samples): LSTM → Overfits, insufficient training data
Current Limitations:
- Price/volume data only — no sentiment, on-chain metrics, or macro indicators
- One-step-ahead forecasting only (multi-step is significantly harder)
- Test period (2023–2024) coincides with historically unusual volatility and a new ATH
- LSTM constrained by dataset size (~2,000 observations)
Future Work:
- Add Twitter/Reddit sentiment and on-chain data (hash rate, active addresses, exchange flows)
- Ensemble methods combining GARCH + MS-AR + ARIMA predictions
- Transformer architectures (Temporal Fusion Transformer, Informer)
- Extend to multi-asset comparison: ETH, BNB, SOL
- Real-time prediction pipeline with automated backtesting
pandas>=2.0
numpy>=1.24
matplotlib
seaborn
yfinance
statsmodels
scipy
scikit-learn
arch
tensorflow>=2.0
pip install -r requirements.txt
Run notebooks sequentially:
jupyter notebook notebooks/01_data_collection_eda.ipynb
Data sourced from Yahoo Finance. Analysis performed with Python 3.12, TensorFlow 2.20, statsmodels 0.14, arch 6.x.