This project applies a range of time series analysis methods to Bitcoin active wallet data. It covers data preprocessing, feature analysis, factor filtering, and forecasting with simple benchmark, SARIMA, ARIMA, and LSTM models.
In this phase, we perform an analysis of Bitcoin price returns and factor filtering using decomposition methods such as STL, ACF, and PACF.
- Run `Return_analysis.py` to analyze BTC price return data.
- Run `Feature_analysis_and_factor_filtering.py` for factor filtering and to plot STL decomposition, ACF, and PACF graphs.
==============================================
This notebook implements and evaluates four basic time series forecasting methods:
- Mean Method
- Naive Method
- Seasonal Naive Method
- Drift Method
Data:
- Bitcoin active wallet data (3–6 months of activity)
- Daily aggregates from 10-minute data
- Period: 2021-01-01 onwards
Evaluation metrics:
- RMSE (Root Mean Square Error)
- MAE (Mean Absolute Error)
- MAPE (Mean Absolute Percentage Error)
- MASE (Mean Absolute Scaled Error)
Dependencies:
- pandas
- numpy
- matplotlib
- sklearn
Run cells sequentially to see the performance comparison of different benchmark models.
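The four benchmark forecasts can be sketched in plain numpy as below; the function name, the seasonal period m = 7, and the toy data are illustrative assumptions, not the notebook's actual code.

```python
# Minimal numpy sketch of the four benchmark forecasting methods
import numpy as np

def benchmark_forecasts(train, h, m=7):
    """Return h-step forecasts for the mean, naive, seasonal-naive and drift methods."""
    train = np.asarray(train, dtype=float)
    n = len(train)
    steps = np.arange(1, h + 1)
    mean_fc = np.full(h, train.mean())          # mean of all history
    naive_fc = np.full(h, train[-1])            # repeat last observation
    snaive_fc = train[n - m + (steps - 1) % m]  # repeat the last full season
    slope = (train[-1] - train[0]) / (n - 1)    # drift: line through first/last point
    drift_fc = train[-1] + steps * slope
    return mean_fc, naive_fc, snaive_fc, drift_fc

train = np.arange(1.0, 15.0)  # 14 fake daily observations: 1, 2, ..., 14
mean_fc, naive_fc, snaive_fc, drift_fc = benchmark_forecasts(train, h=3)
```

Comparing these simple forecasts against the test window with RMSE/MAE/MAPE/MASE gives the baseline that the later SARIMA, ARIMA, and LSTM models must beat.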
This project uses the SARIMA model to perform time series analysis and seasonal pattern identification on BTC hourly relative profit data.
- Data preprocessing and resampling
- Time series visualization analysis
- Stationarity tests (ADF and KPSS tests)
- Seasonal pattern identification
- SARIMA model fitting
- Original time series plot
- Data distribution histogram + KDE
- Autocorrelation Function (ACF) plot
- Partial Autocorrelation Function (PACF) plot
- ADF test (Null hypothesis: Non-stationary)
- KPSS test (Null hypothesis: Stationary)
- Combined stationarity assessment based on both tests
Using pmdarima and statsmodels libraries:
- Automatic SARIMA parameter selection
- Seasonal pattern identification
- Model training and validation
This project implements an ARIMA-based forecasting pipeline using a daily time series.
- Train–Test Split: the dataset is split 80/20 into an initial training set and a test set.
- Model Selection (Grid Search): since ARIMA requires stationarity, differencing is fixed at d = 1. We search over p = 0–5 and q = 0–5, and the combination with the lowest AIC on the initial training set is selected as the final model order.
- Rolling (Expanding) Forecasting: using the selected (p, 1, q) order, we perform one-step-ahead forecasting:
  - Fit the model on the current training window
  - Predict the next data point
  - Append the true value to the training set
  - Repeat until all test observations are predicted
- Evaluation Metrics: we compute RMSE, MAE, MAPE, and MASE.
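For reference, plain-numpy versions of the four metrics are sketched below; scaling MASE by the in-sample one-step naive error is a common convention and an assumption here, not necessarily the notebook's exact definition.

```python
# Plain-numpy forecast error metrics
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    # Percentage error; assumes y has no zeros
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

def mase(y, yhat, train):
    # Scale by the in-sample MAE of the one-step naive forecast
    scale = np.mean(np.abs(np.diff(train)))
    return float(np.mean(np.abs(y - yhat)) / scale)

train = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([8.0, 10.0])
yhat = np.array([9.0, 9.0])
```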
- Files
  - Notebooks: `LSTM.ipynb`, `LSTM_stop.ipynb`, `LSTM_Hyper.ipynb`
  - Data: `./data/BTC factors/addresses/BTC_1h_profit_relative.csv`
Dataset → Supervised
- Sliding window (`window_size = k`): use the k previous days to predict the next day.
- LSTM input shape: `(batch, seq_len, 1)`.
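The windowing step can be sketched in pure numpy as below, producing exactly the `(samples, seq_len, 1)` shape described above; the function and variable names are illustrative.

```python
# Sliding-window conversion of a series into supervised (X, y) pairs
import numpy as np

def make_windows(series, window_size):
    series = np.asarray(series, dtype=np.float32)
    # Each row of X holds window_size consecutive values; y is the next value
    X = np.stack([series[i:i + window_size]
                  for i in range(len(series) - window_size)])
    y = series[window_size:]
    return X[..., np.newaxis], y  # X: (samples, seq_len, 1)

X, y = make_windows(np.arange(10.0), window_size=3)
```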
- Model
  - Single-layer LSTM + linear output.
  - Hidden units: 64.
  - Loss: `SmoothL1Loss`.
  - Optimizer: Adam (lr = 0.01).
  - Training: full-batch example (epoch count set in the notebook).
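A hedged PyTorch sketch of the model described above, one LSTM layer with 64 hidden units feeding a linear head, trained full-batch with `SmoothL1Loss` and Adam(lr=0.01); the class name, synthetic data, and epoch count are assumptions.

```python
# Minimal LSTM forecaster sketch (synthetic data, illustrative names)
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, seq_len, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # predict from the last time step

torch.manual_seed(0)
X = torch.randn(32, 7, 1)                # full-batch toy input
y = torch.randn(32, 1)
model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.SmoothL1Loss()
for _ in range(5):                       # a few epochs for illustration
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```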
- Inference & Evaluation
  - Inverse-transform predictions back to the original scale.
  - Metrics: RMSE, MAE, MAPE, MASE.
  - Visualization: Train True / Train Pred / Test True / Test Pred plotted on the same axes.
- Usage
  - Run `LSTM_Hyper.ipynb` to find the best hyperparameters for the LSTM.
  - Then open `LSTM.ipynb`, or run `LSTM_stop.ipynb`, which adds an early-stopping criterion to keep the model from overfitting.
  - Ensure the data path is available.
  - Run cells sequentially.
- Notes
  - The scaler is fitted only on the training data to avoid data leakage.
  - For faster experiments, reduce the number of epochs or use mini-batches.
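The no-leakage scaling note can be illustrated with sklearn's `MinMaxScaler` (the data here is synthetic): fit the scaler on the training split only, then apply the same transform to both splits.

```python
# Leakage-free scaling: statistics come from the training split only
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.arange(100, dtype=float).reshape(-1, 1)
train, test = series[:80], series[80:]

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # fit uses train statistics only
test_scaled = scaler.transform(test)        # test values may fall outside [0, 1]
```

Fitting on the full series instead would let test-set information shape the transform, optimistically biasing the evaluation.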