An end-to-end ML project that downloads ~20 years of EPL results, builds features, trains and evaluates models locally, and serves predictions via a Streamlit UI. All operations are local-only (no Vertex AI / GCS dependencies).
- Python 3.12.3, uv
- Streamlit UI for local inference
- Tooling: black, ruff (lint + import sorting), mypy, pytest + coverage, pre-commit & pre-push hooks
## Quickstart

```bash
# 0) env
uv venv --python 3.12.3
uv sync                        # installs main + dev dependencies
uv pip install -e ".[dev]"     # optional editable install (uv sync already handles deps)
pre-commit install
pre-commit install --hook-type pre-push
```
```bash
# 1) data (last 20 seasons by default)
uv run pemw download-data --seasons 20
```
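For orientation, here is a minimal sketch of what the download step could look like, assuming season CSVs are fetched from football-data.co.uk; the actual source, filenames, and the internals of `download-data` may differ.

```python
from pathlib import Path
import urllib.request


def download_seasons(n_seasons: int = 20, out_dir: str = "data/raw") -> None:
    """Fetch one EPL results CSV per season (hypothetical source and layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    latest_start = 2023  # assumption: 2023/24 is the most recent season
    for start in range(latest_start - n_seasons + 1, latest_start + 1):
        season_code = f"{start % 100:02d}{(start + 1) % 100:02d}"  # e.g. "2324"
        url = f"https://www.football-data.co.uk/mmz4281/{season_code}/E0.csv"
        urllib.request.urlretrieve(url, out / f"epl_{start}_{start + 1}.csv")


if __name__ == "__main__":
    download_seasons(20)
```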
```bash
# 2) features + local training (CLI commands use kebab-case). Feature set includes:
#   - Elo ratings + expectation
#   - Rolling form / goal difference (5- and 10-match windows)
#   - Implied probabilities from odds
#   - Rest days & Elo/market interaction
# (a short feature sketch follows this block)
uv run pemw prepare-data
uv run pemw train-local
uv run pemw evaluate-local
```
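Two of the listed features in a nutshell. This is a hedged sketch of the underlying formulas rather than the project's exact implementation: the Elo expectation for the home side, and decimal odds converted to implied probabilities with the bookmaker overround removed. Column names such as `B365H` are assumptions.

```python
import pandas as pd


def elo_expectation(home_elo: float, away_elo: float, home_adv: float = 60.0) -> float:
    """Expected score (win-probability proxy) for the home team under the Elo model."""
    return 1.0 / (1.0 + 10 ** (-(home_elo + home_adv - away_elo) / 400.0))


def implied_probabilities(df: pd.DataFrame) -> pd.DataFrame:
    """Convert decimal odds to implied probabilities and strip the bookmaker margin.

    Assumes football-data.co.uk-style columns B365H/B365D/B365A.
    """
    raw = 1.0 / df[["B365H", "B365D", "B365A"]]
    return raw.div(raw.sum(axis=1), axis=0).rename(
        columns={"B365H": "p_home", "B365D": "p_draw", "B365A": "p_away"}
    )
```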
```bash
# Use the gradient boosting model (may improve accuracy):
uv run pemw train-local --model-type hgb --min-team-freq 5
uv run pemw evaluate-local --model-type hgb --min-team-freq 5
```
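A rough sketch of what the HGB variant and `--min-team-freq` pruning might amount to. The column names (`home_team`, `away_team`) and hyperparameters are assumptions, not the project's exact settings.

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier


def prune_rare_teams(df: pd.DataFrame, min_team_freq: int = 5) -> pd.DataFrame:
    """Drop matches involving teams that appear fewer than `min_team_freq` times."""
    counts = pd.concat([df["home_team"], df["away_team"]]).value_counts()
    keep = counts[counts >= min_team_freq].index
    return df[df["home_team"].isin(keep) & df["away_team"].isin(keep)]


# HistGradientBoostingClassifier handles NaNs natively, so odds columns with
# missing values do not require a separate imputer for this model type.
model = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05, random_state=42)
```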
```bash
# Calibrated logistic regression probabilities
uv run pemw train-local --calibrate
```
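The `--calibrate` flag presumably wraps the classifier in probability calibration; a minimal sketch using scikit-learn's `CalibratedClassifierCV` (the calibration method and CV strategy here are assumptions):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

base = LogisticRegression(class_weight="balanced", max_iter=1000)
# Sigmoid (Platt) calibration; isotonic is another option.
# Time-ordered folds keep training data strictly before the held-out data.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=TimeSeriesSplit(n_splits=5))
# calibrated.fit(X_train, y_train); calibrated.predict_proba(X_test)
```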
```bash
# (Optional) quick hyperparameter search for LogisticRegression C
uv run pemw tune-logreg
# (Optional) quick hyperparameter search for HistGradientBoosting
uv run pemw tune-hgb
```
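Conceptually, `tune-logreg` is a small grid search over the regularisation strength C evaluated on time-ordered folds; a hedged sketch (the grid values and scoring metric are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {"C": np.logspace(-2, 2, 9)}  # assumed search range
search = GridSearchCV(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid,
    scoring="f1_macro",
    cv=TimeSeriesSplit(n_splits=5),  # folds respect chronological order
)
# search.fit(X, y); search.best_params_["C"]
```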
```bash
# 3) Auto-select best model
# Selection criteria: macro F1 (primary) with overall accuracy as a tie-breaker.
# Both logistic regression and HistGradientBoosting are cross-validated via TimeSeriesSplit.
uv run pemw auto-select
```
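The selection rule (macro F1 first, accuracy as tie-breaker) could be expressed as below; the candidate set and metrics come from the step above, everything else is illustrative and assumes an already-prepared feature matrix `X` and labels `y`.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_validate


def auto_select(X, y) -> str:
    """Cross-validate each candidate and pick the best by (macro F1, accuracy)."""
    candidates = {
        "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
        "hgb": HistGradientBoostingClassifier(random_state=42),
    }
    cv = TimeSeriesSplit(n_splits=5)
    scores = {}
    for name, model in candidates.items():
        res = cross_validate(model, X, y, cv=cv, scoring=["f1_macro", "accuracy"])
        scores[name] = (res["test_f1_macro"].mean(), res["test_accuracy"].mean())
    # Tuples compare lexicographically: macro F1 first, accuracy breaks ties.
    return max(scores, key=scores.get)
```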
```bash
# 4) Streamlit UI (local)
streamlit run src/pemw/ui/app.py
```
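The UI lives at `src/pemw/ui/app.py`. A skeletal version of such an app might look like the following; the model artifact path, input widgets, and the idea that the real app builds the full engineered feature row are assumptions.

```python
import joblib
import pandas as pd
import streamlit as st

st.title("EPL match outcome predictor")
model = joblib.load("models/best_model.joblib")  # assumed artifact path

home = st.text_input("Home team", "Arsenal")
away = st.text_input("Away team", "Chelsea")
if st.button("Predict"):
    # The real app would also compute the engineered features for this fixture.
    features = pd.DataFrame([{"home_team": home, "away_team": away}])
    probs = model.predict_proba(features)[0]
    st.write(dict(zip(model.classes_, probs)))
```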
## Notes
- Import sorting is handled by Ruff's `I` rules; a separate isort step is no longer needed.
- If you previously had `isort` installed, you can remove it safely (`uv remove isort`).
- Model improvements implemented (see the pipeline sketch after this list):
  - Added imputers (median / most_frequent) to handle NaNs in odds columns.
  - Dropped always-NaN numeric columns automatically.
  - Engineered additional features (rolling 10-match windows, implied probabilities, rest days, Elo/market interaction).
  - Added `class_weight="balanced"` and `max_iter=1000` to the logistic regression.
  - Added a simple time-series-aware C tuning command, `tune-logreg`.
  - Added an HGB model option and tuning (`tune-hgb`), rare-team pruning, a calibration flag, and automatic model selection (`auto-select`) using macro F1 then accuracy.
- Removed cloud dependencies for a fully local workflow.
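Putting the preprocessing notes together, the logistic regression path likely resembles imputers feeding a scaled/encoded feature matrix into the classifier. This is a sketch under those assumptions; the column lists are illustrative, not the project's actual schema.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["elo_home", "elo_away", "p_home", "p_draw", "p_away", "rest_days_home"]
categorical_cols = ["home_team", "away_team"]

preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ]
)

clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
# clf.fit(X_train, y_train); clf.predict_proba(X_test)
```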