An end-to-end scientific ML job orchestration platform developed as part of my bachelor's thesis and Alisher Layik's bachelor's thesis at CTU FIT (ČVUT FIT), designed to ingest, preprocess, analyze, and manage large volumes of astronomical spectra (LAMOST FITS files) through human-in-the-loop machine learning workflows.
The core goal of this system is to streamline and automate the full lifecycle of spectroscopic data processing, from raw file ingestion to active-learning–driven model refinement, while providing a unified API and web UI for monitoring, controlling, and labeling ML jobs.
In particular, this platform addresses the following needs and research objectives:
- 💾 **High-Throughput FITS Ingestion & Preprocessing**
  - LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope) releases millions of raw `FITS` files containing stellar spectra. Before any classification or analysis can occur, these spectra must be read, normalized, and interpolated onto a uniform wavelength grid. The "Data Preprocessing" pipeline in the system handles:
    - Reading raw `FITS` headers and data arrays with Astropy.
    - Interpolating flux values across a common wavelength range (e.g., 3800 Å to 9000 Å) with Scikit-Learn.
    - Min–max scaling of flux measurements with Scikit-Learn.
    - Writing the preprocessed spectra into a single, consolidated `HDF5` file for downstream tasks with h5py.
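The interpolation and scaling steps above can be sketched in a few lines. This is an illustrative NumPy-only version (the actual pipeline uses Astropy for FITS I/O and Scikit-Learn for scaling; the function name, grid bounds, and grid resolution here are assumptions for the example):

```python
import numpy as np

def preprocess_spectrum(wavelengths, fluxes, grid_start=3800.0, grid_end=9000.0, n_points=100):
    """Interpolate a spectrum onto a uniform wavelength grid, then min-max scale the flux.

    Illustrative sketch only: `wavelengths` and `fluxes` are 1-D arrays of the
    raw spectrum; the returned flux values lie in [0, 1].
    """
    # Common wavelength grid shared by every spectrum in the dataset
    grid = np.linspace(grid_start, grid_end, n_points)
    # Linear interpolation of the raw flux onto the common grid
    interp = np.interp(grid, wavelengths, fluxes)
    # Min-max scaling so all spectra share the same flux range
    lo, hi = interp.min(), interp.max()
    scaled = (interp - lo) / (hi - lo) if hi > lo else np.zeros_like(interp)
    return grid, scaled
```

Each preprocessed spectrum then becomes one fixed-length row, which is what makes a single consolidated `HDF5` dataset (and later CNN training) straightforward.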
- 💠 **Active Learning for Spectral Classification**
  - Even with large-scale labeled datasets, manual labeling of rare or ambiguous spectral classes remains labor-intensive. The "Active ML" pipeline:
    - Trains a 1D CNN (built in TensorFlow & Keras) on the existing labeled spectra.
    - Uses uncertainty metrics (entropy of softmax outputs) to identify spectra that should be sent to an expert "oracle" for manual labeling.
    - Computes performance-estimation sets and candidate sets based on user-defined classes.
    - Iteratively refines the training corpus by integrating newly labeled spectra, retraining, and selecting the next batch for expert review.
    - Outputs intermediate artifacts (`HDF5`, `JSON`) for visualization and for use by the frontend, written with h5py.
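The entropy-based selection step can be illustrated with a short sketch: given the model's softmax outputs for a batch of spectra, compute the predictive entropy per spectrum and hand the most uncertain ones to the oracle. Function and parameter names below are illustrative, not the repository's actual API:

```python
import numpy as np

def select_for_oracle(probs: np.ndarray, batch_size: int) -> np.ndarray:
    """Return indices of the `batch_size` most uncertain spectra.

    `probs` is an (n_spectra, n_classes) array of softmax outputs;
    uncertainty is the entropy of each probability row.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Highest-entropy (most ambiguous) spectra first
    return np.argsort(entropy)[::-1][:batch_size]
```

A near-uniform softmax row (the model "can't decide") has high entropy and is selected first, while a confident row is left out of the oracle batch.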
- 👩🏼‍💻 **Human-in-the-Loop Labelling Workflow**
  - To ensure high classification accuracy on edge-case spectra (e.g., double-peak emission lines), the system supports:
    - Automatic batch initialization of labeling jobs (via an HTTPX callback to the back-end API).
    - A React-based web UI to display selected spectra (with flux vs. wavelength plots) and collect "oracle" labels.
    - Tracking of labeling status and iteration counts so that domain experts can focus on the spectra most beneficial for model improvement.
- 🖥️ **API & Web UI for Job Management**
  - Researchers need a centralized way to:
    - Submit new data preprocessing or active-learning jobs (via REST endpoints) with FastAPI and React.
    - Monitor job statuses, timestamps, and logs in real time.
    - Browse raw and preprocessed spectra metadata, view plots, and download files with FastAPI, aiofiles, Astropy, and Plotly.
- 🪛 **Scalable, Containerized DevOps Stack**
  - To simplify deployment and reproducibility across environments:
    - The entire platform is containerized with Docker: PostgreSQL for metadata persistence, RabbitMQ as the job message broker, FastAPI for the backend, Celery workers for computations, and React for the frontend.
    - A single `docker-compose.yml` file spins up all services with a one-line command.
    - Environment-driven configuration (via `.env` files and Pydantic settings) allows seamless switching between local development, testing, and production clusters.
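To make the service wiring concrete, a stripped-down compose file for such a stack might look like the sketch below. This is an illustrative assumption, not the repository's actual `docker-compose.yml`; image tags and build paths are placeholders:

```yaml
services:
  ml-job-db:
    image: postgres:16            # metadata persistence
    env_file: .env
  ml-job-queue:
    image: rabbitmq:3-management  # job message broker
    env_file: .env
  ml-job-api:
    build: ./ml-job-api           # FastAPI backend
    env_file: .env
    depends_on: [ml-job-db, ml-job-queue]
  ml-job-worker:
    build: ./ml-job-worker        # Celery computation workers
    env_file: .env
    depends_on: [ml-job-queue]
  ml-job-ui:
    build: ./ml-job-ui            # React frontend
    depends_on: [ml-job-api]
```

Keeping all credentials and ports in `.env` (loaded via `env_file` and Pydantic settings) is what lets the same compose file serve local development, testing, and production.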
By integrating these components, ML Job Manager provides a robust research prototype and platform implementation for anyone working on large-scale astronomical spectroscopy or active learning in scientific contexts, and a skeleton for other end-to-end scientific ML workflow orchestration projects.
Monorepo containing 3 components for an end-to-end ML job orchestration platform:
```
ml-job-manager/
├── images/
├── ml-job-api/        ← ML Job API: FastAPI microservice (REST API)
├── ml-job-ui/         ← ML Job UI: React frontend (Web Interface + Spectra Visualizations)
├── ml-job-worker/     ← ML Job Worker: Celery Worker (Data Preprocessing & Active ML)
├── docker-compose.yml
├── LICENSE
└── README.md
```
- **ML Job API**:
  - CRUD endpoints for jobs, labellings, spectra, and file storage.
  - Async `PostgreSQL` persistence, `Alembic` migrations.
  - `Celery` integration for dispatching jobs.
- **ML Job Worker**:
  - `Celery` jobs: Data Preprocessing & Active ML pipelines.
  - `TensorFlow` CNN, `Scikit-Learn` utilities (SMOTE, t-SNE).
  - `HTTPX` callbacks to ML Job API.
- **ML Job UI**:
  - `React` + `Tailwind` web frontend.
  - Live job status, spectra view, labelling workflow.
- **DevOps**:
  - `Docker` & `Docker Compose` for full-stack local development.
  - Environment-driven configuration via `.env` and Pydantic.
- `Docker` & `Docker Compose` ≥ v2.0.
- Nvidia GPU for the ML Job Worker's `TensorFlow` CNN computations.
Clone the repo:

```shell
git clone https://github.com/bursasha/ml-job-manager.git
cd ml-job-manager
```

Create a `.env` in the project root (see `.env.example` for all keys):
```env
DEBUG=True
FILES_DIR_PATH=...
SPECTRA_DIR_PATH=...
JOB_QUEUE=jobs
#
UI_PORT=10000
#
API_PORT=10100
ENGINE_CONNECTION_TIMEOUT=3
#
WORKER_PORT=10200
BROKER_CONNECTION_TIMEOUT=3
API_CONNECTION_TIMEOUT=3
#
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_DB=...
#
RABBITMQ_MANAGEMENT_PORT=10300
RABBITMQ_DEFAULT_USER=...
RABBITMQ_DEFAULT_PASS=...
```

Bring up the entire stack:

```shell
docker compose up
```

This will launch the following services:
- ML Job UI (`React`)
- ML Job API (`FastAPI`)
- ML Job Worker (`Celery`)
- ML Job Queue (`RabbitMQ`)
- ML Job DB (`PostgreSQL`)
You can now:
- Visit the UI at http://localhost:10000
- Browse the API docs at http://localhost:10100/docs
Stop the stack:

```shell
docker compose stop
```

Remove the stack:

```shell
docker compose down
```

Follow the logs of individual services:

```shell
docker compose logs -f ml-job-ui
docker compose logs -f ml-job-api
docker compose logs -f ml-job-worker
docker compose logs -f ml-job-queue
docker compose logs -f ml-job-db
```

This work is made available under the terms of a non-exclusive authorization per Act No. 121/2000 Coll., Copyright Act, and Section 2373(2) of Act No. 89/2012 Coll., Civil Code (see the ML Job Manager LICENSE for the full text).
