NOTE: This is the public variant of the JIT Workflow Execution repository. The public variant was created to wipe the change history. If access to the private variant with its history is desired, please contact the authors.
JIT Workflow Execution is an API-driven service designed to manage just-in-time execution of workflows. It enables users to initiate, monitor, and control workflows dynamically, with flexible integration into storage solutions and external services.
The project is containerized using Docker, and its continuous integration and deployment are automated through GitHub Actions. The system is designed for modularity and can be adapted or extended based on future requirements.
The repository is organized into key components to ensure a clean separation of concerns:
- **api/**
  Contains the core Flask API responsible for workflow execution, storage interactions (via `rclone`), and management of workflow lifecycles.
  The API is the main interface for external clients to interact with the service.
- **benchmark/**
  A suite of performance benchmarking tools and scripts. These are used to evaluate and test the scalability and responsiveness of the API and underlying systems.
- **POC/**
  Proof-of-concept implementations and prototypes. This directory contains experimental features and early-stage ideas that may be integrated into the core project in the future. Currently only an agro-API POC is available, because most POCs have already been fully integrated into the product and were therefore deemed duplicates.
The Flask-based API is responsible for orchestrating workflow executions. Its main capabilities include:
- Triggering workflow executions on demand
- Monitoring and managing workflow status
- Handling input/output data transfer, including remote storage via `rclone` (see the sketch below)
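To illustrate how the API might delegate data transfer to rclone, here is a minimal sketch that shells out to the `rclone copy` command. The `sync_outputs` helper and the `remote:` remote name are hypothetical, not part of the actual codebase, and the sketch assumes rclone is installed and configured (e.g., via `setupRClone.sh`):

```python
import subprocess
from pathlib import Path

def sync_outputs(local_dir: Path, remote_path: str) -> None:
    """Copy workflow outputs to remote storage by invoking rclone."""
    result = subprocess.run(
        ["rclone", "copy", str(local_dir), remote_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"rclone copy failed: {result.stderr}")

# Example: push a finished workflow's output directory to a configured remote.
# sync_outputs(Path("/tmp/workflow-123/out"), "remote:bucket/workflow-123")
```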
The current implementation combines multiple responsibilities into a single Flask API. For future improvements, the service can be split into two separate APIs:
- A Control API to manage workflows and metadata
- A Data API to handle file transfers and integration with remote storage (e.g., `rclone`)
This separation would improve maintainability and scalability as the system grows.
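One lightweight way to prepare for this split without running two deployments yet is to group routes into separate Flask blueprints. The following is a minimal sketch; the blueprint names, URL prefixes, and routes are illustrative, not the service's actual endpoints:

```python
from flask import Blueprint, Flask

# Control API: workflow lifecycle and metadata.
control_api = Blueprint("control", __name__, url_prefix="/control")

@control_api.route("/workflows", methods=["POST"])
def start_workflow():
    return {"status": "started"}

# Data API: file transfers and remote storage (e.g., via rclone).
data_api = Blueprint("data", __name__, url_prefix="/data")

@data_api.route("/upload", methods=["POST"])
def upload_file():
    return {"status": "uploaded"}

app = Flask(__name__)
app.register_blueprint(control_api)
app.register_blueprint(data_api)
```

Because blueprints keep the route groups self-contained, either group could later be lifted into its own Flask app with minimal changes.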
- **MongoDB Instance**
  The application is currently tightly integrated with MongoDB for its workflow state management and metadata storage. Developers are expected to configure and run a MongoDB instance. How this is set up is left to the external party and can be customized as needed.
- **Python 3.x**
- **Docker** (optional, for containerized runs)
- **rclone** (optional, for remote storage management)
Note that the current setup is not straightforward: the application is tightly integrated with a MongoDB instance hosted on AWS, which makes the code less portable. We advise refactoring this coupling, or editing the code to match your own MongoDB instance.
You can run the application either natively on your machine or by using the provided Dockerfile for a containerized environment. Below are both options:
- **Configure MongoDB**
  Ensure a MongoDB instance is running and accessible.
  The application depends on MongoDB for storing workflow state and metadata.
  It is up to the external party to configure this MongoDB instance according to their needs.
  Update the connection details in `config.py` or pass them as environment variables, as in the sketch below.
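  For example, connection details can be read from the environment for portability. The following is a minimal sketch assuming the `pymongo` package; the `get_db` helper and the fallback database name are illustrative, not part of the current codebase, while `MONGO_URI` matches the variable used in the Docker instructions below:

  ```python
  import os

  from pymongo import MongoClient

  def get_db():
      # Fall back to a local instance when MONGO_URI is not set.
      uri = os.environ.get("MONGO_URI", "mongodb://localhost:27017/jit_workflows")
      client = MongoClient(uri)
      # Returns the database named in the URI path (here: jit_workflows).
      return client.get_default_database()
  ```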
- **Set up your Python environment**
  Create and activate a virtual environment, then install the dependencies:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- **Run the Flask API locally**
  Start the development server (a sample client call is sketched after these steps):
  ```bash
  python3 -m flask run
  ```
- **(Optional) Configure and Use rclone**
  The system supports file synchronization and management through `rclone`.
  While it can be integrated directly into the Flask API workflow, it is also possible to run `rclone` as a standalone service for better modularity and scalability.
  Refer to the setup scripts located in `local_setup/`:
  - `setupRClone.sh` – to configure rclone for your remote storage
  - `cleanRClone.sh` – to remove or reset rclone configurations
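Once the development server is running, workflows can be exercised over HTTP. The snippet below is a minimal client sketch using the `requests` package; the endpoint paths and JSON fields are placeholders, since the actual routes are defined in `api/`:

```python
import requests

BASE_URL = "http://localhost:5000"

# Hypothetical endpoints -- substitute the actual routes from api/.
resp = requests.post(f"{BASE_URL}/workflows", json={"name": "example"})
resp.raise_for_status()
workflow_id = resp.json().get("id")

# Poll the workflow's status.
status = requests.get(f"{BASE_URL}/workflows/{workflow_id}")
print(status.json())
```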
To simplify deployment and ensure consistency across environments, you can build and run the API service using Docker.
- **Build the Docker image**
  Run the following command in the root of the `api/` directory, where the `Dockerfile` is located:
  ```bash
  docker build -t jit-workflow-api .
  ```
- **Run the Docker container**
  Start the container, making sure to provide the necessary environment variables for MongoDB and any other configuration:
  ```bash
  docker run -d \
    --name jit-workflow-api \
    -p 5000:5000 \
    -e MONGO_URI="mongodb://<your-mongo-host>:<port>/<db>" \
    jit-workflow-api
  ```
- **(Optional) Use docker-compose for a multi-service setup**
  If you want to manage MongoDB, the API, and potentially the rclone service in a single setup, consider writing a `docker-compose.yml` file.
  This makes managing multiple containers easier, especially for local development or testing environments.
- **MongoDB**
  The Docker container assumes you are either connecting to an external MongoDB instance or running MongoDB in another container. Make sure the connection details are correctly provided.
- **rclone Manager**
  You can run `rclone` directly in the container as part of the Flask API, or separately as a dedicated container or service.
  For standalone deployment, use the provided `setupRClone.sh` and `cleanRClone.sh` scripts in `local_setup/`.
This Docker-based approach simplifies running the service locally, without needing to manually install Python packages or manage dependencies.
The repository includes a GitHub Actions workflow for automated testing and deployment.
Key steps in the CI/CD pipeline include:
- Running unit and integration tests
- Building Docker images
- Pushing images to the configured container registry
- Deploying to staging or production environments (as applicable)
The following secrets must be configured in GitHub before running deployments:
- `DOCKERHUB_USERNAME`
- `DOCKERHUB_TOKEN`
- `DEPLOYMENT_PRIVATE_KEY` (for server access, if needed)
- `RCLONE_CONFIG` (if applicable, for remote storage integration)
Secrets can be configured in GitHub under:
Repository Settings > Secrets and variables > Actions
- Sensitive configuration files (e.g., `replication_settings.json.enc`) are encrypted and must be securely managed.
- Private keys and encrypted credentials are stored as GitHub Actions secrets.
- No sensitive data or private keys should be committed to the repository.
- **API Splitting**
  For future scalability and clearer separation of concerns, consider splitting the current Flask API into two distinct services:
  - **Control API** – manages workflow lifecycle and metadata
  - **Data API** – handles file transfer operations and remote storage integration (via `rclone`)
- **MongoDB Abstraction**
  Consider abstracting the database layer in future releases to support alternative backends beyond MongoDB (see the sketch after this list).
- **Enhanced rclone Management**
  Operating the rclone manager as a separate microservice can offer improved performance, maintainability, and scalability.
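As a rough illustration of what such a database abstraction could look like, here is a minimal sketch in Python; the `WorkflowStore` interface and its method names are hypothetical and do not exist in the current codebase:

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

class WorkflowStore(ABC):
    """Abstract storage layer for workflow state and metadata."""

    @abstractmethod
    def save_workflow(self, workflow_id: str, state: dict[str, Any]) -> None: ...

    @abstractmethod
    def get_workflow(self, workflow_id: str) -> Optional[dict[str, Any]]: ...

class MongoWorkflowStore(WorkflowStore):
    """Current backend: delegates to a pymongo collection."""

    def __init__(self, collection) -> None:
        self.collection = collection

    def save_workflow(self, workflow_id: str, state: dict[str, Any]) -> None:
        # Insert or overwrite the stored state for this workflow.
        self.collection.replace_one({"_id": workflow_id}, state, upsert=True)

    def get_workflow(self, workflow_id: str) -> Optional[dict[str, Any]]:
        return self.collection.find_one({"_id": workflow_id})
```

With this interface in place, an alternative backend (e.g., an in-memory store for tests, or a SQL-backed store) would only need to implement the same two methods.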
This project is licensed under the terms of the LICENSE file located at the root of this repository.