Be sure to replace the command-line option values below (for example, the instance type and availability zone) with ones that match your environment.
mlbot init --project pl-mnist-example --docker-image <the docker image name to use for this project>
mlbot run --instance-type p3dn.24xlarge --az us-west-2b --num-nodes 1 train.py --trainer.gpus 8 --trainer.num_nodes 1 --trainer.strategy 'ddp' --trainer.max_epochs 1000

- Once the job is running, you can view live logs by running:
kubectl attach -n elastic-job <project id>-worker-0
This example is adapted from the PyTorch Lightning repo.
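If you launch the job from several environments, it can help to assemble the `mlbot run` command from variables rather than editing it by hand. The following is a minimal sketch, not part of mlbot or the original example: the helper `build_run_command` is hypothetical, it uses only the flags shown in the command above, and it assumes that `--num-nodes` and `--trainer.num_nodes` should always carry the same value (as they do in the command above).

```python
import shlex


def build_run_command(instance_type, az, num_nodes, gpus_per_node,
                      max_epochs, script="train.py", strategy="ddp"):
    """Hypothetical helper: build the `mlbot run` argument list.

    Only flags that appear in the command above are used. The node count
    is passed to both --num-nodes and --trainer.num_nodes so the two
    values cannot drift apart (an assumption, not documented behavior).
    """
    return [
        "mlbot", "run",
        "--instance-type", instance_type,
        "--az", az,
        "--num-nodes", str(num_nodes),
        script,
        "--trainer.gpus", str(gpus_per_node),
        "--trainer.num_nodes", str(num_nodes),
        "--trainer.strategy", strategy,
        "--trainer.max_epochs", str(max_epochs),
    ]


# Values copied from the example command; replace them with your own.
cmd = build_run_command(
    instance_type="p3dn.24xlarge",
    az="us-west-2b",
    num_nodes=1,
    gpus_per_node=8,
    max_epochs=1000,
)

# Print a shell-safe string to review before running it.
print(shlex.join(cmd))
```

The script only prints the command. To execute it, you could pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where `mlbot` is installed.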