docs/user_guide/batch_size_tuning.md
@@ -0,0 +1,10 @@
To maximize efficiency, Ludwig performs automatic batch size tuning when the `batch_size` parameter is not set in the configuration, in order to best saturate the GPU. Batch size tuning does not occur during CPU training due to the lack of effective parallelization; Ludwig instead sets the batch size to a fixed value.
Suggestion for a minor rewrite:
"In Ludwig, users have the option to set batch_size to a fixed value as part of the training config.
trainer:
batch_size: 128If the batch size is unspecified Ludwig sets batch_size=auto.
trainer:
batch_size: autoauto enables Ludwig to select an efficient batch size automatically. The actual value of the batch size can be found in training logs and in the model output directory.
Batch size tuning is supported in single-node and multi-node CPU and GPU settings.
**ECD Models**
Batch size tuning for ECD models follows this procedure, starting from batch size 1 (a sketch follows the list):

1. Perform a small number of forward passes through the model using a sample from the dataset, observing whether the model hits a memory error and measuring the overall throughput (examples/sec).
2. If the model hits a memory error or throughput decreases, use the last valid batch size. Otherwise, double the batch size and repeat step 1.
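A minimal Python sketch of this doubling search, assuming a caller-supplied `forward_pass(batch_size)` callable; the function names and constants here are illustrative, not Ludwig's actual implementation:

```python
import time

def evaluate_throughput(forward_pass, batch_size, num_passes=3):
    """Time a small number of forward passes and return examples/sec."""
    start = time.perf_counter()
    for _ in range(num_passes):
        forward_pass(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * num_passes) / elapsed

def tune_batch_size(forward_pass, max_batch_size=2 ** 16):
    """Doubling search: stop on OOM or when throughput stops improving,
    then fall back to the last valid batch size."""
    best_size, best_throughput = 1, 0.0
    batch_size = 1
    while batch_size <= max_batch_size:
        try:
            throughput = evaluate_throughput(forward_pass, batch_size)
        except (MemoryError, RuntimeError):
            # In practice a GPU OOM surfaces as torch.cuda.OutOfMemoryError,
            # which is a RuntimeError subclass.
            break
        if throughput <= best_throughput:
            break  # throughput no longer improving
        best_size, best_throughput = batch_size, throughput
        batch_size *= 2
    return best_size
```

Doubling keeps the number of probes logarithmic in the final batch size, and returning the last valid size keeps the result safely below the OOM boundary.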
**LLMs**
The main element that separates LLM batch size tuning from its ECD counterpart is the sequence length. LLMs thus undergo the same batch size tuning process as ECD models, except that, instead of using a random sample from the dataset, the forward passes use a synthetic data sample with a sequence length equal to the specified max sequence length (or the longest sequence length in the provided dataset if max sequence length is unspecified)."
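A hypothetical sketch of such a synthetic sample, using PyTorch since Ludwig models are torch modules; the helper name and dict keys are assumptions, not Ludwig's API:

```python
import torch

def synthetic_llm_batch(batch_size: int, max_sequence_length: int, vocab_size: int):
    """Build a worst-case batch for memory probing: every row is filled to
    the maximum sequence length, so the probe reflects the longest inputs
    the model can encounter. Hypothetical helper, not Ludwig's actual API."""
    input_ids = torch.randint(0, vocab_size, (batch_size, max_sequence_length))
    attention_mask = torch.ones_like(input_ids)  # full-length rows, no padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Probing with full-length sequences matters because activation memory grows with sequence length, so a batch size tuned on short random samples could still OOM on the longest real inputs.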
Add docs for clarity on batch size tuning and the differences between ECD and LLM batch size tuning. DO NOT MERGE YET. Waiting on LLM batch size tuning PR to land.