Merged
2 changes: 1 addition & 1 deletion MaxText/configs/a3/llama_2_7b/16vm.sh
@@ -37,6 +37,6 @@ export XLA_FLAGS="--xla_dump_to=$OUTPUT_PATH/$RUN_NAME/HLO_dumps/

# 16 nodes
python MaxText/train.py MaxText/configs/base.yml run_name=$RUN_NAME hardware=gpu \
- steps=30 dcn_data_parallelism=16 ici_fsdp_parallelism=8 per_device_batch_size=6 max_target_length=4096 model_name=llama2-7b \
+ steps=30 dcn_data_parallelism=16 ici_fsdp_parallelism=8 per_device_batch_size=4 max_target_length=4096 model_name=llama2-7b \
enable_checkpointing=false attention=cudnn_flash_te remat_policy=minimal_flash use_iota_embed=true scan_layers=false \
dataset_type=synthetic async_checkpointing=false base_output_directory=gs://runner-maxtext-logs enable_profiler=true
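
To make the effect of the `per_device_batch_size` change concrete, the sketch below computes tokens processed per training step for the old and new values. It assumes the global batch is the per-device batch multiplied by the total device count (`dcn_data_parallelism * ici_fsdp_parallelism`), which is an assumption about how these flags combine, not something stated in this diff. The helper `tokens_per_step` is hypothetical and exists only for illustration.

```python
# Values copied from the 16-node command in this diff.
dcn_data_parallelism = 16   # data-parallel replicas across nodes
ici_fsdp_parallelism = 8    # FSDP shards within a node
max_target_length = 4096    # tokens per sequence


def tokens_per_step(per_device_batch_size: int) -> int:
    """Hypothetical helper.

    Assumes global batch = per-device batch * total device count.
    """
    num_devices = dcn_data_parallelism * ici_fsdp_parallelism
    global_batch = per_device_batch_size * num_devices
    return global_batch * max_target_length


if __name__ == "__main__":
    for batch in (6, 4):  # value before and after this change
        print(f"per_device_batch_size={batch}: "
              f"{tokens_per_step(batch):,} tokens/step")
```

Under that assumption, the change reduces each step from 3,145,728 to 2,097,152 tokens across 128 devices.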
28 changes: 28 additions & 0 deletions MaxText/configs/a3/llama_2_7b/README.md
@@ -0,0 +1,28 @@
<!--
Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# High Performance Model Configs on A3 GPU
Expected performance results for the Llama2-7B model running on A3 GPUs:


### Llama2-7B

Review comment (Collaborator): Please remove MFU; NV doesn't communicate in those units.

| Hardware | TFLOP/sec/chip |
| ---------------------- | ---------------- |
| 1x A3 (h100-80gb-8) | 492 |
| 2x A3 (h100-80gb-8) | 422 |
| 4x A3 (h100-80gb-8) | 407 |
| 8x A3 (h100-80gb-8) | 409 |
| 16x A3 (h100-80gb-8) | 375 |
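
As a rough way to read the table, the sketch below derives per-chip scaling efficiency relative to the single-VM result and the aggregate throughput at each scale. It assumes that "chip" means one GPU and that each `h100-80gb-8` VM has eight GPUs; both are assumptions, as are the helper names. The throughput figures are copied from the table.

```python
GPUS_PER_VM = 8  # assumption: h100-80gb-8 means eight H100 GPUs per A3 VM

# Number of A3 VMs -> TFLOP/sec/chip, copied from the table above.
tflops_per_chip = {1: 492, 2: 422, 4: 407, 8: 409, 16: 375}


def scaling_efficiency(num_vms: int) -> float:
    """Per-chip throughput relative to the 1x A3 baseline."""
    return tflops_per_chip[num_vms] / tflops_per_chip[1]


def aggregate_tflops(num_vms: int) -> int:
    """Total TFLOP/sec across all GPUs, assuming one chip is one GPU."""
    return tflops_per_chip[num_vms] * num_vms * GPUS_PER_VM


if __name__ == "__main__":
    for vms in sorted(tflops_per_chip):
        print(f"{vms:>2}x A3: efficiency {scaling_efficiency(vms):.1%}, "
              f"aggregate {aggregate_tflops(vms):,} TFLOP/sec")
```

By this measure, the 16x A3 configuration retains roughly 76% of single-VM per-chip throughput.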