2 changes: 1 addition & 1 deletion docs/help.md
@@ -47,7 +47,7 @@ We've got some excellent tutorials to help you get the most out of Fast-LLM:

- [**Quick-Start Guide**](quick-start.md): Perfect for launching Fast-LLM on a single GPU machine. We walk you through running your first training job (either locally or on a cluster), and handling common issues.

- [**Cookbook**](recipes/train-llama-8b.md): Ready to go big? These recipes cover real-world scenarios like training big models from scratch, continuing training from a checkpoint, and more. This is where Fast-LLM really shows its power.
- [**Cookbook**](recipes/train.md): Ready to go big? These recipes cover real-world scenarios like training big models from scratch, continuing training from a checkpoint, and more. This is where Fast-LLM really shows its power.

---

2 changes: 1 addition & 1 deletion docs/index.md
@@ -6,7 +6,7 @@ hide:

Introducing **Fast-LLM**, the cutting-edge open-source library built for training large language models (LLMs) with **unmatched speed, scalability, and cost-efficiency**. Developed by [ServiceNow Research](https://www.servicenow.com/research/)'s Foundation Models Lab, Fast-LLM is engineered to meet the rigorous demands of professional AI researchers, AI/ML engineers, academic and industrial research institutions, and enterprise product development teams pushing the limits of generative AI. **Achieve groundbreaking research and high-stakes production goals faster with Fast-LLM.**

[Start your journey with Fast-LLM](quick-start.md) and explore the future of LLM training. Dive into [real-world use cases](recipes/train-llama-8b.md) to see how Fast-LLM can elevate your training workflows.
[Start your journey with Fast-LLM](quick-start.md) and explore the future of LLM training. Dive into [real-world use cases](recipes/train.md) to see how Fast-LLM can elevate your training workflows.

## Why Fast-LLM?

7 changes: 0 additions & 7 deletions docs/recipes/continue-training-llama-8b.md

This file was deleted.

153 changes: 153 additions & 0 deletions docs/recipes/continue-training.md
@@ -0,0 +1,153 @@
---
title: Continual Pretraining of Llama 3.1 8B or Qwen 2.5 7B
---


In this guide, we provide step-by-step instructions for continued pretraining of Llama 3.1 8B or Qwen 2.5 7B on The Stack.

# Preliminary steps
- [Quick Start](../quick-start.md)
- [Data preparation](data-preparation.md)

# Download the Pretrained Model
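Let's download the model first. Note that the Llama 3.1 repository is gated on the Hugging Face Hub: you need to accept its license and authenticate with your Hugging Face account before cloning it.

=== "Llama 3.1 8B"
    ```bash
    git lfs install
    git clone https://huggingface.co/meta-llama/Llama-3.1-8B ./fast-llm-tutorial/pretrained-model
    ```
=== "Qwen 2.5 7B"
    ```bash
    git lfs install
    git clone https://huggingface.co/Qwen/Qwen2.5-7B ./fast-llm-tutorial/pretrained-model
    ```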

# Training
The configuration is much like a pretraining configuration. We only need to:

- specify the model checkpoint to load and its format, from which Fast-LLM automatically infers the corresponding model architecture;
- adapt some of the training parameters to our needs.

That's it!

=== "Llama 3.1 8B"
    ```yaml
    training:
      train_iters: 100_000
      logs:
        interval: 10
      validation:
        iterations: 25
        interval: 1000
      checkpoint:
        interval: 1000
        keep: 5
      test_iters: 0
      export: # (1)!
        format: llama
        interval: 20_000
    batch:
      micro_batch_size: 2
      sequence_length: 4096
      batch_size: 256
    data:
      format: file
      path: fast-llm-tutorial/dataset.json # (2)!
      split: [99, 1, 0]
    optimizer:
      weight_decay: 0.1
      beta_1: 0.9
      beta_2: 0.95
      learning_rate:
        base: 1.0e-04 # (3)!
        minimum: 1.0e-05
        decay_style: cosine
        decay_iterations: 100_000
        warmup_iterations: 2000
    pretrained: # (4)!
      format: llama
      path: fast-llm-tutorial/pretrained-model
      model_weights: yes # (5)!
    model:
      base_model:
        transformer:
          use_flash_attention: yes
        cross_entropy_impl: fused
      multi_stage:
        zero_stage: 2
    distributed:
      training_dtype: bf16
    run:
      experiment_dir: fast-llm-tutorial/Llama-3.1-8B-cpt
    ```
=== "Qwen 2.5 7B"
    ```yaml
    training:
      train_iters: 100_000
      logs:
        interval: 10
      validation:
        iterations: 25
        interval: 1000
      checkpoint:
        interval: 1000
        keep: 5
      test_iters: 0
      export: # (1)!
        format: qwen2
        interval: 20_000
    batch:
      micro_batch_size: 1
      sequence_length: 8192
      batch_size: 256
    data:
      format: file
      path: fast-llm-tutorial/dataset.json # (2)!
      split: [99, 1, 0]
    optimizer:
      weight_decay: 0.1
      beta_1: 0.9
      beta_2: 0.95
      learning_rate:
        base: 1.0e-04 # (3)!
        minimum: 1.0e-05
        decay_style: cosine
        decay_iterations: 100_000
        warmup_iterations: 2000
    pretrained: # (4)!
      format: qwen2
      path: fast-llm-tutorial/pretrained-model
      model_weights: yes # (5)!
    model:
      base_model:
        transformer:
          use_flash_attention: yes
        cross_entropy_impl: fused
      multi_stage:
        zero_stage: 2
    distributed:
      training_dtype: bf16
    run:
      experiment_dir: fast-llm-tutorial/qwen-2.5-7B-cpt
    ```

1. The model will be exported in Hugging Face format to the `export` subdirectory of `experiment_dir` every 20,000 iterations.
2. Location of the dataset metadata file generated during [data preparation](data-preparation.md).
3. The learning rate controls the trade-off between learning and forgetting. A higher learning rate adapts quickly to the new dataset but causes more forgetting; a lower learning rate retains more of the pretrained model's knowledge but adapts more slowly to the new domain.
4. Configuration of the pretrained model: we load the checkpoint downloaded from the Hugging Face repository earlier.
5. This tells Fast-LLM to load the weights of the pretrained model. To use the model's configuration but train from scratch, keep the same config and set this to `no`.
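
To see which learning rate the model actually receives at a given iteration, the schedule configured above (warmup over `warmup_iterations`, then cosine decay from `base` to `minimum`) can be sketched in a few lines. This is a minimal sketch that assumes a linear warmup from zero and a standard cosine decay reaching `minimum` exactly at iteration `decay_iterations`; Fast-LLM's exact formula (for example, whether decay is counted from the end of warmup) may differ. The function `learning_rate_at` is hypothetical and not part of Fast-LLM.

```python
import math


def learning_rate_at(
    iteration: int,
    base: float = 1.0e-04,
    minimum: float = 1.0e-05,
    warmup_iterations: int = 2000,
    decay_iterations: int = 100_000,
) -> float:
    """Hypothetical helper: linear warmup then cosine decay (assumed shape, not Fast-LLM's code)."""
    if iteration < warmup_iterations:
        # Assumed linear warmup from 0 up to `base`.
        return base * iteration / warmup_iterations
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((iteration - warmup_iterations) / (decay_iterations - warmup_iterations), 1.0)
    # Cosine interpolation from `base` (progress = 0) down to `minimum` (progress = 1).
    return minimum + 0.5 * (base - minimum) * (1.0 + math.cos(math.pi * progress))


for step in (0, 1000, 2000, 51_000, 100_000):
    print(f"iteration {step:>7}: lr = {learning_rate_at(step):.2e}")
```

Under these assumptions, the continued-pretraining rate peaks at 1e-4 at iteration 2,000, is about 5.5e-5 halfway through the decay, and ends at 1e-5, which is ten times lower than the 6e-4 peak typically used when training from scratch in the companion guide.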

# Checkpoint usage
Checkpoints are saved regularly, and every 20,000 iterations a checkpoint is exported in Hugging Face format.
You can use it in `transformers` just like the pretrained model, except that this one should be stronger on programming languages!
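
=== "Llama 3.1 8B"
    ```python
    from transformers import AutoTokenizer, pipeline

    # Reuse the original tokenizer: continued pretraining does not change the vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("fast-llm-tutorial/pretrained-model")
    pipe = pipeline("text-generation", model="fast-llm-tutorial/Llama-3.1-8B-cpt/export/llama/20000/", tokenizer=tokenizer)
    # Generate a completion for a code prompt.
    print(pipe("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])
    ```
=== "Qwen 2.5 7B"
    ```python
    from transformers import AutoTokenizer, pipeline

    # Reuse the original tokenizer: continued pretraining does not change the vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("fast-llm-tutorial/pretrained-model")
    pipe = pipeline("text-generation", model="fast-llm-tutorial/qwen-2.5-7B-cpt/export/qwen2/20000/", tokenizer=tokenizer)
    # Generate a completion for a code prompt.
    print(pipe("def fibonacci(n):", max_new_tokens=64)[0]["generated_text"])
    ```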
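
To know which exported checkpoints to expect, a small sketch can enumerate the export directories. It assumes that an export is written at every multiple of `training.export.interval` up to `train_iters`, following the `<experiment_dir>/export/<format>/<iteration>/` layout used in the examples above; the helper `expected_export_dirs` is hypothetical.

```python
from pathlib import PurePosixPath


def expected_export_dirs(experiment_dir: str, export_format: str, train_iters: int, interval: int) -> list[str]:
    """Hypothetical helper: list the export directories assumed to be produced during training."""
    root = PurePosixPath(experiment_dir) / "export" / export_format
    # Assumption: one export at each multiple of `interval`, including the final iteration.
    return [str(root / str(step)) for step in range(interval, train_iters + 1, interval)]


for path in expected_export_dirs("fast-llm-tutorial/Llama-3.1-8B-cpt", "llama", 100_000, 20_000):
    print(path)
```

Under this assumption, a 100,000-iteration run with a 20,000-iteration export interval yields five exports, and any of those paths can be passed as `model=` to the `pipeline` call above.
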
7 changes: 0 additions & 7 deletions docs/recipes/train-llama-8b.md

This file was deleted.

189 changes: 189 additions & 0 deletions docs/recipes/train.md
@@ -0,0 +1,189 @@
---
title: Training Llama 3.1 8B or Qwen 2.5 7B from Scratch
---

Follow this guide to train a model like Llama 3.1 8B or Qwen 2.5 7B from scratch!

# Preliminary steps
- [Quick Start](../quick-start.md)
- [Data preparation](data-preparation.md)


# Training configuration
In this guide, we show you how to configure a model architecture and train a model from scratch.
Let's start from the following training configuration:

=== "Llama 3.1 8B"
    ```yaml
    training:
      train_iters: 100_000
      logs:
        interval: 10
      validation:
        iterations: 25
        interval: 1000
      checkpoint:
        interval: 1000
        keep: 5
      test_iters: 0
      export:
        format: llama
        interval: 20_000
    batch:
      micro_batch_size: 2
      sequence_length: 4096
      batch_size: 256
    data:
      format: file
      path: fast-llm-tutorial/dataset/fast_llm_dataset.json
      split: [99, 1, 0]
    optimizer:
      weight_decay: 0.1
      beta_1: 0.9
      beta_2: 0.95
      learning_rate:
        base: 6.0e-04
        minimum: 6.0e-05
        decay_style: cosine
        decay_iterations: 100_000
        warmup_iterations: 2000
    model:
      base_model:
        cross_entropy_impl: fused
      multi_stage:
        zero_stage: 2
    distributed:
      training_dtype: bf16
    run:
      experiment_dir: fast-llm-tutorial/experiment
    ```
=== "Qwen 2.5 7B"
    ```yaml
    training:
      train_iters: 100_000
      logs:
        interval: 10
      validation:
        iterations: 25
        interval: 1000
      checkpoint:
        interval: 1000
        keep: 5
      test_iters: 0
      export:
        format: qwen2
        interval: 20_000
    batch:
      micro_batch_size: 1
      sequence_length: 8192
      batch_size: 256
    data:
      format: file
      path: fast-llm-tutorial/dataset/fast_llm_dataset.json
      split: [99, 1, 0]
    optimizer:
      weight_decay: 0.1
      beta_1: 0.9
      beta_2: 0.95
      learning_rate:
        base: 6.0e-04
        minimum: 6.0e-05
        decay_style: cosine
        decay_iterations: 100_000
        warmup_iterations: 2000
    model:
      base_model:
        cross_entropy_impl: fused
      multi_stage:
        zero_stage: 2
    distributed:
      training_dtype: bf16
    run:
      experiment_dir: fast-llm-tutorial/experiment
    ```
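
The two configurations mainly differ in how each batch is shaped: Llama uses micro-batches of 2 sequences of 4,096 tokens, Qwen uses 1 sequence of 8,192 tokens. The sketch below works out the resulting token counts, assuming that `batch_size` counts sequences per optimizer step, that `micro_batch_size` is a per-GPU value, and that every sequence is packed to the full `sequence_length`. The helper `tokens_per_run` is hypothetical.

```python
def tokens_per_run(batch_size: int, sequence_length: int, train_iters: int) -> tuple[int, int]:
    """Hypothetical helper: tokens per optimizer step and over the full run (assumes fully packed sequences)."""
    tokens_per_step = batch_size * sequence_length
    return tokens_per_step, tokens_per_step * train_iters


configs = {
    # name: (micro_batch_size, sequence_length, batch_size, train_iters), taken from the YAML above
    "Llama 3.1 8B": (2, 4096, 256, 100_000),
    "Qwen 2.5 7B": (1, 8192, 256, 100_000),
}
for name, (micro_batch_size, sequence_length, batch_size, train_iters) in configs.items():
    per_step, total = tokens_per_run(batch_size, sequence_length, train_iters)
    # Tokens held by a single micro-batch (assumed to be per GPU).
    per_micro_batch = micro_batch_size * sequence_length
    print(f"{name}: {per_micro_batch:,} tokens/micro-batch, {per_step:,} tokens/step, {total / 1e9:.1f}B tokens total")
```

Both micro-batches hold 8,192 tokens, so per-GPU memory pressure from activations is comparable, but with the same `batch_size` the Qwen configuration processes twice as many tokens per step (about 2.1M versus 1.05M), covering roughly 210B tokens over the run instead of 105B.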

This configuration will not work yet, because it is missing the arguments that define the model architecture.
There are two ways to specify the architecture.

The first is to reuse the configuration of a pretrained model. This step is similar to what is done in the [Quick Start guide](../quick-start.md).
First, download the model configuration:

=== "Llama 3.1 8B"
    ```bash
    git lfs install
    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/meta-llama/Llama-3.1-8B ./fast-llm-tutorial/pretrained-model
    ```
=== "Qwen 2.5 7B"
    ```bash
    git lfs install
    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Qwen/Qwen2.5-7B ./fast-llm-tutorial/pretrained-model
    ```

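When the configuration points to a pretrained Hugging Face model, Fast-LLM automatically converts its configuration into the corresponding Fast-LLM model configuration.
Add the following top-level `pretrained` section to the training configuration above. Because of `model_weights: no`, **only the configuration is loaded, not the weights**, which is why the `GIT_LFS_SKIP_SMUDGE=1` clone (which skips downloading the large LFS weight files) is sufficient.

=== "Llama 3.1 8B"
    ```yaml
    pretrained:
      format: llama
      path: fast-llm-tutorial/pretrained-model
      model_weights: no
    ```
=== "Qwen 2.5 7B"
    ```yaml
    pretrained:
      format: qwen2
      path: fast-llm-tutorial/pretrained-model
      model_weights: no
    ```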
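
Fast-LLM performs this conversion itself. To make the correspondence concrete, the sketch below maps a few fields of the downloaded Hugging Face `config.json` onto the Fast-LLM keys used in the explicit architecture configuration that follows (for example, `intermediate_size` becomes `ffn_hidden_size` and `num_key_value_heads` becomes `head_groups`). The mapping is an illustration inferred from the matching values in the two configurations, not Fast-LLM's actual converter, and the function `to_fast_llm_base_model` is hypothetical.

```python
import json
from pathlib import Path


def to_fast_llm_base_model(hf_config: dict) -> dict:
    """Hypothetical mapping from Hugging Face config.json fields to the Fast-LLM keys used in this guide.

    Inferred from matching values in the two configurations; not Fast-LLM's actual converter.
    """
    # Per-head dimension; falls back to hidden_size / num_attention_heads when head_dim is absent.
    head_size = hf_config.get("head_dim") or hf_config["hidden_size"] // hf_config["num_attention_heads"]
    return {
        "tie_word_embeddings": hf_config.get("tie_word_embeddings", False),
        "vocab_size": hf_config["vocab_size"],
        "transformer": {
            "activation_type": hf_config["hidden_act"],
            "ffn_hidden_size": hf_config["intermediate_size"],
            # Number of key/value heads used by grouped-query attention.
            "head_groups": hf_config["num_key_value_heads"],
            "hidden_size": hf_config["hidden_size"],
            "kv_channels": head_size,
            "num_attention_heads": hf_config["num_attention_heads"],
            "num_layers": hf_config["num_hidden_layers"],
            "rotary": {"theta": hf_config["rope_theta"]},
        },
    }


# config.json is normally a regular (non-LFS) file, so it is present after a GIT_LFS_SKIP_SMUDGE=1 clone.
config_path = Path("fast-llm-tutorial/pretrained-model/config.json")
if config_path.exists():
    print(json.dumps(to_fast_llm_base_model(json.loads(config_path.read_text())), indent=2))
```
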

Alternatively, we can define the model architecture ourselves by extending the `model` section of the configuration:

=== "Llama 3.1 8B"
    ```yaml
    model:
      base_model:
        tie_word_embeddings: false
        use_position_embeddings: false
        vocab_size: 128256
        transformer:
          activation_type: silu
          add_linear_biases: false
          ffn_hidden_size: 14336
          gated: true
          head_groups: 8
          hidden_size: 4096 # (1)!
          kv_channels: 128
          normalization:
            type: rms_norm
          num_attention_heads: 32
          num_layers: 32
          rotary:
            type: llama3
            theta: 500_000
    ```
=== "Qwen 2.5 7B"
    ```yaml
    model:
      base_model:
        tie_word_embeddings: false
        use_position_embeddings: false
        vocab_size: 152064
        transformer:
          activation_type: silu
          add_linear_biases: only_attn_qkv
          ffn_hidden_size: 18944
          gated: true
          head_groups: 4
          hidden_size: 3584 # (1)!
          normalization:
            type: rms_norm
            epsilon: 1.0e-06
          num_attention_heads: 28
          num_layers: 28
          rotary:
            type: default
            theta: 1_000_000
    ```

1. The hidden size and number of layers are also used to derive sensible defaults for the standard deviation of the weight initialization.

Configuring the model this way is a bit more verbose than using the pretrained configuration, but it shows how to configure a model with Fast-LLM.
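
As a sanity check on the explicit architecture, the fields above determine the parameter count. The sketch below assumes a standard Llama-style decoder: grouped-query attention with `head_groups` key/value heads of size `kv_channels`, a gated MLP with three projection matrices, two RMS norms per layer plus a final one, untied input and output embeddings (`tie_word_embeddings: false`), and, for Qwen, biases on the query, key and value projections only (`add_linear_biases: only_attn_qkv`). For Qwen, `kv_channels` is not set explicitly, so 3584 / 28 = 128 is assumed. The helper `count_parameters` is hypothetical and not part of Fast-LLM.

```python
def count_parameters(
    vocab_size: int,
    hidden_size: int,
    num_layers: int,
    num_attention_heads: int,
    head_groups: int,
    kv_channels: int,
    ffn_hidden_size: int,
    qkv_bias: bool = False,
    tie_word_embeddings: bool = False,
) -> int:
    """Hypothetical helper: parameter count of a Llama-style decoder (assumed layer structure)."""
    q_size = num_attention_heads * kv_channels
    kv_size = head_groups * kv_channels
    # Query, key and value projections, plus the output projection (no bias on the output).
    attention = hidden_size * (q_size + 2 * kv_size) + q_size * hidden_size
    if qkv_bias:
        attention += q_size + 2 * kv_size
    # Gated MLP: gate and up projections, then the down projection.
    mlp = 3 * hidden_size * ffn_hidden_size
    # Two RMS norm weight vectors per layer (before attention and before the MLP).
    norms = 2 * hidden_size
    layers = num_layers * (attention + mlp + norms)
    embeddings = vocab_size * hidden_size * (1 if tie_word_embeddings else 2)
    # Final RMS norm before the output head.
    return layers + embeddings + hidden_size


llama = count_parameters(128256, 4096, 32, 32, 8, 128, 14336)
qwen = count_parameters(152064, 3584, 28, 28, 4, 128, 18944, qkv_bias=True)
print(f"Llama 3.1 8B: {llama:,} parameters")
print(f"Qwen 2.5 7B: {qwen:,} parameters")
```

Under these assumptions the counts come to 8,030,261,248 and 7,615,616,512, consistent with the "8B" and "7B" in the model names; a large mismatch would suggest a mistyped field in the architecture.
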

4 changes: 2 additions & 2 deletions mkdocs.yaml
@@ -169,8 +169,8 @@ nav:
- Recipes:
- Prepare a dataset: recipes/data-preparation.md
- Configure a dataset: recipes/data-configuration.md
- Train Llama 8B from scratch: recipes/train-llama-8b.md
- Continue training Llama 8B: recipes/continue-training-llama-8b.md
- Train a model from scratch: recipes/train.md
- Continue training a model: recipes/continue-training.md
- Upcycle Llama 3B to MoE: recipes/upcycle-llama-3b-to-moe.md
- Reference:
- User Guide: