How to resume training from checkpoint?

I added a `resume_from_checkpoint` field to `TrainingArguments` (shown below), and in the startup script I pass the checkpoint path through `resume_from_checkpoint`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import transformers


@dataclass
class TrainingArguments(transformers.TrainingArguments):
    cache_dir: Optional[str] = field(default=None)
    optim: str = field(default='adamw_torch')
    resume_from_checkpoint: Optional[str] = field(
        default=None, 
        metadata={
            'help': 'Path to a checkpoint directory to resume training from (e.g., `output/checkpoint-1000/`)'
        }
    )
    max_length: int = field(
        default=4096,
        metadata={
            'help':
            'Maximum sequence length. Sequences will be right padded (and possibly truncated).'
        },
    )
    use_lora: bool = False
    fix_vit: bool = True
    fix_sampler: bool = False
    fix_llm: bool = True
    label_names: List[str] = field(default_factory=lambda: ['samples'])
```
However, `ChartMoETrainer` still starts training from scratch.  
What settings do I need to change to resume training from a checkpoint?
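One likely cause, assuming `ChartMoETrainer` subclasses `transformers.Trainer`: in Hugging Face `transformers`, the `resume_from_checkpoint` field on `TrainingArguments` is not read by `Trainer` itself. It is meant for the training script, which has to forward the value to `trainer.train(resume_from_checkpoint=...)`. Below is a minimal sketch of that forwarding step. The helper names `find_last_checkpoint`, `resolve_checkpoint`, and `train_with_resume` are hypothetical and not part of this repository, and the `checkpoint-<step>` directory layout is the default `Trainer` naming, assumed here.

```python
import os
import re
from typing import Optional

# Default Trainer checkpoints are saved as `<output_dir>/checkpoint-<step>`.
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the `checkpoint-<step>` subdirectory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    found = []
    for name in os.listdir(output_dir):
        match = _CKPT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            found.append((int(match.group(1)), name))
    if not found:
        return None
    # max() compares the integer step first, so checkpoint-1000 beats checkpoint-500.
    return os.path.join(output_dir, max(found)[1])


def resolve_checkpoint(resume_from_checkpoint: Optional[str],
                       output_dir: str) -> Optional[str]:
    """Prefer the explicitly given path; otherwise fall back to the latest checkpoint."""
    if resume_from_checkpoint:
        return resume_from_checkpoint
    return find_last_checkpoint(output_dir)


def train_with_resume(trainer, training_args):
    """Forward the checkpoint path to `train()`.

    Assumption: `trainer` exposes the standard
    `transformers.Trainer.train(resume_from_checkpoint=...)` signature.
    """
    ckpt = resolve_checkpoint(training_args.resume_from_checkpoint,
                              training_args.output_dir)
    # With ckpt=None this is equivalent to a fresh run.
    return trainer.train(resume_from_checkpoint=ckpt)
```

In the training script this would replace a bare `trainer.train()` call with `train_with_resume(trainer, training_args)`. `transformers.trainer_utils.get_last_checkpoint` offers a similar lookup and could be used in place of the hand-written `find_last_checkpoint`.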
