
Qwen2VL using nearest neighbour downsampling by default #7125

@selflein

Description


Reminder

  • I have read the above rules and searched the existing issues.

System Info

- `llamafactory` version: 0.9.2.dev0
- Platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
- Python version: 3.11.9
- PyTorch version: 2.5.1
- Transformers version: 4.49.0
- Datasets version: 3.2.0
- Accelerate version: 0.34.0
- PEFT version: 0.12.0
- TRL version: 0.9.6
- DeepSpeed version: 0.15.4
- vLLM version: 0.7.3.dev0+g0408efc6.d20250208

Reproduction

- Train a Qwen2VL model with any config on a dataset that contains images larger than `image_resolution`
- Images are resized using nearest neighbor interpolation  
  • here for Qwen2 specifically
  • and in the base class here

Others

Nearest-neighbor downsampling causes a large loss in image quality (thin lines vanish, see the attached image). Qwen2VL generally uses BICUBIC downsampling instead to avoid this problem.

Image
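To see why thin lines vanish, consider a single row of pixels downsampled by a factor of 4. The pure-Python sketch below uses hypothetical function names, and a box (area-average) filter stands in for BICUBIC because it is the simplest low-pass filter; Pillow's BICUBIC behaves similarly in that, when shrinking, it weights all source pixels an output pixel covers:

```python
def nearest_downsample(row: list[int], factor: int) -> list[int]:
    """Keep a single source pixel per output pixel: the one under the output pixel's center."""
    n_out = len(row) // factor
    # Output pixel i covers source pixels [i*factor, (i+1)*factor); its center sits at (i + 0.5) * factor.
    return [row[int((i + 0.5) * factor)] for i in range(n_out)]


def box_downsample(row: list[int], factor: int) -> list[float]:
    """Average every source pixel that the output pixel covers (a simple low-pass filter)."""
    n_out = len(row) // factor
    return [sum(row[i * factor:(i + 1) * factor]) / factor for i in range(n_out)]


# A white row with a single 1-pixel-wide dark line at index 5.
row = [255] * 16
row[5] = 0

print(nearest_downsample(row, 4))  # [255, 255, 255, 255]  (samples indices 2, 6, 10, 14; the line is gone)
print(box_downsample(row, 4))      # [255.0, 191.25, 255.0, 255.0]  (the line survives as a gray pixel)
```

Any filter that integrates over the covered pixels keeps a faint trace of the line; point sampling keeps it only when the sampled index happens to land on it.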

More generally, it would be great if `image_resolution` could be configured directly and the resizing left to the transformers preprocessor, which is called afterwards anyway.
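For reference, the Qwen2-VL image processor in transformers already scales each image into a min/max pixel budget, snaps both sides to multiples of 28 (patch size 14 times merge size 2), and resizes with BICUBIC by default. The sketch below is an independent re-implementation of that size computation, modeled on the processor's `smart_resize` helper but written for illustration, not copied from the library (verify against the installed version). Treating `image_resolution` as the `max_pixels` budget is an assumption about how the suggested wiring could look, not an existing LLaMA-Factory option, and `qwen2vl_target_size` is a hypothetical name:

```python
import math


def qwen2vl_target_size(
    height: int,
    width: int,
    max_pixels: int,
    min_pixels: int = 56 * 56,
    factor: int = 28,
) -> tuple[int, int]:
    """Scale (height, width) into the [min_pixels, max_pixels] budget and snap both sides to multiples of `factor`.

    Illustrative sketch only; extreme aspect ratios are not validated.
    """
    # Start from the nearest multiples of `factor`.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Shrink both sides by the same ratio so the area fits the budget, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:
        # Enlarge both sides by the same ratio to reach the minimum area, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


print(qwen2vl_target_size(1080, 1920, max_pixels=512 * 512))  # (364, 672)
```

Note that the function returns (height, width), while `PIL.Image.resize` expects (width, height). With this approach the image is resized only once, with BICUBIC, instead of first being shrunk with NEAREST and then resized again by the processor.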

Metadata


Assignees: No one assigned

Labels: solved (This problem has been already solved)
