Reminder
System Info
- `llamafactory` version: 0.9.2.dev0
- Platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
- Python version: 3.11.9
- PyTorch version: 2.5.1
- Transformers version: 4.49.0
- Datasets version: 3.2.0
- Accelerate version: 0.34.0
- PEFT version: 0.12.0
- TRL version: 0.9.6
- DeepSpeed version: 0.15.4
- vLLM version: 0.7.3.dev0+g0408efc6.d20250208
Reproduction
- Train a Qwen2VL model using any config and a dataset that contains images `> image_resolution`
- Images are resized using nearest-neighbor interpolation
  - here for Qwen2 specifically
  - and in the base class here
Others
Nearest-neighbor downsampling incurs a large loss in image quality (small lines vanish, see the attached image). Qwen2VL generally uses BICUBIC downsampling instead to circumvent this problem.

Generally, it would be great if one could directly configure the resizing with `image_resolution` and use the `transformers` preprocessor, which is called afterwards anyway.