Qwen2VL using nearest neighbour downsampling by default

### Reminder

- [x] I have read the above rules and searched the existing issues.

### System Info

```
- `llamafactory` version: 0.9.2.dev0
- Platform: Linux-4.15.0-213-generic-x86_64-with-glibc2.27
- Python version: 3.11.9
- PyTorch version: 2.5.1
- Transformers version: 4.49.0
- Datasets version: 3.2.0
- Accelerate version: 0.34.0
- PEFT version: 0.12.0
- TRL version: 0.9.6
- DeepSpeed version: 0.15.4
- vLLM version: 0.7.3.dev0+g0408efc6.d20250208
```

### Reproduction

```text
- Train a Qwen2VL model using any config and a dataset that contains images larger than `image_resolution`
- The images are resized using nearest neighbor interpolation
```
The resizing happens in two places:
- [here](https://github.com/hiyouga/LLaMA-Factory/blob/1036311826a61fed2346a261c8a060c355778318/src/llamafactory/data/mm_plugin.py#L1044-L1054) for Qwen2VL specifically
- [here](https://github.com/hiyouga/LLaMA-Factory/blob/1036311826a61fed2346a261c8a060c355778318/src/llamafactory/data/mm_plugin.py#L125-L133) in the base class

### Others

Nearest neighbor downsampling incurs a large loss in image quality (thin lines vanish, see the attached image). [Qwen2VL](https://github.com/QwenLM/Qwen2.5-VL/blob/3c4f939dda856110cefb315df6280b9b4eb15219/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L138) uses BICUBIC downsampling instead, which avoids this problem.


![Image](https://github.com/user-attachments/assets/d0b6648f-dce7-4e32-ad72-454c73224159)


More generally, it would be great if the resizing could be configured directly via `image_resolution` and handled by the `transformers` preprocessor, which is [called afterwards anyway](https://github.com/hiyouga/LLaMA-Factory/blob/1036311826a61fed2346a261c8a060c355778318/src/llamafactory/data/mm_plugin.py#L243).
