[CI/Build] Update CPU tests to include all "standard" tests #5481
Isotr0py merged 29 commits into vllm-project:main from DarkLight1337:test-llava-cpu
Conversation
@DarkLight1337 I noticed that when running the I'm afraid that this will significantly slow down the test. What do you think about this?
Is this specific to the LLaVA model, or does this also occur for the other models? If it's the latter, then I think this change would have a relatively small impact compared to the baseline.
Hmm, this may be because the vision tower and multi-modal projector have not been optimized for vLLM yet. Let's wait for that to be implemented as described in #4194.
@DarkLight1337 @Isotr0py Hi, thanks for looking into this. When initially enabling the CPU CI, I found some issues with the LLaVA CPU backend, so I disabled that part first.
Hmm... maybe
@DarkLight1337 Hi, I just did a quick check locally. With the latest code, the LLaVA test will fail because of a result mismatch between float16 and bfloat16. The diff below can help fix this issue.
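The actual diff is collapsed in this view, but the float16-vs-bfloat16 mismatch itself is easy to reproduce. The sketch below (plain Python using struct; the truncating bfloat16 conversion is a simplification, since real hardware rounds to nearest) shows how the two formats represent the same value differently:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE-754 half precision (10-bit mantissa).
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_bf16(x: float) -> float:
    # Truncate float32 to bfloat16 (8-bit mantissa) by zeroing the low
    # 16 bits; real hardware rounds, but truncation shows the idea.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

x = 0.1
print(to_fp16(x))  # 0.0999755859375
print(to_bf16(x))  # 0.099609375
```

Small per-element differences like these compound across layers, which is why comparing float16 output against a bfloat16 reference can exceed test tolerances.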
According to the CI log, it currently takes 5-10 seconds per LLaVA-1.5 iteration and 20-40 seconds per LLaVA-NeXT iteration. This is much longer than the other models, which take less than 2 seconds each (you can verify this by searching for the s/it string output by tqdm).
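For reference, a quick way to pull those per-iteration timings out of a log (the lines below are made-up examples of tqdm's progress-bar format, not real CI output):

```python
import re

# Hypothetical excerpts in tqdm's progress-bar format.
log = """
100%|##########| 8/8 [00:12<00:00,  1.56s/it]
100%|##########| 8/8 [03:20<00:00, 25.03s/it]
"""

# Capture the seconds-per-iteration figure that precedes "s/it".
times = [float(m) for m in re.findall(r"([\d.]+)s/it", log)]
print(times)  # [1.56, 25.03]
```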
#5591 has been merged. Let's see the performance now...
Getting this error:
Error in calling custom op gelu_quick: '_OpNamespace' '_C' object has no attribute 'gelu_quick'
Does the CPU test not recompile vLLM? @WoosukKwon
Hmm... gelu_quick was actually added in #5591 as well, though I'm not sure how to add that to be compatible with CPU. Edit: I see
Hi Roger,
The patch in #5591 adds the CUDA kernel only - it should be OK to add the CPU-related kernel under csrc/cpu.
We could also help with this if required.
CC @bigPYJ1151
Thanks, -yuan
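For anyone picking this up: if gelu_quick follows the usual QuickGELU definition (x * sigmoid(1.702 * x), as in CLIP), a scalar Python reference of what a csrc/cpu kernel would need to compute looks like this (a sketch for illustration, not the actual C++ implementation):

```python
import math

def gelu_quick(x: float) -> float:
    # QuickGELU: x * sigmoid(1.702 * x), written as x / (1 + e^(-1.702x)).
    return x / (1.0 + math.exp(-1.702 * x))

for x in (-1.0, 0.0, 1.0):
    print(x, gelu_quick(x))
```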
@zhouyuan yea - I already made a PR #5717 and it's just waiting for review now.
There is no observable speed increase so far. Perhaps the multi-modal projector also has to be optimized?
Hmm... yea - the other place to optimize is
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
The test duration has gone up from 18 minutes to 30 minutes. Given that we currently merge 10-20 PRs per day, and if we assume that the AWS CI is triggered 3x per commit (the minimum is 2x - once pre-merge and once post-merge - but it's unlikely that the CI passes on the first try after
@bigPYJ1151 do you know whether it's possible to increase the number of agents to 2? Otherwise, I'll prune some tests from this PR.
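Rough numbers behind that concern, using the figures from the comment above (the 3x-per-commit trigger count is the assumption stated there):

```python
runs_per_commit = 3   # assumed: pre-merge + post-merge + at least one retry
test_minutes = 30     # CPU test duration after this PR

for prs_per_day in (10, 20):
    hours = prs_per_day * runs_per_commit * test_minutes / 60
    print(f"{prs_per_day} PRs/day -> {hours:.0f} h of CPU-test time per day")
```

At the upper end that is more wall-clock test time than a single agent has in a day, which is why a second agent (or pruning tests) is needed.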
@DarkLight1337 The Looks like some audio language model tests require chunked prefill; will open a PR for it soon.
After removing the tests for unsupported models (those involving embeddings and chunked prefill), the test duration is down to 26 minutes, which should be OK for now.
I have added
@Isotr0py PTAL and see if this looks OK to you as well.
This change should help catch CPU-specific issues in VLMs (e.g. #5451, #7735, #8061).
Edit: Updated the list of related issues in light of recent PRs.