[CI/Build] Update CPU tests to include all "standard" tests #5481
Isotr0py merged 29 commits into vllm-project:main from DarkLight1337:test-llava-cpu
Conversation
@DarkLight1337 I noticed that when running the I'm afraid that this will significantly slow down the test. What do you think about this?
Is this specific to the LLaVA model, or does this also occur for the other models? If it's the latter, then I think this change would have a relatively small impact compared to the baseline.
Hmm, this may be because the vision tower and multi-modal projector have not been optimized for vLLM yet. Let's wait for that to be implemented as described in #4194.
@DarkLight1337 @Isotr0py Hi, thanks for looking into this. When initially enabling the CPU CI, I found some issues with the LLaVA CPU backend, so I disabled that part first.
Hmm... maybe
@DarkLight1337 Hi, I just did a quick check locally. With the latest code, the LLaVA test will fail because of a result mismatch between float16 and bfloat16. The diff below can help fix this issue.
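The actual diff is collapsed in this view, but the float16-vs-bfloat16 mismatch itself is easy to reproduce. The sketch below (plain Python using struct; the truncating bfloat16 conversion is a simplification, since real hardware rounds to nearest) shows how the two formats represent the same value differently:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE-754 half precision (10-bit mantissa).
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_bf16(x: float) -> float:
    # Truncate float32 to bfloat16 (8-bit mantissa) by zeroing the low
    # 16 bits; real hardware rounds, but truncation shows the idea.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits & 0xFFFF0000))[0]

x = 0.1
print(to_fp16(x))  # 0.0999755859375
print(to_bf16(x))  # 0.099609375
```

Small per-element differences like these compound across layers, which is why comparing float16 output against a bfloat16 reference can exceed test tolerances.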
According to the CI log, it currently takes 5-10 seconds per LLaVA-1.5 iteration and 20-40 seconds per LLaVA-NeXT iteration. This is much longer than the other models, which take less than 2 seconds each (you can verify this by searching for the s/it string output by tqdm).
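For reference, a quick way to pull those per-iteration timings out of a log (the lines below are made-up examples of tqdm's progress-bar format, not real CI output):

```python
import re

# Hypothetical excerpts in tqdm's progress-bar format.
log = """
100%|##########| 8/8 [00:12<00:00,  1.56s/it]
100%|##########| 8/8 [03:20<00:00, 25.03s/it]
"""

# Capture the seconds-per-iteration figure that precedes "s/it".
times = [float(m) for m in re.findall(r"([\d.]+)s/it", log)]
print(times)  # [1.56, 25.03]
```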
#5591 has been merged. Let's see the performance now...
Getting this error:
Error in calling custom op gelu_quick: '_OpNamespace' '_C' object has no attribute 'gelu_quick'
Does the CPU test not recompile vLLM? @WoosukKwon
Hmm... gelu_quick was actually added in #5591 as well, though I'm not sure how to add that to be compatible with CPU. Edit: I see
Hi Roger,
The patch in #5591 adds the CUDA kernel only - it should be OK to add the CPU-related kernel under csrc/cpu.
We could also help with this if required.
CC @bigPYJ1151
Thanks, -yuan
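For anyone picking this up: if gelu_quick follows the usual QuickGELU definition (x * sigmoid(1.702 * x), as in CLIP), a scalar Python reference of what a csrc/cpu kernel would need to compute looks like this (a sketch for illustration, not the actual C++ implementation):

```python
import math

def gelu_quick(x: float) -> float:
    # QuickGELU: x * sigmoid(1.702 * x), written as x / (1 + e^(-1.702x)).
    return x / (1.0 + math.exp(-1.702 * x))

for x in (-1.0, 0.0, 1.0):
    print(x, gelu_quick(x))
```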
@zhouyuan yea - I already made a PR #5717 and it's just waiting for review now.
There is no observable speed increase so far. Perhaps the multi-modal projector also has to be optimized?
Hmm... yea - the other place to optimize is
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
The test duration has gone up from 18 minutes to 30 minutes. Given that we currently merge 10-20 PRs per day, and if we assume that the AWS CI is triggered 3x per commit (the minimum is 2x - once pre-merge and once post-merge - but it's unlikely that the CI passes on the first try after
@bigPYJ1151 do you know whether it's possible to increase the number of agents to 2? Otherwise, I'll prune some tests from this PR.
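Rough numbers behind that concern, using the figures from the comment above (the 3x-per-commit trigger count is the assumption stated there):

```python
runs_per_commit = 3   # assumed: pre-merge + post-merge + at least one retry
test_minutes = 30     # CPU test duration after this PR

for prs_per_day in (10, 20):
    hours = prs_per_day * runs_per_commit * test_minutes / 60
    print(f"{prs_per_day} PRs/day -> {hours:.0f} h of CPU-test time per day")
```

At the upper end that is more wall-clock test time than a single agent has in a day, which is why a second agent (or pruning tests) is needed.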
@DarkLight1337 The Looks like some audio language model tests require chunked prefill; will open a PR for it soon.
After removing the tests for unsupported models (those involving embeddings and chunked prefill), the test duration is down to 26 minutes, which should be OK for now.
I have added
@Isotr0py PTAL and see if this looks OK to you as well.
This change should help catch CPU-specific issues in VLMs (e.g. #5451, #7735, #8061).
Edit: Updated the list of related issues in light of recent PRs.