Skip to content

feat: add e2e test to verify service is avaliable#310

Merged
InftyAI-Agent merged 4 commits into
InftyAI:mainfrom
nayihz:feat_e2e
Mar 17, 2025
Merged

feat: add e2e test to verify service is avaliable#310
InftyAI-Agent merged 4 commits into
InftyAI:mainfrom
nayihz:feat_e2e

Conversation

@nayihz
Copy link
Copy Markdown
Contributor

@nayihz nayihz commented Mar 12, 2025

What this PR does / why we need it

Which issue(s) this PR fixes

Fixes ##274

Special notes for your reviewer

Does this PR introduce a user-facing change?


@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Mar 12, 2025
@InftyAI-Agent InftyAI-Agent requested a review from kerthcet March 12, 2025 12:15
@nayihz
Copy link
Copy Markdown
Contributor Author

nayihz commented Mar 12, 2025

/kind test
/kind feature

Comment thread test/util/validation/validate_service.go Outdated
@nayihz
Copy link
Copy Markdown
Contributor Author

nayihz commented Mar 12, 2025

Error: test/util/validation/validate_service.go:346:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/util/validation/validate_service.go:[37](https://github.com/InftyAI/llmaz/actions/runs/13811392705/job/38633611131?pr=310#step:4:39)7:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^

Something wrong with golang ci lint?

@nayihz
Copy link
Copy Markdown
Contributor Author

nayihz commented Mar 12, 2025

/kind feature

@InftyAI-Agent InftyAI-Agent added feature Categorizes issue or PR as related to a new feature. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Mar 12, 2025
@kerthcet
Copy link
Copy Markdown
Member

Error: test/util/validation/validate_service.go:346:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^
  Error: test/util/validation/validate_service.go:[37](https://github.com/InftyAI/llmaz/actions/runs/13811392705/job/38633611131?pr=310#step:4:39)7:23: Error return value of `resp.Body.Close` is not checked (errcheck)
  	defer resp.Body.Close()
  	                     ^

Something wrong with golang ci lint?

This is because resp.Body.Close() will return a value, but we don't validate the return. We can silent the lint I think. Or we have to do something like:

defer func() {
   _ = resp.Body.Close()
}()

return nil
}

func CheckLlamacppServeAvaliable(localPort int) error {
Copy link
Copy Markdown
Member

@kerthcet kerthcet Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we have two different Check here? I think they're both OpenAI compatible, only one client makes more sense I think. Maybe we can use the https://github.com/openai/openai-go? The benefit is it supports more rich features like constructing the system prompt and chat conversation, or we have to build the structure ourself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's support OpenAI API compatible chat completions first. It's a standard protocol across most of the inference engines.

@nayihz
Copy link
Copy Markdown
Contributor Author

nayihz commented Mar 13, 2025

This is because resp.Body.Close() will return a value, but we don't validate the return. We can silent the lint I think.

I see the same code in other project like kubernetes/lws, but no lint error in CI.

@nayihz nayihz changed the title {WIP}feat: add e2e test to verify service is avaliable feat: add e2e test to verify service is avaliable Mar 13, 2025
Copy link
Copy Markdown
Member

@kerthcet kerthcet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Thanks @nayihz

Comment thread test/util/validation/validate_service.go Outdated
Comment thread test/util/validation/validate_service.go Outdated
Comment thread test/util/validation/validate_service.go Outdated
Comment thread test/util/validation/validate_service.go Outdated
Comment thread test/e2e/playground_test.go
@nayihz nayihz force-pushed the feat_e2e branch 4 times, most recently from 7c31097 to c972cba Compare March 14, 2025 08:23
@nayihz
Copy link
Copy Markdown
Contributor Author

nayihz commented Mar 14, 2025

All comments addressed. @kerthcet
e2e-test took a very long time.

playground e2e tests Deploy a huggingface model with customized backendRuntime
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:96
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
• [1075.442 seconds]

Will it be rate-limited by huggingFace if we run ci many times? I suspect that most of the time is spent on pulling the model.

Copy link
Copy Markdown
Member

@kerthcet kerthcet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one nit.

Comment thread test/util/validation/validate_service.go Outdated
@kerthcet
Copy link
Copy Markdown
Member

All comments addressed. @kerthcet e2e-test took a very long time.

playground e2e tests Deploy a huggingface model with customized backendRuntime
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:96
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
• [1075.442 seconds]

Will it be rate-limited by huggingFace if we run ci many times? I suspect that most of the time is spent on pulling the model.

I think it's unrelated to this PR, right? We just check the service ready.

@kerthcet
Copy link
Copy Markdown
Member

em.. seems we used to run e2e tests with minutes. the last record.

succeeded  in 5m 55s

@kerthcet
Copy link
Copy Markdown
Member

/retest all

@kerthcet
Copy link
Copy Markdown
Member

Once model is downloaded, loading the model into memory still takes time.

@kerthcet
Copy link
Copy Markdown
Member

I rerun the tests, it finished in 6mins. I think the time is acceptable.

succeeded  in 6m 11s

@kerthcet
Copy link
Copy Markdown
Member

rerun again to see the final result.

@kerthcet
Copy link
Copy Markdown
Member

Seems stuck here, model is already downloaded.

playground e2e tests Deploy a huggingface model with llama.cpp
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:77
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080

@kerthcet
Copy link
Copy Markdown
Member

Have no idea whether is because of the portforward or the resource contention. May take a look later.

playground e2e tests Deploy a huggingface model with llama.cpp
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:77
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
• [985.885 seconds]
------------------------------
playground e2e tests Deploy a huggingface model with customized backendRuntime
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:96
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
• [964.528 seconds]
------------------------------
playground e2e tests Deploy a huggingface model with llama.cpp, HPA enabled
/home/runner/work/llmaz/llmaz/test/e2e/playground_test.go:122
• [16.392 seconds]

}()
<-readyChan
return check()
}).Should(gomega.Succeed())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not use eventually here, I don't think we want to forward the port for several time, eventually is used for status check, so we can wrap the check() function with eventually. Generally it looks like:

func ValidateServiceAvaliable() {
  // port forward logic
  select {
  case <-readyChan:
  case <-time.After(TIMEOUT):
    return fmt.Errorf("port forwarding timeout")
  }

  Eventually ({check()}, TIMEOUT, INTERVAL)
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. PTAL

@kerthcet
Copy link
Copy Markdown
Member

/lgtm
/approve

Let's make it happen now, we can focus on the performance later! Thanks @nayihz

@InftyAI-Agent InftyAI-Agent added lgtm Looks good to me, indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 17, 2025
@InftyAI-Agent InftyAI-Agent merged commit 5adc074 into InftyAI:main Mar 17, 2025
@nayihz nayihz deleted the feat_e2e branch March 17, 2025 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. feature Categorizes issue or PR as related to a new feature. lgtm Looks good to me, indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a label and requires one. needs-triage Indicates an issue or PR lacks a label and requires one.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants