
[Web] Seperate parallel shard download and iterative shard loading #16650

Merged
tqchen merged 8 commits into apache:main from DiegoCao:indexdb2
Mar 14, 2024

Conversation

@DiegoCao
Contributor

@DiegoCao DiegoCao commented Feb 27, 2024

This PR addresses the issue in mlc-ai/web-llm#313. We make the following changes:

  • Separate downloading shards from loading them into the NDArray cache, where the former is done with parallel downloads and the latter is purely sequential
  • Limit the maximum number of concurrent downloads to 4 by launching 4 parallel for loops
  • Add a try-catch when loading shards into the NDArray cache

Separately, we add and export an initial IndexedDB implementation.
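The download/load split described above can be sketched roughly as follows. This is an illustrative pattern only, not actual TVMjs code: `downloadShard` and `loadShardIntoCache` are hypothetical stand-ins for whatever fetches one shard and whatever deserializes it into the cache.

```typescript
// Sketch: download all shards with at most `maxConcurrency` in flight by
// launching that many parallel loops over a shared index (as the PR does),
// then load the results into the cache purely sequentially, under try-catch.
async function fetchAndLoadShards<ShardData>(
  urls: string[],
  downloadShard: (url: string) => Promise<ShardData>,
  loadShardIntoCache: (shard: ShardData) => Promise<void>,
  maxConcurrency = 4,
): Promise<void> {
  const downloaded: ShardData[] = new Array(urls.length);
  let next = 0; // shared index; JS is single-threaded, so no lock is needed

  // Phase 1: parallel download via `maxConcurrency` worker loops, each
  // pulling the next un-fetched shard until none remain.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++; // claimed synchronously, before any await
      downloaded[i] = await downloadShard(urls[i]);
    }
  }
  await Promise.all(Array.from({ length: maxConcurrency }, worker));

  // Phase 2: purely sequential loading into the cache, wrapped in
  // try-catch so one bad shard surfaces a useful error.
  for (const [i, shard] of downloaded.entries()) {
    try {
      await loadShardIntoCache(shard);
    } catch (err) {
      throw new Error(`Failed to load shard ${i} into cache: ${err}`);
    }
  }
}
```

Because each worker claims its index synchronously before awaiting, at most `maxConcurrency` downloads are ever in flight, while the loading phase never overlaps with itself.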

@DavidGOrtega
Contributor

@DiegoCao It's not the real solution.
As I stated, the cache can fail even when shards are fetched one by one; I have run into that as well.

The HF CDN is definitely not great, but it should allow parallel requests without major problems.

Real solution:

  1. Implement a retry mechanism (e.g. 3 retries)
  2. Parallelise n files (5 by default) instead of all of them
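The retry mechanism suggested in point 1 could look like the sketch below. `fetchShard` is a hypothetical stand-in for a single download attempt, not an existing TVMjs function.

```typescript
// Hedged sketch of a retry wrapper: attempt the download up to
// `retries + 1` times, rethrowing the last error if every attempt fails.
async function fetchWithRetry<T>(
  fetchShard: () => Promise<T>,
  retries = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fetchShard();
    } catch (err) {
      lastError = err; // remember the failure, then try again
    }
  }
  throw lastError; // every attempt failed; surface the last error
}
```

Combining this wrapper with a concurrency cap of n files (5 by default, per point 2) would give both halves of the suggested fix.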

@DavidGOrtega
Contributor

I can give it a shot tomorrow

@DiegoCao DiegoCao marked this pull request as draft March 1, 2024 17:48
@DiegoCao DiegoCao marked this pull request as ready for review March 1, 2024 17:48
@CharlieFRuan
Member

CharlieFRuan commented Mar 2, 2024

@DavidGOrtega Thanks for offering to help! We found that the issue was probably not due to downloading the shards in parallel but due to processing them in parallel. We managed to keep the parallel downloads. If download issues persist, we'll add batched parallel downloads as you suggested.

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 2, 2024
The new version includes 2 changes:
- Include cache deletion API via
#314
- Fix model download/caching issue on TVMjs side via
apache/tvm#16650
@tqchen
Member

tqchen commented Mar 12, 2024

need rebase

CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 12, 2024
Another minor follow-up to version 0.2.24 (hence to 0.2.25). This PR
adds a `try-catch` when loading the **_already-downloaded_** weights,
attempting to provide more information to the `exit(1)` error in
#322.

The only change is TVMJS's commit
apache/tvm@b193cbb
from apache/tvm#16650
@DiegoCao DiegoCao changed the title from "[Web] Revert back to the non-parallel version to avoid cache.add() error" to "[Web] Seperate parallel shard download and iterative shard loading" Mar 12, 2024
@tqchen tqchen merged commit 939b8b9 into apache:main Mar 14, 2024
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Mar 14, 2024
Changes in WebLLM:
- Stateful chat completion: #330
- OpenAI's `logit_bias`: #331
- OpenAI's `logprobs` and `top_logprobs`:
#333

Changes in TVMjs:
- apache/tvm#16650
- Fix param download issues (already reflected in 0.2.26, but at the
time this PR was not merged yet)
  - Expose `sampleTopPFromProb` to support `logprobs` (new in 0.2.27)
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
…pache#16650)

* Fix parallel download issue by separating the downloading from the serialization process

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>

* Fix callback display

* [Web] Support IndexDB Caching

* Limit max concurrent download to 4 shards

* Catch errors when loading the model into the NDArray cache


---------

Co-authored-by: Charlie Ruan <53290280+CharlieFRuan@users.noreply.github.com>