
fix: fix tokenizer parsing bug for guided decoding #4044

Merged
lvhan028 merged 1 commit into InternLM:main from windreamer:fix_xgrammar_tokenizer
Oct 17, 2025

Conversation

@windreamer
Collaborator

close #4042

@windreamer windreamer requested a review from CUHKSZzxy October 16, 2025 10:49
@CUHKSZzxy
Collaborator

The current fix raises the following error:

Traceback (most recent call last):
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 358, in __init__
    self.guided_decoding_manager = GuidedDecodingMangager(self.tokenizer, self.sampling_vocab_size)
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/guided_process.py", line 20, in __init__
    tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/xgrammar/tokenizer_info.py", line 306, in from_huggingface
    raise ValueError(f"Unsupported tokenizer type: {type(tokenizer)}")
ValueError: Unsupported tokenizer type: <class 'transformers_modules.interns1-mini-remote.tokenization_interns1.InternS1Tokenizer'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 92, in _mp_proc
    engine = Engine.from_pretrained(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/engine.py", line 458, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/engine.py", line 374, in __init__
    self.executor = build_executor(model_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/executor/__init__.py", line 94, in build_executor
    return UniExecutor(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/executor/uni_executor.py", line 39, in __init__
    self.model_agent = build_model_agent(model_path=model_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 1202, in build_model_agent
    model_agent = BaseModelAgent(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 360, in __init__
    logger.warning(f'Failed to create GuidedManager for tokenizer {self.tokenizer}: {e.message}')
AttributeError: 'ValueError' object has no attribute 'message'

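The AttributeError comes from a Python 2-era idiom: exception objects in Python 3 have no `.message` attribute, so the warning line itself raised while handling the original ValueError. A minimal sketch, independent of lmdeploy, showing where the message actually lives:

```python
# Minimal sketch (not lmdeploy code): Python 3 exceptions carry their
# text in str(e) and e.args, not in a `.message` attribute.
try:
    raise ValueError("Unsupported tokenizer type: <class 'DummyTokenizer'>")
except ValueError as e:
    assert not hasattr(e, "message")  # e.message raises AttributeError
    assert str(e) == "Unsupported tokenizer type: <class 'DummyTokenizer'>"
    assert e.args[0] == str(e)        # single-argument case
```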
I tried removing `message` as follows, and it seems to work:

        try:
            self.guided_decoding_manager = GuidedDecodingMangager(self.tokenizer, self.sampling_vocab_size)
        except ValueError as e:
            logger.warning(f'Failed to create GuidedManager for tokenizer {self.tokenizer}: {e}')
            self.guided_decoding_manager = None
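For context, the fallback pattern above can be sketched in isolation. Here `make_guided_manager` is a hypothetical stand-in for the `GuidedDecodingMangager` constructor, mimicking how `xgrammar.TokenizerInfo.from_huggingface` rejects an unsupported tokenizer type:

```python
import logging

logger = logging.getLogger("model_agent")

def make_guided_manager(tokenizer, vocab_size):
    # Hypothetical stand-in: mimics xgrammar rejecting an unknown tokenizer.
    raise ValueError(f"Unsupported tokenizer type: {type(tokenizer)}")

def init_guided_decoding(tokenizer, vocab_size):
    # The merged fix: interpolate the exception itself (str(e)), not
    # e.message, and fall back to None so the engine can still start.
    try:
        return make_guided_manager(tokenizer, vocab_size)
    except ValueError as e:
        logger.warning("Failed to create GuidedManager for tokenizer %s: %s", tokenizer, e)
        return None

manager = init_guided_decoding(object(), 151936)
assert manager is None  # guided decoding disabled, engine init continues
```

With this pattern the engine degrades gracefully: guided decoding is disabled for the unsupported tokenizer instead of crashing the whole worker process.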

@windreamer windreamer force-pushed the fix_xgrammar_tokenizer branch from 9c86d54 to ad9ebc2 on October 17, 2025 03:36
@windreamer
Collaborator Author

> The current fix raises the following error […] I try to remove message as follows, and it seems to work.

Thank you! Fixed.

@lvhan028 lvhan028 merged commit deacf91 into InternLM:main Oct 17, 2025
5 checks passed
@windreamer windreamer deleted the fix_xgrammar_tokenizer branch October 17, 2025 06:16
Skyseaee pushed a commit to Skyseaee/lmdeploy that referenced this pull request Jan 4, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] ValueError: Unsupported tokenizer type in xgrammer for interns1

3 participants