
fix: fix tokenizer parsing bug for guided decoding #4044

Merged
lvhan028 merged 1 commit into InternLM:main from windreamer:fix_xgrammar_tokenizer
Oct 17, 2025

Conversation

@windreamer
Collaborator

close #4042

@windreamer windreamer requested a review from CUHKSZzxy October 16, 2025 10:49
@CUHKSZzxy
Collaborator

The current fix raises the following error:

Traceback (most recent call last):
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 358, in __init__
    self.guided_decoding_manager = GuidedDecodingMangager(self.tokenizer, self.sampling_vocab_size)
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/guided_process.py", line 20, in __init__
    tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/xgrammar/tokenizer_info.py", line 306, in from_huggingface
    raise ValueError(f"Unsupported tokenizer type: {type(tokenizer)}")
ValueError: Unsupported tokenizer type: <class 'transformers_modules.interns1-mini-remote.tokenization_interns1.InternS1Tokenizer'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/nvme1/zhouxinyu/miniconda3/envs/lmdeploy/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/mp_engine/zmq_engine.py", line 92, in _mp_proc
    engine = Engine.from_pretrained(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/engine.py", line 458, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/engine.py", line 374, in __init__
    self.executor = build_executor(model_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/executor/__init__.py", line 94, in build_executor
    return UniExecutor(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/executor/uni_executor.py", line 39, in __init__
    self.model_agent = build_model_agent(model_path=model_path,
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 1202, in build_model_agent
    model_agent = BaseModelAgent(
  File "/nvme1/zhouxinyu/lmdeploy_dvd/lmdeploy/pytorch/engine/model_agent.py", line 360, in __init__
    logger.warning(f'Failed to create GuidedManager for tokenizer {self.tokenizer}: {e.message}')
AttributeError: 'ValueError' object has no attribute 'message'

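The AttributeError comes from a Python 2-era idiom: exception objects in Python 3 have no `.message` attribute, so the warning line itself raised while handling the original ValueError. A minimal sketch, independent of lmdeploy, showing where the message actually lives:

```python
# Minimal sketch (not lmdeploy code): Python 3 exceptions carry their
# text in str(e) and e.args, not in a `.message` attribute.
try:
    raise ValueError("Unsupported tokenizer type: <class 'DummyTokenizer'>")
except ValueError as e:
    assert not hasattr(e, "message")  # e.message raises AttributeError
    assert str(e) == "Unsupported tokenizer type: <class 'DummyTokenizer'>"
    assert e.args[0] == str(e)        # single-argument case
```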
I tried removing `message` as follows, and it seems to work:

        try:
            self.guided_decoding_manager = GuidedDecodingMangager(self.tokenizer, self.sampling_vocab_size)
        except ValueError as e:
            logger.warning(f'Failed to create GuidedManager for tokenizer {self.tokenizer}: {e}')
            self.guided_decoding_manager = None
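For context, the fallback pattern above can be sketched in isolation. Here `make_guided_manager` is a hypothetical stand-in for the `GuidedDecodingMangager` constructor, mimicking how `xgrammar.TokenizerInfo.from_huggingface` rejects an unsupported tokenizer type:

```python
import logging

logger = logging.getLogger("model_agent")

def make_guided_manager(tokenizer, vocab_size):
    # Hypothetical stand-in: mimics xgrammar rejecting an unknown tokenizer.
    raise ValueError(f"Unsupported tokenizer type: {type(tokenizer)}")

def init_guided_decoding(tokenizer, vocab_size):
    # The merged fix: interpolate the exception itself (str(e)), not
    # e.message, and fall back to None so the engine can still start.
    try:
        return make_guided_manager(tokenizer, vocab_size)
    except ValueError as e:
        logger.warning("Failed to create GuidedManager for tokenizer %s: %s", tokenizer, e)
        return None

manager = init_guided_decoding(object(), 151936)
assert manager is None  # guided decoding disabled, engine init continues
```

With this pattern the engine degrades gracefully: guided decoding is disabled for the unsupported tokenizer instead of crashing the whole worker process.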

@windreamer windreamer force-pushed the fix_xgrammar_tokenizer branch from 9c86d54 to ad9ebc2 on October 17, 2025 03:36
@windreamer
Collaborator Author

> The current fix raises the following error […] I try to remove message as follows, and it seems to work.

Thank you! Fixed.

@lvhan028 lvhan028 merged commit deacf91 into InternLM:main Oct 17, 2025
5 checks passed
@windreamer windreamer deleted the fix_xgrammar_tokenizer branch October 17, 2025 06:16
Skyseaee pushed a commit to Skyseaee/lmdeploy that referenced this pull request Jan 4, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] ValueError: Unsupported tokenizer type in xgrammer for interns1

3 participants