
Support SDAR #3922

Merged
lvhan028 merged 37 commits into InternLM:main from grimoire:support-SDAR
Sep 19, 2025

Conversation

@grimoire (Collaborator) commented Sep 2, 2025

Since we are refactoring the chat template, we will use the internlm2 template for now.

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig, ChatTemplateConfig

if __name__ == '__main__':
    model_path = 'JetLM/SDAR-1.7B-Chat'
    # The chat template refactor is not finished; reuse internlm2's template.
    chat_template_config = ChatTemplateConfig('internlm2')

    log_level = 'WARNING'

    dllm_unmasking_strategy = 'low_confidence_dynamic'
    # dllm_unmasking_strategy = 'sequential'

    prompts = [
        'hakuna matata!',
        'The quick brown fox jumps over the lazy dog.',
    ]

    backend_config = PytorchEngineConfig(
        tp=1,
        dllm_block_length=4,
        dllm_unmasking_strategy=dllm_unmasking_strategy,
    )

    gen_config = GenerationConfig(
        max_new_tokens=512,
    )

    with pipeline(model_path, backend_config=backend_config,
                  chat_template_config=chat_template_config, log_level=log_level) as pipe:
        outputs = pipe(prompts, gen_config=gen_config)
        print(outputs)
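For context on the two unmasking strategies toggled in the script above, here is a toy sketch of one plausible reading of the strategy names (my own illustration, not lmdeploy's implementation — the function and its behavior are assumptions). A diffusion-style LM decodes a block of masked tokens over several steps, and each step must pick which positions to unmask next:

```python
# Toy illustration (NOT lmdeploy's code) of the two dllm unmasking
# strategies. `pick_positions` is a hypothetical helper.

def pick_positions(confidences, masked, k, strategy):
    """Return up to k masked positions to unmask this step.

    confidences: per-position model confidence (higher = more certain)
    masked: indices that are still masked
    """
    if strategy == 'sequential':
        # Left-to-right: always unmask the leftmost masked positions.
        return sorted(masked)[:k]
    elif strategy == 'low_confidence_dynamic':
        # Confidence-driven: unmask the positions the model is most
        # certain about, leaving low-confidence positions masked.
        return sorted(masked, key=lambda i: confidences[i], reverse=True)[:k]
    raise ValueError(f'unknown strategy: {strategy}')
```

Under this reading, `sequential` degenerates to ordinary left-to-right decoding within a block, while `low_confidence_dynamic` lets the model commit to its most confident tokens first.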

@grimoire changed the title from "[POC] Support SDAR" to "Support SDAR" Sep 8, 2025
@grimoire marked this pull request as ready for review September 8, 2025 07:14
@lvhan028 self-requested a review September 9, 2025 07:21
@lvhan028 added the `enhancement` (New feature or request) label Sep 12, 2025

 @logging_timer('SchedulePrefilling', logger)
-def _schedule_prefill(self):
+def _schedule_prefill(self, prealloc_size: int = 0):
Collaborator:
What's the purpose of prealloc_size? I see it's always passed as 0 from the schedule method. Is this intended for future use, or should it be removed?

Collaborator (author):
Over-allocating the KV cache might be useful for future models.
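To make the intent of that parameter concrete, here is a minimal sketch of how a `prealloc_size` argument could reserve extra KV-cache blocks during prefill scheduling (my own illustration under assumed block-allocation semantics, not lmdeploy's actual scheduler code; `num_blocks_needed` is a hypothetical helper):

```python
# Hypothetical sketch (not lmdeploy's implementation): reserve KV-cache
# blocks for a sequence's prompt plus `prealloc_size` extra tokens, so a
# model that emits tokens in fixed-size blocks never stalls mid-step
# waiting for a fresh allocation.

def num_blocks_needed(prompt_len: int, block_size: int, prealloc_size: int = 0) -> int:
    """Blocks to allocate: enough for the prompt tokens plus
    `prealloc_size` extra tokens reserved up front."""
    total_tokens = prompt_len + prealloc_size
    # Ceiling division: a partially filled block still occupies a full block.
    return -(-total_tokens // block_size)
```

With `prealloc_size=0` this reduces to the usual exact-fit allocation, which matches the current call site passing 0 from the schedule method.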

@lvhan028 (Collaborator):

Does SDAR conflict with --quant-policy fp8? When I tried it, the model repeated meaningless words.
FYI, supporting --quant-policy for this model is not required in this PR.

@lvhan028 (Collaborator):

@zhulinJulia24 may add JetLM/SDAR-8B-Chat and JetLM/SDAR-30B-A3B-Chat to CI.

@RunningLeon (Collaborator) left a comment:

LGTM

@lvhan028 lvhan028 merged commit a96391b into InternLM:main Sep 19, 2025
5 checks passed

Labels

enhancement New feature or request


3 participants