[Bug]: Can't run run_infinitebench.py #195

@ruili-pml

Describe the bug

Hi,

Great work, and thanks for sharing the code. I was trying to run run_infinitebench.py in experiments/infinite_bench, but got this error:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1723, in main
    pdb._runscript(mainpyfile)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/pdb.py", line 1583, in _runscript
    self.run(statement)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/bdb.py", line 598, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 274, in <module>
    pred = get_pred(
  File "/home/ubuntu/MInference/experiments/infinite_bench/run_infinitebench.py", line 100, in get_pred
    outputs = model.generate(**input_tensors, generation_config=generation_config)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2564, in generate
    result = decoding_method(
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 2756, in _sample
    model_kwargs = self._get_initial_cache_position(cur_len, input_ids.device, model_kwargs)
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/transformers/generation/utils.py", line 1833, in _get_initial_cache_position
    past_length = cache.get_seq_length()
  File "/home/ubuntu/miniconda3/envs/minference/lib/python3.10/site-packages/minference/modules/kvcompression.py", line 439, in get_seq_length
    if len(self.key_cache) <= layer_idx:
AttributeError: 'DynamicCacheWithRepeat' object has no attribute 'key_cache'

If it helps, I'm using transformers 4.57.1 and vllm 0.11.0.
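A minimal sketch of a check that could narrow this down. It rests on an assumption that is not confirmed in this report: that older transformers releases stored the cache in key_cache / value_cache lists, while newer ones keep per-layer objects in a layers list, each with a keys attribute. The helper names describe_cache_layout and safe_seq_length are hypothetical and are not part of MInference or transformers; the stand-in objects at the bottom only imitate the two layouts.

```python
from types import SimpleNamespace


def describe_cache_layout(cache):
    """Report which storage layout a cache object exposes.

    Assumption: "legacy" caches have a `key_cache` list of tensors,
    newer caches have a `layers` list of per-layer objects.
    """
    if hasattr(cache, "key_cache"):
        return "legacy"
    if hasattr(cache, "layers"):
        return "layers"
    return "unknown"


def safe_seq_length(cache, layer_idx=0):
    """Hypothetical stand-in for the failing get_seq_length check.

    Returns 0 when the layer is missing or empty, otherwise the
    sequence dimension (second to last) of the stored keys.
    """
    layout = describe_cache_layout(cache)
    if layout == "legacy":
        if len(cache.key_cache) <= layer_idx:
            return 0
        return cache.key_cache[layer_idx].shape[-2]
    if layout == "layers":
        if len(cache.layers) <= layer_idx:
            return 0
        # Assumed attribute name for the per-layer key tensor.
        keys = getattr(cache.layers[layer_idx], "keys", None)
        return 0 if keys is None else keys.shape[-2]
    return 0


# Stand-in objects that imitate the two layouts (not real caches).
legacy = SimpleNamespace(key_cache=[SimpleNamespace(shape=(1, 8, 5, 64))])
newer = SimpleNamespace(
    layers=[SimpleNamespace(keys=SimpleNamespace(shape=(1, 8, 7, 64)))]
)
print(describe_cache_layout(legacy), safe_seq_length(legacy))
print(describe_cache_layout(newer), safe_seq_length(newer))
```

Calling describe_cache_layout on the DynamicCacheWithRepeat instance just before model.generate would show whether the installed transformers version still provides key_cache at all.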

In addition, is it necessary to install vllm_flash_attn? The code imports it like this:

from vllm_flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache

It only works with fairly old versions of torch and vllm.
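One possible way around the hard dependency is a guarded import. The sketch below is only an illustration: it assumes the standalone flash_attn package exposes functions with the same two names, and it does not check that their signatures match what MInference expects. The helper name load_flash_attn_funcs is hypothetical.

```python
import importlib


def load_flash_attn_funcs(candidates=("vllm_flash_attn", "flash_attn")):
    """Try each candidate module in order and return the first one that
    provides both kernels, as (module_name, varlen_func, kvcache_func).

    Returns (None, None, None) if no candidate can be imported or none
    of them exposes both names.
    """
    names = ("flash_attn_varlen_func", "flash_attn_with_kvcache")
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module not installed or not loadable; try the next one.
            continue
        funcs = tuple(getattr(module, name, None) for name in names)
        if all(func is not None for func in funcs):
            return (module_name, *funcs)
    return (None, None, None)


source, flash_attn_varlen_func, flash_attn_with_kvcache = load_flash_attn_funcs()
if source is None:
    print("No flash attention backend found")
else:
    print(f"Using kernels from {source}")
```

With this pattern the caller can decide what to do when source is None, for example falling back to a slower attention path, instead of failing at import time.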

It would be great if you could take a look.

Thanks,
Rui
