Add kernelize to transformers #38205
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
gante left a comment
Sounds like a good plan to me 👍
(plz wait for Arthur's feedback before merging :P)
if torch.cuda.is_available():
    kernelize(model, device=Device(type="cuda"))
    # only cuda supported for now
else:
    kernelize(model, device=Device(type="cpu"))
Why not pass `model.device`? Or `device_map`, since it holds the device for each layer?
We can't use `device_map`, because when it's set to `"auto"`, it only contains the indices of the accelerators used, not their device type. This means we would have to rely on `torch.cuda.is_available()` to check whether CUDA is available.
But indeed, we can simply use `model.device` to get the type of device being used (its `.type` is e.g. `"cuda"` or `"cpu"`).
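To illustrate the point about `device_map="auto"`, here is a hypothetical helper (`device_types_in_map` is not part of transformers or accelerate) that collects the device types a `device_map` names explicitly. It assumes, as the reply says, that accelerator entries are bare integer indices, while other entries are strings such as `"cpu"` or `"cuda:1"`. The example maps are made up for illustration.

```python
from typing import Dict, Set, Union


# Hypothetical helper for illustration; not a transformers/accelerate API.
def device_types_in_map(device_map: Dict[str, Union[int, str]]) -> Set[str]:
    """Return the device types that a device_map names explicitly."""
    found: Set[str] = set()
    for target in device_map.values():
        if isinstance(target, int):
            # A bare index such as 0 says *which* accelerator, not what kind,
            # so it contributes no device type here.
            continue
        # "cuda:1" -> "cuda", "cpu" -> "cpu"
        found.add(target.split(":")[0])
    return found


# Made-up maps in the shape the reply describes.
print(device_types_in_map({"": 0}))  # set()
print(device_types_in_map({"model.layers.0": 0, "lm_head": "cpu"}))  # {'cpu'}
```

With only integer entries the map says nothing about the accelerator type, which is why the thread settles on reading the type from `model.device` instead.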
if past_key_values is not None and hasattr(past_key_values, "is_sliding"):
    for i, is_sliding in enumerate(past_key_values.is_sliding):
        if not is_sliding:
            layer_idx = i
            break
This is unrelated and should be reverted.
Nope, it isn't! Using `index` was not compiling.
OK, can you try casting `is_sliding` to a `torch.tensor`?
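For reference, the loop in the diff picks the index of the first layer whose `is_sliding` flag is false, which for a plain Python list is the same result as `is_sliding.index(False)` whenever such a layer exists. Below is a stdlib-only sketch of that equivalence. The function name and the fallback value of 0 are assumptions, since the diff excerpt does not show how `layer_idx` is initialized. The sketch does not run under `torch.compile`, so it says nothing about why `index` failed to compile.

```python
from typing import Sequence


def first_non_sliding_layer(is_sliding: Sequence[bool], default: int = 0) -> int:
    """Index of the first non-sliding layer, mirroring the loop in the diff.

    `default` is a hypothetical fallback used when every layer is sliding.
    """
    for i, sliding in enumerate(is_sliding):
        if not sliding:
            return i
    return default


flags = [True, True, False, True, False]
print(first_non_sliding_layer(flags))  # 2
print(flags.index(False))              # 2, same answer via list.index
# With no non-sliding layer, the loop falls back to `default`,
# whereas list.index(False) would raise ValueError.
print(first_non_sliding_layer([True, True]))  # 0
```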
Update kernel pin as well!
Thanks 🫡
What does this PR do?
Instead of dynamically switching the `forward` methods using a decorator, we are exploring a new approach that performs this replacement statically within `modeling_utils.py`. This allows us to modify the `forward` methods at load time, which makes the kernels compatible with `torch.compile`.
Also, there is no need to check whether torch is compiling, since `use_kernels` is `False` by default, and `kernelize` only switches a `forward` if the kernel is compatible with compile.
This PR should be merged after huggingface/kernels#87.
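The sketch below is a minimal, torch-free illustration of the load-time idea in the description. Layers keep their reference `forward` unless `use_kernels` is set. When it is set, each matching layer's `forward` is rebound once to a kernel, and kernels not marked compile-compatible are skipped. `KernelEntry`, `KERNEL_REGISTRY`, `kernelize_sketch`, `load_model` and the toy `RMSNorm` are hypothetical names for illustration only. This is not how transformers or the `kernels` library actually implement `kernelize`.

```python
import types
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KernelEntry:
    forward: Callable         # replacement forward(self, x)
    compile_compatible: bool  # only these get swapped in


class RMSNorm:
    """Toy layer standing in for a transformers module (no eps, no weight)."""

    def forward(self, x: List[float]) -> List[float]:
        rms = (sum(v * v for v in x) / len(x)) ** 0.5
        return [v / rms for v in x]


class Model:
    def __init__(self) -> None:
        self.layers = [RMSNorm(), RMSNorm()]


def rms_norm_kernel_forward(self, x: List[float]) -> List[float]:
    # Stand-in for an optimized kernel: same math, separate code path.
    inv_rms = (len(x) / sum(v * v for v in x)) ** 0.5
    return [v * inv_rms for v in x]


# Hypothetical registry: layer class name -> kernel.
KERNEL_REGISTRY: Dict[str, KernelEntry] = {
    "RMSNorm": KernelEntry(rms_norm_kernel_forward, compile_compatible=True),
}


def kernelize_sketch(model: Model) -> int:
    """Swap forwards once, at load time; return how many layers changed."""
    swapped = 0
    for layer in model.layers:
        entry = KERNEL_REGISTRY.get(type(layer).__name__)
        if entry is None or not entry.compile_compatible:
            continue  # keep the reference forward
        # Bind the kernel to this instance: later calls hit it directly,
        # with no per-call decorator or flag check.
        layer.forward = types.MethodType(entry.forward, layer)
        swapped += 1
    return swapped


def load_model(use_kernels: bool = False) -> Model:
    model = Model()
    if use_kernels:  # off by default, so default loading is untouched
        kernelize_sketch(model)
    return model


plain = load_model()
fast = load_model(use_kernels=True)
print(plain.layers[0].forward.__func__.__name__)  # forward
print(fast.layers[0].forward.__func__.__name__)   # rms_norm_kernel_forward
```

Because the swap happens once at load time, later calls go straight to whichever `forward` is bound, and nothing has to be decided while a compiled graph is being traced. With `use_kernels` left at its default of `False`, the model is untouched.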