Description
Technically, I'm just grabbing `.base_model.model` directly rather than calling `get_base_model()`, but that should have the same effect, since that's all `get_base_model()` does when the active peft config is not a `PromptLearningConfig`, as seen here.
After loading a LLaMA model with a LoRA, like so:

```python
shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)
```

the `PeftModel` loads fine and everything works as expected. However, I cannot figure out how to get the original model back so that no LoRA is active when I run inference.
The code I'm using is from here:

```python
shared.model.disable_adapter()
shared.model = shared.model.base_model.model
```

This gives me the model back as a `LlamaForCausalLM`, but when I run inference, the LoRA is still applied. I made a couple of test LoRAs so that there would be no question as to whether the LoRA is still loaded. They can be found here: https://huggingface.co/clayshoaf/AB-Lora-Test
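One thing I noticed while digging: in the peft source, `disable_adapter` appears to be defined as a context manager, which (if I'm reading it right) would mean calling it as a plain statement builds the context object but never actually toggles anything. A self-contained sketch of that pattern (the `ToyAdapterModel` class is just my illustration, not a real peft class):

```python
from contextlib import contextmanager

class ToyAdapterModel:
    """Hypothetical stand-in for PeftModel; `adapters_enabled` mimics the LoRA layers' state."""
    def __init__(self):
        self.adapters_enabled = True

    @contextmanager
    def disable_adapter(self):
        # Adapters are only switched off inside the `with` block,
        # and are restored on exit.
        self.adapters_enabled = False
        try:
            yield
        finally:
            self.adapters_enabled = True

model = ToyAdapterModel()

# Calling it as a plain statement creates the context manager but never enters it:
model.disable_adapter()
print(model.adapters_enabled)  # still True -- nothing was toggled

# Entering the context is what actually disables the adapters:
with model.disable_adapter():
    print(model.adapters_enabled)  # False inside the block
print(model.adapters_enabled)  # True again after the block
```

If that reading is correct, it would explain why calling `shared.model.disable_adapter()` as a bare statement leaves the LoRA applied.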
I am digging around right now, and I see this line, `if isinstance(module, LoraLayer):`, from:

```python
def _set_adapter_layers(self, enabled=True):
    for module in self.model.modules():
        if isinstance(module, LoraLayer):
            module.disable_adapters = False if enabled else True
```

So I checked in the program, and if I load a LoRA and run

```python
[module for module in shared.model.base_model.model.modules() if hasattr(module, "disable_adapters")]
```

it returns a bunch of modules of type `Linear8bitLt` (if loaded in 8-bit) or `Linear4bitLt` (if loaded in 4-bit).
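To make sure I understand the `_set_adapter_layers` logic above, here is a self-contained sketch of the same walk-and-toggle pattern (the `ToyLoraLayer` and `ToyModel` names are hypothetical stand-ins I made up, not real peft or torch classes):

```python
class ToyLoraLayer:
    """Hypothetical stand-in for peft's LoraLayer: just holds a disable_adapters flag."""
    def __init__(self):
        self.disable_adapters = False

class ToyModel:
    """Hypothetical container exposing modules() like a torch.nn.Module would."""
    def __init__(self, n_layers=3):
        self._modules = [ToyLoraLayer() for _ in range(n_layers)]

    def modules(self):
        return iter(self._modules)

def set_adapter_layers(model, enabled=True):
    # Same shape as peft's _set_adapter_layers: walk every module and
    # flip the flag only on the LoRA layers.
    for module in model.modules():
        if isinstance(module, ToyLoraLayer):
            module.disable_adapters = not enabled

model = ToyModel()
set_adapter_layers(model, enabled=False)   # disable LoRA: flag becomes True
print(all(m.disable_adapters for m in model.modules()))  # True

set_adapter_layers(model, enabled=True)    # re-enable LoRA: flag back to False
print(any(m.disable_adapters for m in model.modules()))  # False
```

Note that in this scheme "disabling the adapter" means setting `disable_adapters = True` on each layer, since `enabled=False` maps to `disable_adapters = True` in the quoted code.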
Would it work to set those modules' `disable_adapters` value to `True` (since, per the code above, `enabled=False` maps to `disable_adapters = True`)? I don't want to hack around too much in the code, because I don't have a deep enough understanding to be sure I won't break something else in the process.
If that won't work, is there something else that I should be doing?