I downloaded the checkpoints from Hugging Face:

L0:
L1:
L2:
The L2 model is extremely unstable when running inference under FP16 AMP: it outputs NaNs. This does not happen with L0/L1. I inspected the activations and found extremely large values.
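For reference, a minimal sketch of the failing setup. The model loading and preprocessing are elided here (`build_l2_model` and `load_input` are placeholders, not the actual API); the relevant part is running the forward pass under `torch.autocast` with FP16:

```python
import torch

# Placeholder: build the L2 architecture and load the Hugging Face checkpoint.
model = build_l2_model()
model.eval().cuda()

# Placeholder: preprocess the input image into a CUDA tensor.
x = load_input("image_8.tif")

# Inference under FP16 AMP -- this is where the NaNs appear for L2.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)

print("NaNs in output:", torch.isnan(out).any().item())
```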
I also inspected XL0 and XL1. They are even worse: extremely large activation values appear throughout the model, not just in LiteMLA.
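In case it helps, this is roughly how I collected the activation statistics (a sketch; `model` and `x` are the objects from the snippet above). A forward hook on every module records the largest absolute output value, which makes potential FP16 overflow easy to spot since float16 saturates around 65504:

```python
import torch

stats = {}

def make_hook(name):
    # Record the max |activation| seen at this module's output.
    def hook(module, inputs, output):
        if torch.is_tensor(output):
            stats[name] = max(stats.get(name, 0.0),
                              output.detach().abs().max().item())
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules()]

# Run once in FP32 (no autocast) to read the raw activation range.
with torch.no_grad():
    model(x)

for handle in handles:
    handle.remove()

# Print the worst offenders, e.g. the LiteMLA blocks in L2 / XL0 / XL1.
for name, value in sorted(stats.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{name}: {value:.1f}")
```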
Input image: image_8.tif