Description
Describe the bug
When a LavinMQ node restarts after an unclean shutdown (e.g. crash, power loss, OOM kill), the delayed message store's build_index method can crash if a segment file contains corrupt or truncated message data.
During startup, build_index calls shift? to iterate over stored messages, which internally uses BytesMessage.skip and BytesMessage.from_bytes. If a segment was only partially flushed to disk before the crash, these methods raise an IO::EOFError or IndexError that is not caught, causing the server to fail to start.
This is particularly problematic because the regular message store (produce_metadata) already handles this exact scenario gracefully by catching IO::EOFError/IndexError and skipping the rest of the corrupt segment, whereas the delayed message store has no equivalent protection.
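
To illustrate the kind of protection described above, here is a minimal, language-agnostic sketch in Python. It is not LavinMQ's code: the length-prefixed record format and the names read_record and build_index are assumptions made up for the example. It only shows the pattern of catching a short read while building an index, keeping the records read so far, and skipping the rest of the segment.

```python
import io
import struct


def read_record(buf: io.BytesIO) -> bytes:
    """Read one record: 4-byte little-endian length, then the payload.

    Hypothetical format for illustration only. Raises EOFError when the
    header or the payload is cut short.
    """
    header = buf.read(4)
    if len(header) < 4:
        raise EOFError("truncated length header")
    (size,) = struct.unpack("<I", header)
    payload = buf.read(size)
    if len(payload) < size:
        raise EOFError("truncated payload")
    return payload


def build_index(segment: bytes):
    """Return (index, corrupt_at) for one segment.

    index holds (offset, payload_length) for each complete record.
    corrupt_at is the offset of the first unreadable record, or None.
    """
    index = []
    buf = io.BytesIO(segment)
    while buf.tell() < len(segment):
        pos = buf.tell()
        try:
            payload = read_record(buf)
        except (EOFError, IndexError):
            # Keep what was readable and skip the rest of this segment
            # instead of letting the error abort startup.
            return index, pos
        index.append((pos, len(payload)))
    return index, None


def pack(payload: bytes) -> bytes:
    """Encode one record in the hypothetical format above."""
    return struct.pack("<I", len(payload)) + payload


# Two complete records followed by one whose payload was only partly written.
segment = pack(b"hello") + pack(b"world") + struct.pack("<I", 10) + b"abc"
index, corrupt_at = build_index(segment)
```

With this input, the two complete records are indexed and corrupt_at points at the start of the truncated third record, so the caller can log the offset and continue booting.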
Impact: A LavinMQ node with delayed messages cannot restart after an unclean shutdown if any delayed message segment file is truncated or corrupt. The node crashes during boot, requiring manual intervention to repair or remove the affected segment files.
Describe your setup
LavinMQ v2.6.8
How to reproduce
Not sure how the file became corrupted.
Expected behavior
LavinMQ should start even if a delayed message segment file is truncated or corrupt.
Workaround
Delete the corrupted message file(s). The logs should make it easy to see which file is affected.