
Delayed message store crashes on startup when segment data is corrupt or truncated #1693

@viktorerlingsson

Describe the bug
When a LavinMQ node restarts after an unclean shutdown (e.g. crash, power loss, OOM kill), the delayed message store's build_index method can crash if a segment file contains corrupt or truncated message data.

During startup, build_index calls shift? to iterate over stored messages, which internally uses BytesMessage.skip and BytesMessage.from_bytes. If a segment was only partially flushed to disk before the crash, these methods raise an IO::EOFError or IndexError that is not caught, causing the server to fail to start.

This is particularly problematic because the regular message store (produce_metadata) already handles this exact scenario gracefully by catching IO::EOFError/IndexError and skipping the rest of the corrupt segment, whereas the delayed message store has no equivalent protection.
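To illustrate the kind of protection described above, here is a minimal sketch in Python (not Crystal, and not LavinMQ's actual code). It assumes a hypothetical segment layout of length-prefixed records; `read_record`, `build_index`, and the 4-byte little-endian length header are all made up for illustration. The point is only the pattern: when a read runs off the end of a truncated segment, stop indexing that segment instead of letting the error abort startup.

```python
import io
import struct


def read_record(stream: io.BytesIO) -> bytes:
    """Read one length-prefixed record (hypothetical format).

    Raises EOFError if the header or body is cut short.
    """
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated length header")
    (size,) = struct.unpack("<I", header)
    body = stream.read(size)
    if len(body) < size:
        raise EOFError("truncated message body")
    return body


def build_index(segment: bytes) -> list[int]:
    """Return the start offset of every complete record in the segment.

    On a truncated record, the rest of the segment is skipped rather than
    letting the error propagate to the caller.
    """
    stream = io.BytesIO(segment)
    index: list[int] = []
    while stream.tell() < len(segment):
        pos = stream.tell()
        try:
            read_record(stream)
        except EOFError:
            # Partially flushed tail: keep what was indexed so far and stop.
            break
        index.append(pos)
    return index


def pack(body: bytes) -> bytes:
    """Helper for the example: encode one record in the hypothetical format."""
    return struct.pack("<I", len(body)) + body


# Two complete records followed by one whose body was cut off mid-write.
segment = pack(b"hello") + pack(b"world") + pack(b"lost")[:6]
index = build_index(segment)
```

With this input, `index` contains the offsets of the two complete records and the truncated third record is ignored. A real fix would also need to log which segment file and position were skipped, so the operator can see what was lost.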

Impact: A LavinMQ node with delayed messages cannot restart after an unclean shutdown if any delayed message segment file is truncated or corrupt. The node crashes during boot, requiring manual intervention to repair or remove the affected segment files.

Describe your setup
LavinMQ v2.6.8

How to reproduce
Not sure how the file became corrupted.

Expected behavior
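LavinMQ should start even when a delayed message segment is corrupt or truncated, skipping the rest of the affected segment as the regular message store does.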

Workaround
Delete the corrupted message file(s). The logs should make it easy to see which file is affected.
