Description
When a message segment file contains corrupt data, MessageStore correctly detects the error during initialization and calls `close` to mark the store as closed. However, two problems prevent graceful handling:
- Execution continues after `close` — in `load_segments_from_disk`, the corrupt file is still added to `@segments` and the constructor keeps running. In `load_stats_from_segments`, the loop continues iterating over segments that may have been closed.
- With a replicator (clustering), `close` spawns a fiber to close MFiles asynchronously (message_store.cr:263-270). This fiber captures `@segments` by reference. At the next `Fiber.yield`, the fiber runs and closes MFiles that were added after the `close` call, causing an unhandled `IO::Error: Closed mfile` that crashes the entire server.
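The capture-by-reference race described above can be shown in isolation. This is a minimal standalone sketch, not LavinMQ code: a spawned fiber holds a reference to the same hash, so entries added after the spawn are still visible (and get closed) when the fiber runs at the next scheduling point.

```crystal
# Minimal sketch of the race: a fiber captures `segments` by reference,
# so it closes entries that were added after the fiber was spawned.
segments = Hash(UInt32, IO::Memory).new

spawn do
  # Runs only at the next Fiber.yield; sees the hash's contents at that time.
  segments.each_value &.close
end

segments[1_u32] = IO::Memory.new # added after the "close" spawned the fiber
Fiber.yield                      # fiber runs here and closes the new entry

segments[1_u32].write "x".to_slice # raises IO::Error: Closed stream
```

In LavinMQ the same pattern plays out with `@segments` and MFiles: the constructor keeps appending segments after `close` has spawned the cleanup fiber, and the first `Fiber.yield` closes them out from under the running initialization.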
This affects any type of segment corruption detected during startup — invalid schema version, `OverflowError`, `FrameDecode`, etc. Without clustering (no replicator), the close path is synchronous and happens to work because `@segments` is empty at the time.
To reproduce:
- Create a queue and publish a message on a clustered LavinMQ instance
- Stop LavinMQ
- Write junk data into a segment file (e.g. `echo -n "abcd" | dd of=msgs.0000000001 conv=notrunc`)
- Start LavinMQ — server crashes instead of closing just the affected queue
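The corruption step above overwrites only the first bytes of the segment in place; `conv=notrunc` keeps the rest of the file intact, so the schema-version check fails while the file still looks like a plausible segment. A standalone sketch against a throwaway file (the path is illustrative, not a real LavinMQ data directory):

```shell
# Create a stand-in "segment" file (illustrative name, not a real LavinMQ path)
printf 'SCHEMA01rest-of-segment' > /tmp/msgs.demo

# Overwrite just the first 4 bytes; conv=notrunc preserves the remainder
echo -n "abcd" | dd of=/tmp/msgs.demo conv=notrunc 2>/dev/null

cat /tmp/msgs.demo   # → abcdMA01rest-of-segment
```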
Expected behavior: The queue should be marked as closed, the server should continue running, and the queue can be restarted via the API.
Root cause: `MessageStore#close` (message_store.cr:263) checks `if replicator = @replicator` and takes an async path that spawns a fiber, even when no followers are connected (startup). The fix should: (a) make `close` synchronous when there are no followers, (b) return from `load_segments_from_disk` after `close`, and (c) skip remaining initialization when `@closed` is true. The same pattern applies to `produce_metadata`, which also calls `close` mid-iteration in `load_stats_from_segments`.
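A rough sketch of the suggested direction, assuming the method names from this report; the `followers?` predicate is hypothetical, and this is not the actual patch:

```crystal
# Sketch only — assumed shape of the fix, not LavinMQ's real implementation.
def close : Nil
  return if @closed
  @closed = true
  if (replicator = @replicator) && replicator.followers? # hypothetical check
    spawn { @segments.each_value &.close } # async only when followers exist
  else
    @segments.each_value &.close # (a) synchronous path, safe during startup
  end
end

private def load_segments_from_disk : Nil
  # ... on detecting a corrupt segment:
  #   close
  #   return  # (b) stop adding segments to @segments after close
end

# (c) elsewhere in the constructor, skip remaining init when @closed is true.
```

The key property is that when `close` runs synchronously, nothing can be appended to `@segments` between the close decision and the MFiles actually being closed, which removes the window the captured-by-reference fiber exploited.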
This affects v2.7.0-rc.1