Bugfix: replication shouldn't break if mfiles are closed #1792
They should still be replicated when a message store is closed, but we must re-register them with path only for replication to work.
Code Review No issues found. The approach is sound: acquiring |
Should we |
Yeah, maybe. I think I managed to reason my way to it not being necessary, but maybe better safe than sorry... hm.
PR ReviewBug:
Yeah, I think it's less likely to cause problems, but I think it can still cause problems, at least on queue delete or purge_all 🤔
require "time"
require "../src/lavinmq/message_store"

class SpyReplicator
I think this needs a comment - what is this and why is it here?
wg.try &.done # negotiation done — server is now entering files_with_hash on @mt
sha1_size = Digest::SHA1.new.digest_size
client_lz4 = Compress::LZ4::Reader.new(client_io)
2.times do
I'd like a comment here as well, explaining why this runs 2 times and maybe what happens in the loop
Code Review No issues found. The approach is correct:
The |
viktorerlingsson left a comment
Not sure how to properly test this besides the spec, but the code looks good to me 👍
WHAT is this pull request doing?
If a queue is closed because of a bad segment, the segment files are still registered in the replicator and will be replicated during a full sync. Because they are closed, an error will be raised which aborts the replication. There's also a possibility that the close happens while a checksum is being calculated, which will then result in a segfault.
This PR will do two things:
1. Wait for any full sync to be done before closing files
2. When mfiles are closed, they are "re-registered" in the replicator with path only to make sure the files are still being replicated.
HOW can this pull request be tested?
Specs should cover it
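The two steps described in this PR could be sketched roughly as below. This is only an illustration under assumptions, not the PR's actual code: `SketchReplicator`, `full_sync` and `close_and_reregister` are hypothetical names, and a plain `Mutex` stands in for whatever synchronization the real replicator uses.

```crystal
# Hypothetical stand-in for the replicator; names are illustrative only
# and are not LavinMQ's API.
class SketchReplicator
  # Path => open file handle, or nil when registered by path only.
  getter files = Hash(String, File?).new
  @lock = Mutex.new

  # Register an open file handle for replication.
  def register(path : String, file : File) : Nil
    @lock.synchronize { @files[path] = file }
  end

  # A full sync holds the lock while it walks the registered files,
  # so nothing can be closed underneath it (e.g. mid-checksum).
  def full_sync(& : String ->) : Nil
    @lock.synchronize do
      @files.each_key { |path| yield path }
    end
  end

  # Step 1: taking the lock waits for any running full sync to finish.
  # Step 2: the handle is closed but the path stays registered, so later
  # syncs still include the file.
  def close_and_reregister(path : String) : Nil
    @lock.synchronize do
      @files[path]?.try &.close
      @files[path] = nil
    end
  end
end
```

Closing and re-registering under the same lock means a full sync never observes a registered handle that has already been closed, which is the failure described above.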