Galera 26.4.21#32
Conversation
…ructs gcs, gcs_core, gcs_group instead of using calloc() - make gcs_register_params() take gu::Coonfig& and return void in line with other such methods
…before aborting. Remove remainig calls to gu_abort() in gcs_group.cc. Instead marshall errors up to the main recv loop where graceful leave will be attempted before abort.
The header `openssl/engine.h` is required to compile `gu_asio_test.cpp`. However, the engine functionality is being deprecated on some platforms, which makes galerautils ASIO specific unit test compilation to fail. The `openssl/engine.h` is not required by any other part of Galera code. Fix by removing `openssl/engine.h` include from `gu_asio_test.cpp` and fixing compilation errors by including `openssl/err.h`.
…openssl-engine-dependency
Before the GALERA_GIT_REVISION was written under PROJECT_SOURCE_DIR, but this breaks with out-of-source builds if the source directory is read-only. Write GALERA_GIT_REVISION under CMAKE_BINARY_DIR instead.
The warning was emitted from `gu_atomic.hpp`
galerautils/tests/../src/gu_atomic.hpp:22:19:
error: template-id not allowed for constructor in C++20
[-Werror=template-id-cdtor]
22 | Atomic<I>(I i = 0) : i_(i) { }
| ^
/galera/galerautils/tests/../src/gu_atomic.hpp:22:19: note: remove the ‘< >’
Fix by removing unnecessary template parameter from the constructor.
…fails during IST It likely to happen that IST donor locks seqno of already released action. In fact it is very likely since only uncommitted actions are not released. As of now `GCache::seqno_lock()` affects only results of top-level `GCache::release()`, `GCache::discard_seqno()` and `GCache::discard_tail()` calls and in the case of GCache::release() it is almost pointless as mentioned above: a large part of "locked" actions already has been released. At the same time underlying data stores (like the `RingBuffer` store) are unaware of the locked seqno and may discard released actions which are supposed to be protected by the lock in response to e.g. malloc() requests. Pass locked seqno down to stores to inform them of discard limits.
Refs mariadb-corporation/codership-galera#915 Writeset serialization fails during IST
Gcache seqno lock 4.ee
… not allow documented parameters For some reason ssl_cipher parameter type was type_bool when correct type is string. Fixed parameter type and added test case.
Refs mariadb-corporation/codership-galera#1006 socket.ssl_cipher does not allow docu…
4.x ee ssl cipher fix
…d in commit cut When a node progresses from JOINED to SYNCED there is a race between it becoming counted in commit cut recalculation (it needs to report last applied that is greater than the current commit cut) and some other node andvancing the commit cut even further. Make sure that the node is counted in commit cut recalculation as soon as it announces itself SYNCED and don't advance the commit cut until the node reports last applied greater than the current commit cut. Bump GCS protocol version (makes rolling downgrade impossible)
Expose versions of all protocols agreed on by quorum via status call.
grooverdan
left a comment
There was a problem hiding this comment.
I don't think we should break the API again.
Also from:
https://buildbot.mariadb.org/#/grid?branch=refs%2Fpull%2F32%2Fmerge
Fedora build error still there - MDEV-35108 - something quite messy with rpmbuild.
minor:
Retriggering the debian-sid-x86_32 builder didn't trigger the out of space (lack of 1MB on storage), though the pattern of available_storage checking storage before using is ultimately a race condition. Error handling in ::write/::prealloc aren't as verbose in terms of size as the available_storage and both are missing the filename.
The debiansid build however shows quite a few string format warnings where the format string doesn't match the provided argument. The server put quite a bit of effort into fixing the same error.
https://buildbot.mariadb.org/#/builders/252/builds/4149/steps/4/logs/stdio
| * 5 - fix for commit cut tracking for just SYNCED nodes | ||
| * (must keep it identical on all nodes) | ||
| */ | ||
| #define GCS_PROTO_MAX 4 | ||
| #define GCS_PROTO_MAX 5 |
There was a problem hiding this comment.
so this is creating another incompatibility
Enforced at
gcs/src/gcs_act_proto.cpp: if (gu_unlikely(GCS_PROTO_MAX < version)) return -EPROTONOSUPPORT;
There was a problem hiding this comment.
This is necessary to know when all nodes in a cluster at upgrade can use new way for calculating commit cut tracking. This commit cut tracking fix is necessary to fix rolling-upgrade bug identified at upgrade testing.
There was a problem hiding this comment.
So this is because all nodes independently calculate this commit cut and a discrepancy from an old version will result in that node behaving in a way the other node don't expect?
|
I do not think x86_32 is anymore supported platform. |
| seqno_locked_count++; | ||
|
|
||
| if (seqno_g < seqno_locked) seqno_locked = seqno_g; | ||
| if (seqno_g < seqno_locked) |
There was a problem hiding this comment.
On the assumption that multiple threads are calling this, there isn't memory barriers or mutexes to ensure the order of operations here.
Also applies to other operations on seqno_{un,}lock or accessing seqno_g directly.
There was a problem hiding this comment.
See line 300 there is mutex gu::Lock lock(mtx);.
| if (GCS_ACT_INCONSISTENCY != recv_act->act.type) { | ||
| /* inconsistency event must be passed up */ | ||
| gu_fatal("Unrecoverable error happened above. Aborting..."); | ||
| usleep(1000000); // give it a second |
There was a problem hiding this comment.
why waiting for a second before aborting? Is something going to clean up? If so something less time based would be better.
From: 3ae0dda
There was a problem hiding this comment.
I'm not sure but this node will go non-Primary i.e. out of cluster and I think gu_abort() below will produce core dump.
| # | ||
| # First determine GALERA_GIT_REVISION. If it is stored into file | ||
| # in source root, the value is taken from there. Otherwise | ||
| # in build dir root, the value is taken from there. Otherwise |
There was a problem hiding this comment.
btw I'm very happy to see this change.
| * 5 - fix for commit cut tracking for just SYNCED nodes | ||
| * (must keep it identical on all nodes) | ||
| */ | ||
| #define GCS_PROTO_MAX 4 | ||
| #define GCS_PROTO_MAX 5 |
There was a problem hiding this comment.
So this is because all nodes independently calculate this commit cut and a discrepancy from an old version will result in that node behaving in a way the other node don't expect?
Without it could be going out of support MariaDB/galera#32 (comment)
The string format issues are easily fixable. I can do a patch if you want. Apart from that the good use of types in galera means its in a good shape. I'm looking at adding it CI MariaDB/buildbot@b5093e0 to keep it in shape, even if not officially supported. |
|
I do not have suitable machine where I could test string format changes so I'm open for contribution. |
Without it could be going out of support MariaDB/galera#32 (comment)
|
@grooverdan Format warnings will be fixed on 26.4.22 |
Without it could be going out of support MariaDB/galera#32 (comment)
Without it could be going out of support MariaDB/galera#32 (comment)
No description provided.