Skip to content

Galera 26.4.21#32

Merged
sysprg merged 28 commits into
mariadb-4.xfrom
galera-26.4.21
Dec 18, 2024
Merged

Galera 26.4.21#32
sysprg merged 28 commits into
mariadb-4.xfrom
galera-26.4.21

Conversation

@janlindstrom
Copy link
Copy Markdown

No description provided.

ayurchen and others added 28 commits August 11, 2024 01:24
…ructs

gcs, gcs_core, gcs_group instead of using calloc()
 - make gcs_register_params() take gu::Coonfig& and return void in line with
other such methods
…before aborting.

Remove remainig calls to gu_abort() in gcs_group.cc. Instead marshall
errors up to the main recv loop where graceful leave will be attempted
before abort.
The header `openssl/engine.h` is required to compile `gu_asio_test.cpp`.
However, the engine functionality is being deprecated on some
platforms, which makes galerautils ASIO specific unit test compilation
to fail. The `openssl/engine.h` is not required by any other
part of Galera code.

Fix by removing `openssl/engine.h` include from `gu_asio_test.cpp` and
fixing compilation errors by including `openssl/err.h`.
Before the GALERA_GIT_REVISION was written under PROJECT_SOURCE_DIR,
but this breaks with out-of-source builds if the source directory
is read-only. Write GALERA_GIT_REVISION under CMAKE_BINARY_DIR instead.
The warning was emitted from `gu_atomic.hpp`

    galerautils/tests/../src/gu_atomic.hpp:22:19:
       error: template-id not allowed for constructor in C++20
       [-Werror=template-id-cdtor]
   22 |         Atomic<I>(I i = 0) : i_(i) { }
      |                   ^
/galera/galerautils/tests/../src/gu_atomic.hpp:22:19: note: remove the ‘< >’

Fix by removing unnecessary template parameter from the constructor.
…fails during IST

It likely to happen that IST donor locks seqno of already released action.
In fact it is very likely since only uncommitted actions are not released.
As of now `GCache::seqno_lock()` affects only results of top-level
`GCache::release()`, `GCache::discard_seqno()` and `GCache::discard_tail()`
calls and in the case of GCache::release() it is almost pointless as
mentioned above: a large part of "locked" actions already has been released.
At the same time underlying data stores (like the `RingBuffer` store) are
unaware of the locked seqno and may discard released actions which are
supposed to be protected by the lock in response to e.g. malloc() requests.
Pass locked seqno down to stores to inform them of discard limits.
… not allow documented parameters

For some reason ssl_cipher parameter type was type_bool
when correct type is string. Fixed parameter type
and added test case.
…d in commit cut

When a node progresses from JOINED to SYNCED there is a race between
it becoming counted in commit cut recalculation (it needs to report last
applied that is greater than the current commit cut) and some other node
andvancing the commit cut even further. Make sure that the node is counted
in commit cut recalculation as soon as it announces itself SYNCED and
don't advance the commit cut until the node reports last applied greater
than the current commit cut.

Bump GCS protocol version (makes rolling downgrade impossible)
Expose versions of all protocols agreed on by quorum via status call.
@janlindstrom janlindstrom self-assigned this Nov 27, 2024
Copy link
Copy Markdown
Member

@grooverdan grooverdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should break the API again.

Also from:
https://buildbot.mariadb.org/#/grid?branch=refs%2Fpull%2F32%2Fmerge
Fedora build error still there - MDEV-35108 - something quite messy with rpmbuild.

minor:
Retriggering the debian-sid-x86_32 builder didn't trigger the out of space (lack of 1MB on storage), though the pattern of available_storage checking storage before using is ultimately a race condition. Error handling in ::write/::prealloc aren't as verbose in terms of size as the available_storage and both are missing the filename.

The debiansid build however shows quite a few string format warnings where the format string doesn't match the provided argument. The server put quite a bit of effort into fixing the same error.
https://buildbot.mariadb.org/#/builders/252/builds/4149/steps/4/logs/stdio

Comment thread gcs/src/gcs_act_proto.hpp
Comment on lines +29 to +32
* 5 - fix for commit cut tracking for just SYNCED nodes
* (must keep it identical on all nodes)
*/
#define GCS_PROTO_MAX 4
#define GCS_PROTO_MAX 5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is creating another incompatibility

Enforced at
gcs/src/gcs_act_proto.cpp: if (gu_unlikely(GCS_PROTO_MAX < version)) return -EPROTONOSUPPORT;

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to know when all nodes in a cluster at upgrade can use new way for calculating commit cut tracking. This commit cut tracking fix is necessary to fix rolling-upgrade bug identified at upgrade testing.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is because all nodes independently calculate this commit cut and a discrepancy from an old version will result in that node behaving in a way the other node don't expect?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@janlindstrom
Copy link
Copy Markdown
Author

I do not think x86_32 is anymore supported platform.

seqno_locked_count++;

if (seqno_g < seqno_locked) seqno_locked = seqno_g;
if (seqno_g < seqno_locked)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the assumption that multiple threads are calling this, there isn't memory barriers or mutexes to ensure the order of operations here.

Also applies to other operations on seqno_{un,}lock or accessing seqno_g directly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 300 there is mutex gu::Lock lock(mtx);.

Comment thread gcs/src/gcs_core.cpp
if (GCS_ACT_INCONSISTENCY != recv_act->act.type) {
/* inconsistency event must be passed up */
gu_fatal("Unrecoverable error happened above. Aborting...");
usleep(1000000); // give it a second
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why waiting for a second before aborting? Is something going to clean up? If so something less time based would be better.

From: 3ae0dda

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but this node will go non-Primary i.e. out of cluster and I think gu_abort() below will produce core dump.

Comment thread cmake/version.cmake
#
# First determine GALERA_GIT_REVISION. If it is stored into file
# in source root, the value is taken from there. Otherwise
# in build dir root, the value is taken from there. Otherwise
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I'm very happy to see this change.

Comment thread gcs/src/gcs_act_proto.hpp
Comment on lines +29 to +32
* 5 - fix for commit cut tracking for just SYNCED nodes
* (must keep it identical on all nodes)
*/
#define GCS_PROTO_MAX 4
#define GCS_PROTO_MAX 5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is because all nodes independently calculate this commit cut and a discrepancy from an old version will result in that node behaving in a way the other node don't expect?

grooverdan added a commit to grooverdan/mariadb-buildbot that referenced this pull request Dec 17, 2024
@grooverdan
Copy link
Copy Markdown
Member

I do not think x86_32 is anymore supported platform.

The string format issues are easily fixable. I can do a patch if you want.

Apart from that the good use of types in galera means its in a good shape.

I'm looking at adding it CI MariaDB/buildbot@b5093e0 to keep it in shape, even if not officially supported.

@janlindstrom
Copy link
Copy Markdown
Author

I do not have suitable machine where I could test string format changes so I'm open for contribution.

@sysprg sysprg merged commit 0e3c3ca into mariadb-4.x Dec 18, 2024
@grooverdan grooverdan deleted the galera-26.4.21 branch December 18, 2024 03:22
@grooverdan grooverdan mentioned this pull request Dec 18, 2024
grooverdan added a commit to grooverdan/mariadb-buildbot that referenced this pull request Dec 20, 2024
@janlindstrom
Copy link
Copy Markdown
Author

@grooverdan Format warnings will be fixed on 26.4.22

grooverdan added a commit to grooverdan/mariadb-buildbot that referenced this pull request Jul 9, 2025
grooverdan added a commit to grooverdan/mariadb-buildbot that referenced this pull request Jul 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants