Enhance/krab/cluster realtime rebalance rebase#650
Closed
lixen wants to merge 54 commits into
This commit extends AAE fullsync to sample the AAE trees to estimate the number of keys in a given partition. This estimate is used to properly size the bloom filter, as well as to enable a new percentage-based direct send threshold (e.g. direct send up to 10% of differing keys).

To support AAE-based key estimation, this commit changes the fullsync logic to update all AAE trees before proceeding to the exchange phase. This change is necessary because all trees must be updated and sampled to calculate a correct estimate.

This commit also delays the creation of the bloom filter until it is needed. Fullsyncs that manage to send all differences directly will therefore avoid creating a bloom filter.

Authored-by: Mikael Lixenstrand <mikael.lixenstrand@erlang-solutions.com>
Authored-by: rsltrifork <rsl@trifork.com>
Rebased-by: Joseph Blomstedt <joe@basho.com>
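The sizing and threshold logic described above can be sketched roughly as follows. This is a Python illustration, not the Erlang code in riak_repl_aae_source: the function names, the 1% false-positive rate, and the use of the textbook bloom filter sizing formula are all assumptions made for the example.

```python
import math


def bloom_parameters(estimated_keys, false_positive_rate=0.01):
    """Return (bits, hash_count) for a bloom filter sized from a key estimate.

    Uses the standard formulas m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    The false-positive rate is an assumed default, not a value from the PR.
    """
    n = max(1, estimated_keys)
    bits = math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


def direct_send_limit(estimated_keys, percentage):
    """Number of differing keys that may be sent directly (integer floor)."""
    return estimated_keys * percentage // 100


class FullsyncSketch:
    """Hypothetical holder that creates the bloom filter only when needed."""

    def __init__(self, estimated_keys, percentage=10):
        self.estimated_keys = estimated_keys
        self.limit = direct_send_limit(estimated_keys, percentage)
        self.bloom = None  # created lazily, see ensure_bloom()

    def should_send_directly(self, diff_count):
        return diff_count <= self.limit

    def ensure_bloom(self):
        # Only reached when differences exceed the direct send limit.
        if self.bloom is None:
            bits, hashes = bloom_parameters(self.estimated_keys)
            self.bloom = {"bits": bits, "hashes": hashes}
        return self.bloom
```

A fullsync whose differences all stay under the limit never calls `ensure_bloom`, which mirrors the stated goal of avoiding the filter entirely in that case.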
… this needs to be followed up by appropriate changes to riak_test, doc, etc.
…imate-keys Conflicts: src/riak_repl_aae_source.erl
Keep needed for testing as debug.
Problem: transient failures of aae, such as trees not yet built or locks not
being acquired, would cause an aae fullsync process to exit abnormally. This
could happen several times in a row, creating log spam.
Resolution: the concept of soft_exit. A soft_exit is a message sent from a soon
to be exiting process to a soft_linked process. The exiting process would then
exit normally, while any soft_linked processes could handle the soft_exit
message in a similar fashion as an exit message. This would indicate an exit
reason that should be handled, but not bad enough to have the system logger
know about it.
The soft_exit message sent from the aae worker to the fscoordinator is
as simple as `{soft_exit, pid(), term()}'.
The current implementation is not generic. There can only be one soft_link to
the aae, and there's no general mechanism to use soft_links or soft_exits
elsewhere in the code base. Sorry.
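As an illustration of the soft_exit idea, here is a small Python simulation in which a queue stands in for the coordinator's mailbox. The function names, the `done` message, and the state keys are hypothetical; only the `{soft_exit, pid(), term()}` message shape comes from the description above.

```python
import queue


def aae_worker(pid, mailbox, trees_built):
    """Hypothetical worker: on a transient failure it sends a soft_exit
    message to its soft-linked coordinator and then returns normally,
    so nothing is reported as a crash."""
    if not trees_built:
        mailbox.put(("soft_exit", pid, "not_built"))
        return
    mailbox.put(("done", pid, "ok"))


def coordinator_handle(message, state):
    """Handle a soft_exit much like an exit message: note the reason and
    schedule a retry, without treating it as an error worth logging."""
    tag, pid, reason = message
    if tag == "soft_exit":
        state["soft_exits"].append((pid, reason))
        state["retry"].append(pid)
    elif tag == "done":
        state["completed"].append(pid)
    return state
```

The point of the sketch is that the failure travels as an ordinary message rather than as an abnormal exit, so the receiver decides how loudly to react.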
Another change rolled into this is consistent use of a #partition_info record
in the fscoordinator, and error tracking in the fscoordinator's state. Swapping
to a single data structure in the partition queue, whereis waiting list,
and purgatory queues makes it easier to understand the fscoordinator (as
there is less code to modify structures).
This is a forward port of the fix done for 1.4. Conflicts favor existing code
where it does not directly affect the fix.
Conflicts:
Makefile
rebar.config
src/riak_repl2_fssource.erl
src/riak_repl2_rtq_proxy.erl
src/riak_repl_aae_source.erl
test/riak_core_cluster_mgr_tests.erl
increment_error_dict expects the partition, the elementN of the error dict, and the state. It pulls the dict out of the state and puts it back in place, so it just returns the state. The call that passed the dict in was therefore wrong.
When a partition is not available, perhaps after a number of retries, the error exits stat should be incremented. Also, the retry exits stat should be incremented on each retry. This was discovered when backporting the repl_location_failures riak_test.
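The calling convention described for increment_error_dict can be sketched in Python as below. The dictionary-based state and the `field` argument (standing in for the record element position) are assumptions for illustration; the real function operates on an Erlang record.

```python
def increment_error_dict(partition, field, state):
    """Bump the error count for `partition` in the dict stored under
    `field`, and return the updated state (not the dict)."""
    counts = dict(state[field])          # pull the dict out of the state
    counts[partition] = counts.get(partition, 0) + 1
    new_state = dict(state)
    new_state[field] = counts            # put it back in place
    return new_state
```

Because the function takes and returns the whole state, passing the dict itself as the third argument, as the buggy call did, does not match the contract.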
The one in riak_repl2_fssource is a legit bug in the code
…ilures-2.0 2.0 port of AAE transient FS failures Reviewed-by: lordnull
Remove loop so we can receive cancel_fullsync during update of remote trees.
A few minor bugs were discovered while investigating riak_test failures.
* The ssl application is explicitly started in
riak_core_connection:try_ssl/0. The statement in the function
expects the call to ssl:start/0 to always return ok, but in some
cases the ssl application is already started and the call returns
{error, {already_started, ssl}} instead. This should not represent
an error condition, but as written an exception is generated in this
case. This resulted in riak_test runs of replication tests that
exercise SSL to stall. Really there is no reason to attempt to start
the ssl application at this point in the code. The ssl application
is an application dependency of the riak_repl application and should
be started by the call to riak_core_util:start_app_deps in
riak_repl_app:start/2. Removing the attempt to start ssl in
riak_core_connection to avoid confusion.
* The first handle_info function clause in riak_core_connection that
handles a message received while in the wait_for_capabilities state
attempts to use SSL by calling the try_ssl/4 function. If it
succeeds a pair is returned whose elements are the name of the new
transport and a new socket for the SSL connection. However the new
socket was not being used for subsequent calls to send and setopts
and this caused failure of several riak_tests.
* The non-test function clause of
riak_core_cluster_conn:request_cluster_name/1 contained assumptions
about the transport in use and explicitly called
inet:setopts/2. This does not work when SSL is used and also caused
several test failures. The function has been changed so that the
specified transport is used for the call to setopts instead.
Address some minor bugs around establishing SSL connections Reviewed-by: engelsanchez
Improve AAE fullsync by estimating number of keys Reviewed-by: engelsanchez
…-leader Added test and fix to coord_serv not giving list for status. Reviewed-by: engelsanchez
Added last_fullsync_completed stat tracking. Reviewed-by: engelsanchez
When a partition has hit the soft exit limit, we add it to the dropped list, but forgot to remove it from the purgatory list. So it may actually be retried later.
Remove partition from purgatory when giving up Reviewed-by: lordnull
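A minimal Python sketch of the fix, assuming a dictionary-based coordinator state: the key names and the `soft_retry_limit` parameter are hypothetical, and the actual fscoordinator keeps this information in Erlang records and queues.

```python
def handle_soft_exit(partition, state, soft_retry_limit):
    """Count a soft exit for `partition`. Below the limit the partition
    waits in purgatory for a retry; at the limit it is dropped AND removed
    from purgatory so that it cannot be retried later."""
    counts = state["soft_exit_counts"]
    counts[partition] = counts.get(partition, 0) + 1
    if counts[partition] >= soft_retry_limit:
        state["dropped"].append(partition)
        if partition in state["purgatory"]:
            state["purgatory"].remove(partition)  # the step that was missing
    elif partition not in state["purgatory"]:
        state["purgatory"].append(partition)
    return state
```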
This implements riak_core_cluster_serv {1,1}
with a new membership function on the server side
that gives a list of {node(), {IP,Port} | unreachable}
for *all* members of the remote cluster. Nodes that
the cluster_serv cannot reach via RPC
yield ‘unreachable’ instead of an IP/Port.
Use the new all_members message if the remote is 1,1+. For 1,0, emulate the new semantics by keeping old IP:PORTs around until cluster_mgr restart. Implement a new seeded sort+shuffle for the result of calling cluster_mgr:get_ipaddrs_of_cluster/1.
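One way to read "seeded sort+shuffle" is sketched below in Python: sort the addresses into a canonical order first, then shuffle with a generator seeded by a caller-supplied value, so the result depends only on the address set and the seed. The function name and the choice of seed are assumptions; the commit does not say what the seed is derived from.

```python
import random


def seeded_shuffle(addrs, seed):
    """Return addrs in an order determined only by their contents and seed."""
    ordered = sorted(addrs)       # canonical order, independent of input order
    rng = random.Random(seed)     # private generator, does not touch global state
    rng.shuffle(ordered)
    return ordered
```

With this shape, two callers using different seeds will tend to try remote nodes in different orders, while a single caller sees a stable order across calls.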
The first tells the caller the address currently connected to. The second tells the rtsource_conn to try (if possible) to switch to an alternative connection.
It is possible to lose some addresses when ConnectedAddr is early in the list.
[{"127.0.0.1",10106},{"127.0.0.1",10066},{"127.0.0.1",10096},{"127.0.0.1",10076},{"127.0.0.1",10086}], ConnectedAddr: {"127.0.0.1",10066}
[{"127.0.0.1",10106}], UsefulAddrs [{"127.0.0.1",10106}]
Stats function now never returns an error code.
Rebase and some squash of #632.
Will start looking at mixed cluster tests as @lordnull proposes.