Scale HA combined #2
lease.go LConfig->Conf WIP SCALE HA, moved lock up 1 level ready for first review
locks imports make it compile
772a6ac to 7a2a512
Guys, we need to finish. Ideally we'd get a WIP up tomorrow, just to start getting feedback.
I think cleanup is done and the earlier comments are addressed, but I still have to copy the final struct over to the scheduler. I'll do that in the morning; then we can open the PR against upstream and start rebasing.
Does this change affect more than just the locking stuff?
Yes. Before this change, kube ignored expire events from etcd. I don't know of anything that used etcd's TTL, so it probably wasn't needed before. An expire event fires only when a key expires in etcd, so from kube's perspective it really should be handled like a delete. The only difference between a delete and an expire event in etcd is who initiated the key removal: a delete is explicitly requested by an external entity, whereas an expire is done by etcd without external prompting.
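For reviewers who want the idea in isolation, here is a minimal sketch of folding expire into delete. The action strings are the ones etcd v2 reports on watch responses; the `EventType` values and the `translate` helper are hypothetical names for illustration, not the code in this PR, and the mapping of `set` to a modification is a simplification.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for the watch event kinds kube emits.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// translate maps an etcd v2 action string to a watch event type.
// The second return value is false when the action is ignored.
func translate(action string) (EventType, bool) {
	switch action {
	case "create":
		return Added, true
	case "set", "update", "compareAndSwap":
		// Simplification: a real watcher would inspect the previous node
		// to decide between an add and a modification.
		return Modified, true
	case "delete", "compareAndDelete", "expire":
		// An expire is a key removal initiated by etcd itself (TTL ran out),
		// so watchers see it as an ordinary delete.
		return Deleted, true
	default:
		return "", false
	}
}

func main() {
	for _, a := range []string{"create", "set", "delete", "expire", "get"} {
		ev, ok := translate(a)
		fmt.Printf("%-8s handled=%v event=%q\n", a, ok, ev)
	}
}
```

The point of the sketch is only that `expire` shares the `delete` branch; without that case, a lock key whose TTL lapses would never be reported to watchers.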
…duler and controller manager using the new lock API
This should be the full process name.
…f course also an option
We should check with decarr where that recursive delete is @...
This is the same iterative problem.
The lock name needs to be an input option with a default, so you need to plumb it through to the command-line flags.
Should it? This is the controller-manager. Is there value in having the lock name configurable for the kcm? I was thinking we would want to hard-code the lock name in the processes so there's no chance of a configuration snafu leading to issues.
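To make the two positions above concrete, here is a rough sketch of the "flag with a default" option. It uses the standard library `flag` package for self-containment (the real binaries may use a different flag library), and the flag name `--lock-name`, the constant `defaultLockName`, and the helper `parseLockName` are all hypothetical. Hard-coding would simply mean using the constant directly and dropping the flag.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultLockName is a hypothetical default; hard-coding the lock name
// would mean using this constant directly and never registering a flag.
const defaultLockName = "kube-controller-manager"

// parseLockName reads an optional --lock-name flag from args and falls
// back to defaultLockName when the flag is absent.
func parseLockName(args []string) (string, error) {
	fs := flag.NewFlagSet("controller-manager", flag.ContinueOnError)
	name := fs.String("lock-name", defaultLockName, "name of the HA lock this process contends for")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

func main() {
	def, _ := parseLockName(nil)
	custom, _ := parseLockName([]string{"--lock-name=kcm-test"})
	fmt.Println("default:", def)
	fmt.Println("override:", custom)
}
```

The risk raised in the reply is visible here: two controller-managers started with different `--lock-name` values would each acquire their own lock and both believe they are the leader.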
(Last thing I did Friday, just remembered to add these notes here!) I got some testing done on a cluster (you can do this by modifying the vagrant scripts to launch 2 masters). I can add the code for that tomorrow. But ... failover didn't seem to happen. monday
… times are updated properly and so on.
444e116 to 60a3e8a
Interesting. I tested the renew time and it seems to properly add a TTL to the etcd entries, but I can't see where.
…here is an error around startup after lock acquisition
…, a mod to local-up-cluster which starts 2 kcms.
FYI, I was on the fence about keeping these changes... but they allow you to test HA in local mode by inducing failover. I'm happy to include them in the final PR, unless folks think it's a bad idea. For now, while this is WIP, it's good to have these for regression testing as we change the code base, and especially before we rebase.
The way it works: Just tail the logs of controller-manager-1.log and controller-manager-2.log, and you'll see the handoff happen after the time bomb goes off.
@timothysc @rrati I merged everything into one branch and am testing now.
Just FYI, if you have any thoughts, feel free to put them in here!