CP-53843: reusable SM session#7002
Conversation
|
This seems like it would fix some issues I've seen in XCP-ng customers, but it's difficult to tell with the very short description, could you explain a bit further of what's the effect? In particular, the problem is that some sessions used for long-running migrations are being evicted from the cache because of the amount of new sessions being created by SM. |
2efe3c2 to
244b3a1
Compare
|
|
244b3a1 to
4e5d580
Compare
|
Are these changes meant to be backported? I've seen some storage migrations interrupted because of the amount of sessions created by the fixed code here (in XCP-ng 8.3) |
ocaml/xapi/sm_exec.ml
Outdated
| session | ||
| else ( | ||
| with_sm_sessions_lock (fun () -> Hashtbl.remove reusable_sessions sr_key) ; | ||
| (get_session [@tailcall]) ~__context sr |
There was a problem hiding this comment.
get_session doesn't need to be recursive if the logic for getting a new session is put into a function, like get_new_session __context sr`. It's easy to see how at most it gets called recursively once because the key is being removed before calling itself
There was a problem hiding this comment.
I think that would be better. Now, due to the recursive call, there is an additional hashtable lookup for no reason and the locking is awkward.
ocaml/xapi/sm_exec.ml
Outdated
|
|
||
| let rec get_session ~__context sr = | ||
| let sr_key = Option.map Ref.string_of sr in | ||
| let session = Hashtbl.find_opt reusable_sessions sr_key in |
There was a problem hiding this comment.
Should the mutex lock be used to protect both read and write access on reusable_sessions?
And it may be dead locked to acquire the lock in a recursive function?
As soon as I get it in XS9 and I can look at getting it backported. I don't think they were changes in this area so it's probably just cherry-picking. It will still be behind a flag. |
refactor the code before implementing reusable sessions for `sm_exec` Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
The session reuse logic in `xapi_session.ml` does work in this usecase, sm calls are made with `pool=false`. So reimplement the logic so that we use a single session for `sm_exec` per sr. This should help with session count inside xapi. Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
5a3b1b9 to
ebe6fe2
Compare
| with_sm_sessions_lock (fun () -> Hashtbl.remove reusable_sessions sr_key) ; | ||
| get_new_session ~__context sr |
There was a problem hiding this comment.
In a concurrent execution, the removal and creation need to happen atomically, that is, happen under the same lock. That avoids creating two sessions for a single SR.
There was a problem hiding this comment.
Yeah, but the only the first session created is being used. The others are destroyed after after. I wanted to keep the locking to a minimum (although it's locking for destroy). I was thinking of what would happen for multiple SRs rather than a single one. Even with that in this case the number of session created and destroyed would be less that what we currently have.
Builds on top of #6991
The session reuse logic in
xapi_session.mldoes work in this usecase,sm calls are made with
pool=false. So reimplement the logic so that weuse a single session for
sm_execper sr.This is achieved by having a hashtable that maps SRs to a corresponding session. The session is created the first time a particular SR needs it and then get reused afterwards. The session gets recreated only if it becomes invalid.
This should help the number of database calls during congestion times.
Passes BVT and BST.