Skip to content

CP-53843: reusable SM session#7002

Open
GabrielBuica wants to merge 4 commits intoxapi-project:masterfrom
GabrielBuica:private/dbuica/CP-53843
Open

CP-53843: reusable SM session#7002
GabrielBuica wants to merge 4 commits intoxapi-project:masterfrom
GabrielBuica:private/dbuica/CP-53843

Conversation

@GabrielBuica
Copy link
Copy Markdown
Contributor

@GabrielBuica GabrielBuica commented Apr 10, 2026

Builds on top of #6991

The session reuse logic in xapi_session.ml does work in this usecase,
sm calls are made with pool=false. So reimplement the logic so that we
use a single session for sm_exec per sr.

This is achieved by having a hashtable that maps SRs to a corresponding session. The session is created the first time a particular SR needs it and then get reused afterwards. The session gets recreated only if it becomes invalid.

This should help the number of database calls during congestion times.

Passes BVT and BST.

@psafont
Copy link
Copy Markdown
Member

psafont commented Apr 10, 2026

This seems like it would fix some issues I've seen in XCP-ng customers, but it's difficult to tell with the very short description, could you explain a bit further of what's the effect?

In particular, the problem is that some sessions used for long-running migrations are being evicted from the cache because of the amount of new sessions being created by SM.

@GabrielBuica GabrielBuica force-pushed the private/dbuica/CP-53843 branch from 2efe3c2 to 244b3a1 Compare April 13, 2026 14:03
@GabrielBuica GabrielBuica marked this pull request as draft April 14, 2026 07:31
@GabrielBuica
Copy link
Copy Markdown
Contributor Author

GabrielBuica commented Apr 14, 2026

Moving this to draft as I found a but when initializing the host. Updated by using a session per SR

@GabrielBuica GabrielBuica force-pushed the private/dbuica/CP-53843 branch from 244b3a1 to 4e5d580 Compare April 15, 2026 08:47
@GabrielBuica GabrielBuica marked this pull request as ready for review April 15, 2026 08:54
@psafont
Copy link
Copy Markdown
Member

psafont commented Apr 15, 2026

Are these changes meant to be backported? I've seen some storage migrations interrupted because of the amount of sessions created by the fixed code here (in XCP-ng 8.3)

session
else (
with_sm_sessions_lock (fun () -> Hashtbl.remove reusable_sessions sr_key) ;
(get_session [@tailcall]) ~__context sr
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_session doesn't need to be recursive if the logic for getting a new session is put into a function, like get_new_session __context sr`. It's easy to see how at most it gets called recursively once because the key is being removed before calling itself

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be better. Now, due to the recursive call, there is an additional hashtable lookup for no reason and the locking is awkward.

Copy link
Copy Markdown
Contributor Author

@GabrielBuica GabrielBuica Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure ebe6fe2


let rec get_session ~__context sr =
let sr_key = Option.map Ref.string_of sr in
let session = Hashtbl.find_opt reusable_sessions sr_key in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the mutex lock be used to protect both read and write access on reusable_sessions?
And it may be dead locked to acquire the lock in a recursive function?

@GabrielBuica
Copy link
Copy Markdown
Contributor Author

GabrielBuica commented Apr 15, 2026

Are these changes meant to be backported? I've seen some storage migrations interrupted because of the amount of sessions created by the fixed code here (in XCP-ng 8.3)

As soon as I get it in XS9 and I can look at getting it backported. I don't think they were changes in this area so it's probably just cherry-picking. It will still be behind a flag.

Gabriel Buica added 4 commits April 15, 2026 16:31
refactor the code before implementing reusable sessions for `sm_exec`

Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
The session reuse logic in `xapi_session.ml` does work in this usecase,
sm calls are made with `pool=false`. So reimplement the logic so that we
use a single session for `sm_exec` per sr.

This should help with session count inside xapi.

Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
Signed-off-by: Gabriel Buica <danutgabriel.buica@citrix.com>
@GabrielBuica GabrielBuica force-pushed the private/dbuica/CP-53843 branch from 5a3b1b9 to ebe6fe2 Compare April 15, 2026 15:32
Comment on lines +1654 to +1655
with_sm_sessions_lock (fun () -> Hashtbl.remove reusable_sessions sr_key) ;
get_new_session ~__context sr
Copy link
Copy Markdown
Member

@psafont psafont Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a concurrent execution, the removal and creation need to happen atomically, that is, happen under the same lock. That avoids creating two sessions for a single SR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but the only the first session created is being used. The others are destroyed after after. I wanted to keep the locking to a minimum (although it's locking for destroy). I was thinking of what would happen for multiple SRs rather than a single one. Even with that in this case the number of session created and destroyed would be less that what we currently have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants