remove cvk_command_combine in favor of cvk_event_combine#526
Conversation
54a89af to
fb87413
Compare
|
It can feel like a lot of change, but most of it is just code being moved around. It is passing the full CTS on Nvidia and Swiftshader on my side. |
fb87413 to
076caaf
Compare
076caaf to
e3605c0
Compare
ed5185e to
3cfd81b
Compare
3cfd81b to
3c009e5
Compare
0972333 to
f7a7040
Compare
f7a7040 to
0b52ee6
Compare
|
I have updated the CL following #776 which brought merge conflicts. |
|
I've started looking into this and ran a CTS sweep. I'm seeing the following regressions:
|
0b52ee6 to
3b06622
Compare
That one should be fixed now.
I'm not able to reproduce, could you share what's going wrong? |
| m_end_event->register_callback(callback_type, ptr, user_data); | ||
| } else { | ||
| m_start_event->register_callback(callback_type, ptr, user_data); | ||
| } |
There was a problem hiding this comment.
Shouldn't the callback always be registered on the end event? A combined event isn't queued or submitted until all its events are, is it?
There was a problem hiding this comment.
A combined event isn't queued or submitted until all its events are, is it?
There is nothing saying commands from combined event are submitted together. In fact, with timeline semaphore there are times where they are not submitted together.
| if (pinfo == CL_PROFILING_COMMAND_END) { | ||
| return m_end_event->get_profiling_info(pinfo); | ||
| } else { | ||
| return m_start_event->get_profiling_info(pinfo); |
There was a problem hiding this comment.
Hmm, this and some of the other comments above reveal a fundamental flaw of the event combine approach IMO.
I think we always want CL_PROFILING_COMMAND_START to come from the start event. Otherwise we can't report the time it took to run all the commands.
If the submit timestamp comes from the start event, then a combined event is reported as submitted before all the work is. If the submit timestamp comes from the end event, then it could be smaller than the timestamp for the start event which would break ordering. Not sure how to resolve this.
Also, CL_PROFILING_COMMAND_COMPLETE should definitely come from the end event.
There was a problem hiding this comment.
CL_PROFILING_COMMAND_COMPLETE does not exist inside clvk I believe. It is map to CL_PROFILING_COMMAND_END here.
If the submit timestamp comes from the start event, then a combined event is reported as submitted before all the work is. If the submit timestamp comes from the end event, then it could be smaller than the timestamp for the start event which would break ordering. Not sure how to resolve this.
This is quite a complex problem. I think it is more important to have an efficient implementation than to have something making sure a group is submitted together just for the sake of being able to advertise the correct timestamp. It is exactly the point of this PR, which makes a great improvement performance-wise (combine with timeline semaphore & command buffer, both of which rely on this PR).
The assumption in that PR is that we used the first command of the combine for QUEUED, SUBMIT & START and the last command for the END.
There was a problem hiding this comment.
I'd forgotten about the remapping of CL_PROFILING_COMMAND_COMPLETE. As discussed, there is no perfect solution to this problem. Let's go with that PR as is. I think it's the approach that's the most helpful to users.
cvk_command_combine prevents optimization where commands could be batch. It is needed to have a single event representing the execution of the group of commands. This PR creates a new event (cvk_event_combine) which allows to have a single event grouping several events, thus keeping the single event for the group of commands. It also allows to make do_action a private function, thus having it called only in one place. This is very convenient to implement future optimization (using VkSemaphore).
3b06622 to
97b0fd1
Compare
kpet
left a comment
There was a problem hiding this comment.
Feels good to get that one in. Apologies it took such a long time.
| if (pinfo == CL_PROFILING_COMMAND_END) { | ||
| return m_end_event->get_profiling_info(pinfo); | ||
| } else { | ||
| return m_start_event->get_profiling_info(pinfo); |
There was a problem hiding this comment.
I'd forgotten about the remapping of CL_PROFILING_COMMAND_COMPLETE. As discussed, there is no perfect solution to this problem. Let's go with that PR as is. I think it's the approach that's the most helpful to users.
cvk_command_combine prevents optimization where commands could be batch.
It is needed to have a single event representing the execution of the group of commands. This PR creates a new event (cvk_event_combine) which allows to have a single event grouping several events, thus keeping the single event for the group of commands.
It also allows to make do_action a private function, thus having it called only in one place. This is very convenient to implement future optimization (using VkSemaphore).