Add per-node chain-finish cleanup functions + empty cuda cache only on chain finish #2852
joeyballentine merged 6 commits into main from
Conversation
"Hopefully"? Did... did you not measure whether your optimization actually made things faster? And on that note: if you do an optimization, please also report how much faster this actually makes things. As a reviewer, I also have to weigh whether the speed-up is worth the tradeoff in code simplicity/quality/maintainability. I can of course measure things myself, but having you give a reference case/number is useful.
I would suggest the following API:

```python
def do_clean_up(): ...

@register(...)
def my_node(context: NodeContext):
    # run once per node execution
    context.add_cleanup(lambda: do_clean_up())
    # run once per chain
    context.add_cleanup(do_clean_up)
```

This fixes the following issue with the current …
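A minimal sketch of how such a context could tell the two cases apart by deduplicating callbacks on function identity. This is illustrative only: `NodeContext`, `add_cleanup`, and `run_cleanups` are assumed names for this sketch, not chaiNNer's actual API.

```python
class NodeContext:
    """Sketch: cleanups are deduplicated by function identity."""

    def __init__(self):
        self._cleanups = []

    def add_cleanup(self, fn):
        # The same function object is only registered once, so passing a
        # named function yields "once per chain". A lambda is a fresh
        # object on every node execution, so it runs once per execution.
        if fn not in self._cleanups:
            self._cleanups.append(fn)

    def run_cleanups(self):
        # Called by the executor when the chain finishes.
        for fn in self._cleanups:
            fn()
        self._cleanups.clear()


calls = []

def do_clean_up():
    calls.append("cleaned")

ctx = NodeContext()
for _ in range(3):  # simulate three node executions in one chain
    ctx.add_cleanup(do_clean_up)            # deduplicated: runs once
    ctx.add_cleanup(lambda: do_clean_up())  # fresh object: runs each time
ctx.run_cleanups()
print(len(calls))  # 1 (named function) + 3 (lambdas) = 4
```

The design choice here is that the caller picks the semantics by what they pass, rather than the API exposing two separate registration methods.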
I did, after I made this PR, but that was with all of my changes together. I'll test this individually.
Turns out this is doing most of the heavy lifting for the optimizations I tested. A 2m20s processing time went down to 2m01s with just this change alone.
RunDevelopment left a comment
Just some minor things.
Co-authored-by: Michael Schmidt <mitchi5000.ms@googlemail.com>
Probably needs an
After doing some profiling, I noticed that clearing the CUDA cache was taking significant time, and it was happening after every upscale, which I'm pretty sure is incorrect and bad. We only need to release the memory when we're actually done with it (i.e. when the chain finishes).
So, I added a hook system to packages. I'm not sure if this is the best API for it, but packages can now define on_chain_finish callbacks to be run when the chain is finished, to do things like resource clearing.
This will hopefully bring a little speed improvement to batch upscales.