The existing CP WriteInstruction (VariableCPInstruction) was modified to be "stream-aware."
It now checks if its input MatrixObject has an OOC stream handle. If a stream exists, it acts as a synchronous terminal consumer, reading blocks from the stream and writing them to separate part-files in an output directory.
Multi-Block Write: The OOC write logic was made robust to handle multi-block matrices by writing to separate part-files, which is the standard for distributed systems.
Sorry for the delay, and thanks for getting started on this task @j143.
Integration: Instead of integrating this write into the VariableCPInstruction (where non-binary formats are written), I recommend integrating the OOC write into the individual writers (with support for binary only), via a new method that is called if a MatrixObject does have an existing OOC stream of blocks.
Core Write Logic: To yield the same output files as a normal (single-threaded) write, I recommend not creating a part file for every single block, but streaming all blocks into a single file. Once you extend the binary writer, you will see that this approach is even easier and results in files that can be processed much faster (too many small files can be an issue on distributed file systems).
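To illustrate the single-file idea, here is a minimal sketch of a terminal consumer that drains a stream of blocks into one output file. It is not SystemDS code: `Block`, `END_OF_STREAM`, `writeStream`, and the on-disk layout are all hypothetical stand-ins, and the stream is modeled as a plain `BlockingQueue` with a sentinel marking the end. The real binary block format and the actual OOC stream handle would differ.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SingleFileStreamWriter {

    /** Hypothetical stand-in for an indexed matrix block (not a SystemDS type). */
    static final class Block {
        final long rowIndex;
        final long colIndex;
        final double[] values;

        Block(long rowIndex, long colIndex, double[] values) {
            this.rowIndex = rowIndex;
            this.colIndex = colIndex;
            this.values = values;
        }
    }

    /** Assumed convention: the producer enqueues this sentinel after the last block. */
    static final Block END_OF_STREAM = new Block(-1, -1, new double[0]);

    /**
     * Synchronously consumes blocks until the sentinel arrives and appends
     * each one to the same file. Returns the number of blocks written.
     */
    static int writeStream(BlockingQueue<Block> stream, Path target)
            throws IOException, InterruptedException {
        int written = 0;
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            Block b;
            // Reference comparison is intentional: only the sentinel instance ends the loop.
            while ((b = stream.take()) != END_OF_STREAM) {
                out.writeLong(b.rowIndex);      // block row index
                out.writeLong(b.colIndex);      // block column index
                out.writeInt(b.values.length);  // payload length
                for (double v : b.values) {
                    out.writeDouble(v);
                }
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Block> stream = new LinkedBlockingQueue<>();
        stream.put(new Block(1, 1, new double[] {1.0, 2.0}));
        stream.put(new Block(2, 1, new double[] {3.0, 4.0}));
        stream.put(END_OF_STREAM);

        Path target = Files.createTempFile("ooc-write", ".bin");
        int n = writeStream(stream, target);
        System.out.println(n + " blocks, " + Files.size(target) + " bytes");
        Files.delete(target);
    }
}
```

Because every block goes through the same open stream, the consumer produces one file regardless of how many blocks arrive, which is the property the recommendation above is after.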