[WIP][SYSTEMDS-3907] tee ooc operator#2317
Conversation
| if (hop.getParent().size() > 1) { | ||
| for (Hop consumer : hop.getParent()) { | ||
| if (consumer instanceof TeeOp) { | ||
| if (consumer instanceof DataOp) { |
There was a problem hiding this comment.
this check only checking for DataOp.
| // Create the new TeeOp with the original hop as input | ||
| TeeOp teeOp = new TeeOp(sharedInput); | ||
| // TeeOp teeOp = new TeeOp(sharedInput); | ||
| DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), |
There was a problem hiding this comment.
presently we are creating different variable names than the ones created automatically _mvar1 ... --> is this the correct approach?
There was a problem hiding this comment.
no worries, these hop names are just for debugging - while generating instructions, we generate the _mvar, _fvar, _var accordingly.
| DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), | ||
| sharedInput.getDataType(), | ||
| sharedInput.getValueType(), | ||
| Types.OpOpData.TRANSIENTWRITE, |
There was a problem hiding this comment.
Is it supposed to be TRANSIENTWRITE or TRANSIENTREAD?
| import org.apache.sysds.lops.LopsException; | ||
| import org.apache.sysds.lops.Sql; |
There was a problem hiding this comment.
yes exactly - let's try to avoid the wildcard imports (to avoid ambiguity).
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2317 +/- ##
============================================
+ Coverage 72.26% 72.28% +0.01%
- Complexity 46636 46690 +54
============================================
Files 1496 1499 +3
Lines 176862 177039 +177
Branches 34767 34794 +27
============================================
+ Hits 127805 127965 +160
- Misses 39407 39420 +13
- Partials 9650 9654 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| import org.apache.sysds.lops.LopsException; | ||
| import org.apache.sysds.lops.Sql; |
There was a problem hiding this comment.
yes exactly - let's try to avoid the wildcard imports (to avoid ambiguity).
|
|
||
| private boolean _recompileRead = true; | ||
|
|
||
| private boolean _isTeeOp = false; |
There was a problem hiding this comment.
I would rather make it a op-type of DataOp like TWRITE.
| setLineNumbers(l); | ||
| setLops(l); | ||
| // setLineNumbers(l); | ||
| // setLops(l); |
There was a problem hiding this comment.
the line numbers and lops should also be set for the TeeOp
| // Create the new TeeOp with the original hop as input | ||
| TeeOp teeOp = new TeeOp(sharedInput); | ||
| // TeeOp teeOp = new TeeOp(sharedInput); | ||
| DataOp teeOp = new DataOp("tee_out_" + sharedInput.getName(), |
There was a problem hiding this comment.
no worries, these hop names are just for debugging - while generating instructions, we generate the _mvar, _fvar, _var accordingly.
| boolean isTransposeMM = isTranposePattern(hop); | ||
| boolean isMatrix = hop.getDataType() == Types.DataType.MATRIX; | ||
|
|
||
| return isOOCEnabled && multipleConsumers && isNotAlreadyTee && isTransposeMM && isMatrix; |
There was a problem hiding this comment.
it's fine to currently only do it for transpose - it will not affect other things, and later we generalize the approach.
| for (Hop parent: hop.getParent()) { | ||
| String opString = parent.getOpString(); | ||
| if (parent instanceof ReorgOp) { | ||
| if (opString.contains("r'") || opString.contains("transpose")) { |
There was a problem hiding this comment.
please always use the following utils: HopRewriteUtils.isReorgOp(parent, ReOrgOp.TRANS)
| } | ||
| } | ||
| else if (parent instanceof AggBinaryOp) | ||
| if (opString.contains("*") || opString.contains("ba+*")) { |
| String[] out = outputs.split(Lop.OPERAND_DELIMITOR); | ||
| String output2 = outputs + "_copy"; | ||
|
|
||
| // This method generates the instruction string: OOC°tee°input°output1°output2... |
There was a problem hiding this comment.
I would recommend not making tee an operation with two outputs but just one. (copying the input). After thinking a bit more about it, maybe the entire notion of a tee was not a good idea. Potentially, we could make the stream resettable, and the tee operator takes a normal stream, and outputs a resettable stream which can then be consumed by arbitrary many other operations (which also solves the problem of complex control flow interactions where the multiple parents are not immediately visible)
There was a problem hiding this comment.
thanks @mboehm7 . Are you referring to the way, where we have lightweight marker for the stream so the multiple consumers could utilize it. Please let me raise a different PR with the option.
There was a problem hiding this comment.
yes exactly - I'm thinking about something like setResettable() on the taskqueue, which means the blocks are never removed (and potentially subject to eviction) but different operations can get new iterables that always iterate the all blocks. The first operation would simply pull all blocks into memory, and the rmvar then drops all of them. If there is a single consumer we would not need to make it resettable, but can just stream them in.
implement tee ooc operator
Done:
TODO:
2. presently only, meant for t(X) %*% X, need to see if this works in general.