[SYSTEMDS-3857] set get names on dataframes#2495
Conversation
This patch adds the language references for the newly implemented getName and setName function. The order in Builtins.java was fixed to be alphabetical again
janniklinde
left a comment
There was a problem hiding this comment.
Thanks for the PR @t99-i. I left some comments in the code that should be addressed.
In general, please configure your IDE to use tabs instead of spaces. Also, please remove the unnecessary TODO comments.
As a next step, please implement support for the spark backend and add the according tests. In general, you should have a more systematic approach to verifying metatdata propagation through tests as you currently only include cbind/rbind/slice for the CP backend (should be extended to spark backend and should test other frame related functions systematically).
| case SET_NAMES: | ||
| currBuiltinOp = new BinaryOp( | ||
| target.getName(), | ||
| target.getDataType(), | ||
| target.getValueType(), | ||
| OpOp2.SET_COLNAMES, expr, expr2 | ||
| ); | ||
| break; |
There was a problem hiding this comment.
All of the above cases will now be mapped to a BinaryOp of type SET_COLNAMES. This is incorrect and the reason why the tests fail (see for example: https://github.com/apache/systemds/actions/runs/27905840894/job/82663839671?pr=2495#step:3:3857).
| else if( opcode.equalsIgnoreCase(Opcodes.FREPLICATE.toString())) | ||
| return new BinaryOperator(Builtin.getBuiltinFnObject("freplicate")); | ||
| else if( opcode.equalsIgnoreCase(Opcodes.SET_COLNAMES.toString())) | ||
| return new BinaryOperator(Builtin.getBuiltinFnObject("set_colnames")); |
There was a problem hiding this comment.
Does this return a proper function object currently?
| String[] colNames = new String[(int) names.getNumColumns()]; | ||
| for(int i = 0; i < colNames.length; i++){ | ||
| colNames[i] = names.get(0, i).toString(); | ||
| } |
There was a problem hiding this comment.
No size/data validation happening
| output.setDataType(DataType.FRAME); | ||
| output.setDimensions(id.getDim1(), id.getDim2()); | ||
| output.setBlocksize (id.getBlocksize()); | ||
| output.setValueType(ValueType.STRING); |
There was a problem hiding this comment.
setName does not have STRING as a return type
| //TODO: Check if new OPcode handling has to be implemented | ||
| else if(getOpcode().equals(Opcodes.COLNAMES.toString())) { | ||
| FrameBlock inBlock = ec.getFrameInput(input1.getName()); | ||
| FrameBlock retBlock = inBlock.getColumnNamesAsFrame(); | ||
| ec.releaseFrameInput(input1.getName()); | ||
| ec.setFrameOutput(output.getName(), retBlock); | ||
| } | ||
|
|
There was a problem hiding this comment.
Duplicate code, not needed.
| addTestConfiguration(TEST_NAME_GET, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME_SET, new String[] {"B"})); | ||
| addTestConfiguration(TEST_NAME_SET, new TestConfiguration(TEST_CLASS_DIR, TEST_NAME_GET, new String[] {"B"})); |
| names.set(0, i, columnNames[i]); | ||
| FrameWriter nameWriter = FrameWriterFactory.createFrameWriter(FileFormat.CSV, | ||
| new FileFormatPropertiesCSV(false, ",", false)); | ||
| System.out.println("N path = " + input("N")); |
There was a problem hiding this comment.
Please avoid those prints
| private final static String TEST_NAME_RBIND = "ColNameRbindPropagation"; | ||
| private final static String TEST_NAME_SLICE = "ColNameSlicePropagation"; | ||
| private final static String TEST_DIR = "functions/frame/"; | ||
| private static final String TEST_CLASS_DIR = TEST_DIR + FrameColumnNamesTest.class.getSimpleName() + "/"; |
| else if ( opcode.equalsIgnoreCase(Opcodes.VALUESWAP.toString()) || opcode.equalsIgnoreCase("mapValueSwap") ) | ||
| return new BinaryOperator(Builtin.getBuiltinFnObject("valueSwap")); | ||
| //TODO: Check what "|| opcode.equalsIgnoreCase("mapValueSwap"))" does | ||
| else if (opcode.equalsIgnoreCase(Opcodes.SET_COLNAMES.toString()) || opcode.equalsIgnoreCase("mapValueSwap")) |
There was a problem hiding this comment.
Should not be mapValueSwap
This PR adds the frame operations getNames and setNames.
Changes:
Tests:
The propagation tests verify that column names are preserved correctly during frame operations. The covered operations are cbind, rbind, and slice.
The tests use frame dimensions ranging from 10 to 1000 columns.
Note:
I observed unusual column-name propagation behavior for cbind on wider frames around 3000 columns. I left this case out of the regression tests to keep the PR focused on getNames/setNames and the CP runtime scope.