[SPARK-7186] [SQL] Decouple internal Row from external Row #6792
|
Thanks for working on this huge change! Should |
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateUtilsSuite.scala
	sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
	sql/core/src/main/scala/org/apache/spark/sql/execution/pythonUdfs.scala
	sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala
	sql/core/src/main/scala/org/apache/spark/sql/sources/DataSourceStrategy.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnStatsSuite.scala
	sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnarTestUtils.scala
|
How come Jenkins didn't print any meaningful messages? cc @JoshRosen |
why not just import InternalRow?
This is done by IntelliJ.
Okay, but we aren't even consistent in our usage. Can we remove this and just reference InternalRow?
+1, I think we should fix this.
Will do in a follow-up PR. Please continue reviewing this.
|
Jenkins, retest this please. |
|
I looked at the Jenkins configuration log and it looks like @andrewor14's credentials somehow auto-filled and overwrote the Jenkins GitHub token; Andrew and I were modifying the builder configurations this afternoon to attach I've rolled back the configuration change so hopefully we'll see SparkQA posting soon. For those with Jenkins admin access, see https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/jobConfigHistory/showDiffFiles?timestamp1=2015-06-12_09-30-32&timestamp2=2015-06-12_14-37-42 |
|
Test build #34816 has finished for PR 6792 at commit
|
|
Going to merge this quickly since it conflicts with a lot of other patches. |
not needed, will remove it.
|
Another nit: |
Currently, we use o.a.s.sql.Row both internally and externally. The external interface is wider than what the internal one needs because it is designed to facilitate end-user programming. This design has proven to be very error-prone and cumbersome for internal Row implementations.

As a first step, we create an InternalRow interface in the catalyst module, which is identical to the current Row interface, and we switch all internal operators/expressions to use InternalRow instead. When we need to expose Row, we convert the InternalRow implementation into Row for users.

For all public APIs, we use Row (for example, data source APIs), which will be converted into/from InternalRow by CatalystTypeConverters.

For all internal data sources (Json, Parquet, JDBC, Hive), we use InternalRow for better performance, cast to Row in buildScan() (without changing the public API). When creating a PhysicalRDD, we cast them back to InternalRow.

cc rxin marmbrus JoshRosen

Author: Davies Liu <davies@databricks.com>

Closes apache#6792 from davies/internal_row and squashes the following commits:

f2abd13 [Davies Liu] fix scalastyle
a7e025c [Davies Liu] move InternalRow into catalyst
30db8ba [Davies Liu] Merge branch 'master' of github.com:apache/spark into internal_row
7cbced8 [Davies Liu] separate Row and InternalRow
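For readers skimming the thread, the split described above can be pictured with a toy model. This is only a sketch and not Spark's code: `RowSketch`, `ExternalRow`, the simplified `InternalRow`, `toInternal`, `toExternal`, and `scanInternal` are hypothetical stand-ins, and the date-as-epoch-day encoding is an assumption chosen for illustration. It shows a converter at the public boundary, plus the unchecked cast that lets an internal source keep a public `buildScan()`-style signature without converting each row.

```scala
import java.time.LocalDate

// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
object RowSketch {
  // User-facing row: convenient typed getters over friendly JVM values.
  final case class ExternalRow(values: Seq[Any]) {
    def getString(i: Int): String = values(i).asInstanceOf[String]
    def getDate(i: Int): LocalDate = values(i).asInstanceOf[LocalDate]
  }

  // Engine-facing row: narrow API over an internal encoding.
  final case class InternalRow(values: IndexedSeq[Any]) {
    def get(i: Int): Any = values(i)
  }

  // Assumed encoding for this sketch: dates become days since the epoch.
  def toInternal(row: ExternalRow): InternalRow =
    InternalRow(row.values.map {
      case d: LocalDate => d.toEpochDay.toInt
      case other        => other
    }.toIndexedSeq)

  // Without a schema, the caller must say which columns hold dates.
  def toExternal(row: InternalRow, dateColumns: Set[Int]): ExternalRow =
    ExternalRow(row.values.zipWithIndex.map {
      case (days: Int, i) if dateColumns(i) => LocalDate.ofEpochDay(days.toLong)
      case (other, _)                       => other
    })

  // The public signature promises external rows, but an internal source
  // hands back internal rows behind an unchecked cast (no per-row work).
  def buildScan(): Seq[ExternalRow] = {
    val internal: Seq[InternalRow] = Seq(InternalRow(IndexedSeq("a", 1)))
    internal.asInstanceOf[Seq[ExternalRow]]
  }

  // Engine side: cast back before touching any element.
  def scanInternal(): Seq[InternalRow] =
    buildScan().asInstanceOf[Seq[InternalRow]]

  def main(args: Array[String]): Unit = {
    val ext = ExternalRow(Seq("a", LocalDate.of(1970, 1, 2)))
    println(toInternal(ext))
    println(scanInternal().head.get(0))
  }
}
```

The cast only works because element types are erased at runtime, so anything that reads an element between the two casts as an `ExternalRow` would fail. That fragility is the trade-off of skipping conversion on the internal path.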