From 0b3e765867f4f3ed33954bd31b62d602e9d28832 Mon Sep 17 00:00:00 2001 From: Huaxin Gao Date: Thu, 28 May 2020 22:06:10 -0700 Subject: [PATCH 1/5] [SPARK-31333][SQL][DOCS][FOLLOW-UP] Add Coalesce/Repartition/Repartition_By_Range Hints to SQL REF --- docs/_data/menu-sql.yaml | 4 +-- docs/sql-performance-tuning.md | 2 +- docs/sql-ref-syntax-qry-select-hints.md | 39 ++++++++++++++++++++++--- docs/sql-ref-syntax-qry-select-join.md | 2 +- docs/sql-ref-syntax-qry-select.md | 2 +- docs/sql-ref-syntax-qry.md | 2 +- docs/sql-ref-syntax.md | 2 +- 7 files changed, 42 insertions(+), 11 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 289a9d3a1e9da..fbecfa7ae3631 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -171,12 +171,12 @@ url: sql-ref-syntax-qry-select-limit.html - text: Common Table Expression url: sql-ref-syntax-qry-select-cte.html + - text: Hints + url: sql-ref-syntax-qry-select-hints.html - text: Inline Table url: sql-ref-syntax-qry-select-inline-table.html - text: JOIN url: sql-ref-syntax-qry-select-join.html - - text: Join Hints - url: sql-ref-syntax-qry-select-hints.html - text: LIKE Predicate url: sql-ref-syntax-qry-select-like.html - text: Set Operators diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md index 7cd85b6a9ab4c..5a818ec40c234 100644 --- a/docs/sql-performance-tuning.md +++ b/docs/sql-performance-tuning.md @@ -179,7 +179,7 @@ SELECT /*+ BROADCAST(r) */ * FROM records r JOIN src s ON r.key = s.key -For more details please refer to the documentation of [Join Hints](sql-ref-syntax-qry-select-hints.html). +For more details please refer to the documentation of [Hints](sql-ref-syntax-qry-select-hints.html). 
## Coalesce Hints for SQL Queries diff --git a/docs/sql-ref-syntax-qry-select-hints.md b/docs/sql-ref-syntax-qry-select-hints.md index 4bb48b08d5e3b..3c6efd5cef5d5 100644 --- a/docs/sql-ref-syntax-qry-select-hints.md +++ b/docs/sql-ref-syntax-qry-select-hints.md @@ -1,7 +1,7 @@ --- layout: global -title: Join Hints -displayTitle: Join Hints +title: Hints +displayTitle: Hints license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -21,14 +21,45 @@ license: | ### Description -Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint. +Hints give users a way to suggest specific approaches for Spark SQL to use when generating its execution plan. ### Syntax ```sql -/*+ join_hint [ , ... ] */ +/*+ hint [ , ... ] */ +``` + +### Coalesce/Repartition/Repartition_By_Range Hints + +Coalesce/Repartition/Repartition_By_Range hints have functionalities equivalent to those of the +`Dataset` coalesce/repartition/repartitionByRange APIs. The Coalesce hint can be used to reduce +the number of partitions to the specified number of partitions. The Repartition/Repartition_By_Range +hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. 
+The Coalesce hint takes a partition number as a +parameter. The Repartition hint takes a partition number, column names, or both as parameters. +The Repartition_By_Range hint takes column names and an optional partition number as parameters. +These hints give users a way to tune performance and control the number of output files in Spark SQL. + +### Examples +```sql +SELECT /*+ COALESCE(3) */ * FROM t; + +SELECT /*+ REPARTITION(3) */ * FROM t; + +SELECT /*+ REPARTITION(c) */ * FROM t; + +SELECT /*+ REPARTITION(3, c) */ * FROM t; + +SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t; + +SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; ``` + +### Join Hints + +Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint. 
+ ### Join Hints Types * **BROADCAST** diff --git a/docs/sql-ref-syntax-qry-select-join.md b/docs/sql-ref-syntax-qry-select-join.md index 28b21f5e3f0ff..09b0efd7b5751 100644 --- a/docs/sql-ref-syntax-qry-select-join.md +++ b/docs/sql-ref-syntax-qry-select-join.md @@ -235,4 +235,4 @@ SELECT * FROM employee ANTI JOIN department ON employee.deptno = department.dept ### Related Statements * [SELECT](sql-ref-syntax-qry-select.html) -* [Join Hints](sql-ref-syntax-qry-select-hints.html) +* [Hints](sql-ref-syntax-qry-select-hints.html) diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 1aeecdb982c4c..3776d45323e14 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -151,9 +151,9 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) * [Common Table Expression](sql-ref-syntax-qry-select-cte.html) +* [Hints](sql-ref-syntax-qry-select-hints.html) * [Inline Table](sql-ref-syntax-qry-select-inline-table.html) * [JOIN](sql-ref-syntax-qry-select-join.html) -* [Join Hints](sql-ref-syntax-qry-select-hints.html) * [LIKE Predicate](sql-ref-syntax-qry-select-like.html) * [Set Operators](sql-ref-syntax-qry-select-setops.html) * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html) diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md index 8accdfe30764a..3d3d0846e8ead 100644 --- a/docs/sql-ref-syntax-qry.md +++ b/docs/sql-ref-syntax-qry.md @@ -37,9 +37,9 @@ ability to generate logical and physical plan for a given query using * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) * [Common Table Expression](sql-ref-syntax-qry-select-cte.html) + * [Hints](sql-ref-syntax-qry-select-hints.html) * [Inline Table](sql-ref-syntax-qry-select-inline-table.html) * 
[JOIN](sql-ref-syntax-qry-select-join.html) - * [Join Hints](sql-ref-syntax-qry-select-hints.html) * [LIKE Predicate](sql-ref-syntax-qry-select-like.html) * [Set Operators](sql-ref-syntax-qry-select-setops.html) * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html) diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md index 98e30652245f6..ac1174458c6a5 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -54,9 +54,9 @@ Spark SQL is Apache Spark's module for working with structured data. The SQL Syn * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html) * [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html) * [HAVING Clause](sql-ref-syntax-qry-select-having.html) + * [Hints](sql-ref-syntax-qry-select-hints.html) * [Inline Table](sql-ref-syntax-qry-select-inline-table.html) * [JOIN](sql-ref-syntax-qry-select-join.html) - * [Join Hints](sql-ref-syntax-qry-select-hints.html) * [LIKE Predicate](sql-ref-syntax-qry-select-like.html) * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html) * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) From 60fdb93800a2843268ed21501ee7f26950949411 Mon Sep 17 00:00:00 2001 From: Huaxin Gao Date: Fri, 29 May 2020 00:29:11 -0700 Subject: [PATCH 2/5] address comments --- docs/sql-ref-syntax-qry-select-hints.md | 38 ++++++++++++++++++++----- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/docs/sql-ref-syntax-qry-select-hints.md b/docs/sql-ref-syntax-qry-select-hints.md index 3c6efd5cef5d5..21e608193b08e 100644 --- a/docs/sql-ref-syntax-qry-select-hints.md +++ b/docs/sql-ref-syntax-qry-select-hints.md @@ -29,30 +29,54 @@ Hints give users a way to suggest how Spark SQL to use specific approaches to ge /*+ hint [ , ... ] */ ``` -### Coalesce/Repartition/Repartition_By_Range Hints +### Partitioning Hints -Coalesce/Repartition/Repartition_By_Range hints have functionalities equivalent to those of the -`Dataset` coalesce/repartition/repartitionByRange APIs. 
The Coalesce hint can be used to reduce -the number of partitions to the specified number of partitions. The Repartition/Repartition_By_Range +`COALESCE`/`REPARTITION`/`REPARTITION_BY_RANGE` hints have functionalities equivalent to those of the +`Dataset` `coalesce`/`repartition`/`repartitionByRange` APIs. The `COALESCE` hint can be used to reduce +the number of partitions to the specified number of partitions. The `REPARTITION`/`REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. -The Coalesce hint takes a partition number as a -parameter. The Repartition hint takes a partition number, column names, or both as parameters. -The Repartition_By_Range hint takes column names and an optional partition number as parameters. +The `COALESCE` hint takes a partition number as a +parameter. The `REPARTITION` hint takes a partition number, column names, or both as parameters. +The `REPARTITION_BY_RANGE` hint takes column names and an optional partition number as parameters. These hints give users a way to tune performance and control the number of output files in Spark SQL. 
### Examples ```sql SELECT /*+ COALESCE(3) */ * FROM t; +EXPLAIN SELECT /*+ COALESCE(3) */ * FROM t; +== Physical Plan == +Coalesce 3 ++- *(1) ColumnarToRow + +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, + Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], + PushedFilters: [], ReadSchema: struct + SELECT /*+ REPARTITION(3) */ * FROM t; SELECT /*+ REPARTITION(c) */ * FROM t; SELECT /*+ REPARTITION(3, c) */ * FROM t; +EXPLAIN SELECT /*+ REPARTITION(3, c) */ * FROM t; +== Physical Plan == +Exchange hashpartitioning(c#6, 3), false, [id=#148] ++- *(1) ColumnarToRow + +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, + Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], + PushedFilters: [], ReadSchema: struct + SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t; SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; + +EXPLAIN SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; +== Physical Plan == +Exchange rangepartitioning(c#6 ASC NULLS FIRST, 3), false, [id=#167] ++- *(1) ColumnarToRow + +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, + Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], + PushedFilters: [], ReadSchema: struct ``` From 8a7fa0911030ccc154f157006997ef4fd5862e5d Mon Sep 17 00:00:00 2001 From: Huaxin Gao Date: Fri, 29 May 2020 09:39:44 -0700 Subject: [PATCH 3/5] address comments --- docs/sql-performance-tuning.md | 4 +- docs/sql-ref-syntax-qry-select-hints.md | 75 +++++++++++++++---------- 2 files changed, 49 insertions(+), 30 deletions(-) diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md index 5a818ec40c234..5e6f049a51e95 100644 --- a/docs/sql-performance-tuning.md +++ b/docs/sql-performance-tuning.md @@ -179,7 +179,7 @@ SELECT /*+ BROADCAST(r) */ * FROM records r JOIN src s ON r.key = s.key -For more details please refer 
to the documentation of [Hints](sql-ref-syntax-qry-select-hints.html). +For more details please refer to the documentation of [Join Hints](sql-ref-syntax-qry-select-hints.html#join-hints). ## Coalesce Hints for SQL Queries @@ -196,6 +196,8 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t +For more details please refer to the documentation of [Partitioning Hints](sql-ref-syntax-qry-select-hints.html#partitioning-hints). + ## Adaptive Query Execution Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that makes use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. As of Spark 3.0, there are three major features in AQE, including coalescing post-shuffle partitions, converting sort-merge join to broadcast join, and skew join optimization. diff --git a/docs/sql-ref-syntax-qry-select-hints.md b/docs/sql-ref-syntax-qry-select-hints.md index 21e608193b08e..6c9eb8503cdbb 100644 --- a/docs/sql-ref-syntax-qry-select-hints.md +++ b/docs/sql-ref-syntax-qry-select-hints.md @@ -31,58 +31,75 @@ Hints give users a way to suggest how Spark SQL to use specific approaches to ge ### Partitioning Hints -`COALESCE`/`REPARTITION`/`REPARTITION_BY_RANGE` hints have functionalities equivalent to those of the -`Dataset` `coalesce`/`repartition`/`repartitionByRange` APIs. The `COALESCE` hint can be used to reduce -the number of partitions to the specified number of partitions. The `REPARTITION`/`REPARTITION_BY_RANGE` -hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. -The `COALESCE` hint takes a partition number as a -parameter. The `REPARTITION` hint takes a partition number, column names, or both as parameters. 
-The `REPARTITION_BY_RANGE` hint takes column names and an optional partition number as parameters. -These hints give users a way to tune performance and control the number of output files in Spark SQL. +Partitioning hints allow users to suggest a partitioning strategy that Spark should follow. `COALESCE`, `REPARTITION`, +and `REPARTITION_BY_RANGE` hints are supported and are equivalent to `coalesce`, `repartition`, and +`repartitionByRange` [Dataset APIs](api/scala/org/apache/spark/sql/Dataset.html), respectively. These hints give users +a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are +specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer. + +### Partitioning Hints Types + +* **COALESCE** + + The `COALESCE` hint can be used to reduce the number of partitions to the specified number of partitions. It takes a partition number as a parameter. + +* **REPARTITION** + + The `REPARTITION` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes a partition number, column names, or both as parameters. + +* **REPARTITION_BY_RANGE** + + The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters. 
+ ### Examples ```sql SELECT /*+ COALESCE(3) */ * FROM t; -EXPLAIN SELECT /*+ COALESCE(3) */ * FROM t; -== Physical Plan == -Coalesce 3 -+- *(1) ColumnarToRow - +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, - Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], - PushedFilters: [], ReadSchema: struct - SELECT /*+ REPARTITION(3) */ * FROM t; SELECT /*+ REPARTITION(c) */ * FROM t; SELECT /*+ REPARTITION(3, c) */ * FROM t; -EXPLAIN SELECT /*+ REPARTITION(3, c) */ * FROM t; -== Physical Plan == -Exchange hashpartitioning(c#6, 3), false, [id=#148] -+- *(1) ColumnarToRow - +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, - Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], - PushedFilters: [], ReadSchema: struct - SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t; SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; -EXPLAIN SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; +-- When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, +-- but the leftmost hint is picked by the optimizer. 
+EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t; +== Parsed Logical Plan == +'UnresolvedHint REPARTITION, [100] ++- 'UnresolvedHint COALESCE, [500] + +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c] + +- 'Project [*] + +- 'UnresolvedRelation [t] + +== Analyzed Logical Plan == +name: string, c: int +Repartition 100, true ++- Repartition 500, false + +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3 + +- Project [name#29, c#30] + +- SubqueryAlias spark_catalog.default.t + +- Relation[name#29,c#30] parquet + +== Optimized Logical Plan == +Repartition 100, true ++- Relation[name#29,c#30] parquet + == Physical Plan == -Exchange rangepartitioning(c#6 ASC NULLS FIRST, 3), false, [id=#167] +Exchange RoundRobinPartitioning(100), false, [id=#121] +- *(1) ColumnarToRow - +- FileScan parquet default.t[name#5,c#6] Batched: true, DataFilters: [], Format: Parquet, + +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet, Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [], PushedFilters: [], ReadSchema: struct ``` - ### Join Hints -Join Hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint. +Join hints allow users to suggest the join strategy that Spark should use. 
Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint. ### Join Hints Types From 7f97fe37eee7e98a7add83161466b498fe2331b7 Mon Sep 17 00:00:00 2001 From: Huaxin Gao Date: Fri, 29 May 2020 12:27:29 -0700 Subject: [PATCH 4/5] rename sampling and window function file names --- docs/_data/menu-sql.yaml | 4 ++-- ...-qry-sampling.md => sql-ref-syntax-qry-select-sampling.md} | 0 ...ntax-qry-window.md => sql-ref-syntax-qry-select-window.md} | 0 docs/sql-ref-syntax-qry-select.md | 4 ++-- docs/sql-ref-syntax-qry.md | 4 ++-- docs/sql-ref-syntax.md | 4 ++-- 6 files changed, 8 insertions(+), 8 deletions(-) rename docs/{sql-ref-syntax-qry-sampling.md => sql-ref-syntax-qry-select-sampling.md} (100%) rename docs/{sql-ref-syntax-qry-window.md => sql-ref-syntax-qry-select-window.md} (100%) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index fbecfa7ae3631..219e6809a96f0 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -182,11 +182,11 @@ - text: Set Operators url: sql-ref-syntax-qry-select-setops.html - text: TABLESAMPLE - url: sql-ref-syntax-qry-sampling.html + url: sql-ref-syntax-qry-select-sampling.html - text: Table-valued Function url: sql-ref-syntax-qry-select-tvf.html - text: Window Function - url: sql-ref-syntax-qry-window.html + url: sql-ref-syntax-qry-select-window.html - text: EXPLAIN url: sql-ref-syntax-qry-explain.html - text: Auxiliary Statements 
diff --git a/docs/sql-ref-syntax-qry-sampling.md b/docs/sql-ref-syntax-qry-select-sampling.md similarity index 100% rename from docs/sql-ref-syntax-qry-sampling.md rename to docs/sql-ref-syntax-qry-select-sampling.md diff --git a/docs/sql-ref-syntax-qry-window.md b/docs/sql-ref-syntax-qry-select-window.md similarity index 100% rename from docs/sql-ref-syntax-qry-window.md rename to docs/sql-ref-syntax-qry-select-window.md diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md index 3776d45323e14..987e6479ab20a 100644 --- a/docs/sql-ref-syntax-qry-select.md +++ b/docs/sql-ref-syntax-qry-select.md @@ -156,6 +156,6 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } * [JOIN](sql-ref-syntax-qry-select-join.html) * [LIKE Predicate](sql-ref-syntax-qry-select-like.html) * [Set Operators](sql-ref-syntax-qry-select-setops.html) -* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html) +* [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html) * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html) -* [Window Function](sql-ref-syntax-qry-window.html) +* [Window Function](sql-ref-syntax-qry-select-window.html) diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md index 3d3d0846e8ead..167c394d0fe49 100644 --- a/docs/sql-ref-syntax-qry.md +++ b/docs/sql-ref-syntax-qry.md @@ -42,7 +42,7 @@ ability to generate logical and physical plan for a given query using * [JOIN](sql-ref-syntax-qry-select-join.html) * [LIKE Predicate](sql-ref-syntax-qry-select-like.html) * [Set Operators](sql-ref-syntax-qry-select-setops.html) - * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html) + * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html) * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html) - * [Window Function](sql-ref-syntax-qry-window.html) + * [Window Function](sql-ref-syntax-qry-select-window.html) * [EXPLAIN Statement](sql-ref-syntax-qry-explain.html) diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md 
index ac1174458c6a5..d78a01fd655a2 100644 --- a/docs/sql-ref-syntax.md +++ b/docs/sql-ref-syntax.md @@ -62,10 +62,10 @@ Spark SQL is Apache Spark's module for working with structured data. The SQL Syn * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html) * [Set Operators](sql-ref-syntax-qry-select-setops.html) * [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html) - * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html) + * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html) * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html) * [WHERE Clause](sql-ref-syntax-qry-select-where.html) - * [Window Function](sql-ref-syntax-qry-window.html) + * [Window Function](sql-ref-syntax-qry-select-window.html) * [EXPLAIN](sql-ref-syntax-qry-explain.html) ### Auxiliary Statements From bc4fdfc06fa944fd1c3e2205d0b107856c04554c Mon Sep 17 00:00:00 2001 From: Huaxin Gao Date: Fri, 29 May 2020 18:37:36 -0700 Subject: [PATCH 5/5] address comments --- docs/sql-ref-syntax-qry-select-hints.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/docs/sql-ref-syntax-qry-select-hints.md b/docs/sql-ref-syntax-qry-select-hints.md index 6c9eb8503cdbb..247ce48e79445 100644 --- a/docs/sql-ref-syntax-qry-select-hints.md +++ b/docs/sql-ref-syntax-qry-select-hints.md @@ -37,7 +37,7 @@ and `REPARTITION_BY_RANGE` hints are supported and are equivalent to `coalesce`, a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer. -### Partitioning Hints Types +#### Partitioning Hints Types * **COALESCE** @@ -51,8 +51,8 @@ specified, multiple nodes are inserted into the logical plan, but the leftmost h The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. 
It takes column names and an optional partition number as parameters. +#### Examples -### Examples ```sql SELECT /*+ COALESCE(3) */ * FROM t; @@ -66,8 +66,7 @@ SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t; SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t; --- When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, --- but the leftmost hint is picked by the optimizer. +-- multiple partitioning hints EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t; == Parsed Logical Plan == 'UnresolvedHint REPARTITION, [100] @@ -101,7 +100,7 @@ Exchange RoundRobinPartitioning(100), false, [id=#121] Join hints allow users to suggest the join strategy that Spark should use. Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, `SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. When different join strategy hints are specified on both sides of a join, Spark prioritizes hints in the following order: `BROADCAST` over `MERGE` over `SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side based on the join type and the sizes of the relations. Since a given strategy may not support all join types, Spark is not guaranteed to use the join strategy suggested by the hint. -### Join Hints Types +#### Join Hints Types * **BROADCAST** @@ -119,7 +118,7 @@ Join hints allow users to suggest the join strategy that Spark should use. Prior Suggests that Spark use shuffle-and-replicate nested loop join. -### Examples +#### Examples ```sql -- Join Hints for broadcast join