From 462fe277b0f897141fad89a23d6739da6c5945f3 Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun <dongjoon@apache.org>
Date: Tue, 5 Sep 2017 11:22:44 -0700
Subject: [PATCH 1/4] [MINOR][DOC] Add ORC in `Partition Discovery` section.

---
 docs/sql-programming-guide.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
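
Note for reviewers: the documented behaviour (partition columns are read from `key=value` directory names below the base path) can be illustrated with the minimal sketch below. It is plain Python and is not Spark's implementation. The helper `partition_values` and the file name `data.orc` are hypothetical and exist only for this illustration.

```python
from pathlib import PurePosixPath


def partition_values(file_path, base_path):
    """Return {column: value} parsed from key=value directories below base_path.

    Hypothetical helper for illustration only; Spark's real partition
    discovery also infers column types and validates the layout.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    values = {}
    # Every component except the last one (the data file name) is a directory.
    for directory in relative.parts[:-1]:
        key, sep, value = directory.partition("=")
        if sep and key:
            values[key] = value
    return values


# Assumed example file inside the partitioned layout described in the guide.
data_file = "path/to/table/gender=male/country=US/data.orc"

# Starting from the table root, both directory levels become columns.
print(partition_values(data_file, "path/to/table"))

# Starting from a partition directory, `gender` is no longer discovered.
print(partition_values(data_file, "path/to/table/gender=male"))
```

The second call mirrors the `basePath` paragraph in the guide: when discovery starts at `path/to/table/gender=male`, only directories below that path are considered, so `gender` is not treated as a partitioning column.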

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index ee231a934a3af..98a5fa70444b8 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -733,7 +733,7 @@ SELECT * FROM parquetTable
 
 Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
 table, data are usually stored in different directories, with partitioning column values encoded in
-the path of each partition directory. The Parquet data source is now able to discover and infer
+the path of each partition directory. The Parquet/ORC data sources are able to discover and infer
 partitioning information automatically. For example, we can store all our previously used
 population data into a partitioned table using the following directory structure, with two extra
 columns, `gender` and `country` as partitioning columns:
@@ -762,8 +762,8 @@ path
 
 {% endhighlight %}
 
-By passing `path/to/table` to either `SparkSession.read.parquet` or `SparkSession.read.load`, Spark SQL
-will automatically extract the partitioning information from the paths.
+By passing `path/to/table` to either `SparkSession.read.parquet`, `SparkSession.read.orc`, or `SparkSession.read.load`,
+Spark SQL will automatically extract the partitioning information from the paths.
 Now the schema of the returned DataFrame becomes:
 
 {% highlight text %}
@@ -784,7 +784,7 @@ can be configured by `spark.sql.sources.partitionColumnTypeInference.enabled`, w
 
 Starting from Spark 1.6.0, partition discovery only finds partitions under the given paths
 by default. For the above example, if users pass `path/to/table/gender=male` to either
-`SparkSession.read.parquet` or `SparkSession.read.load`, `gender` will not be considered as a
+`SparkSession.read.parquet`, `SparkSession.read.orc`, or `SparkSession.read.load`, `gender` will not be considered as a
 partitioning column. If users need to specify the base path that partition discovery
 should start with, they can set `basePath` in the data source options. For example,
 when `path/to/table/gender=male` is the path of the data and

From fd00fbd108c4cc4c8effbadea99f8228bfe1a460 Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun <dongjoon@apache.org>
Date: Tue, 5 Sep 2017 12:42:49 -0700
Subject: [PATCH 2/4] All built-in data sources support it.

---
 docs/sql-programming-guide.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 98a5fa70444b8..dfac53c2b37b4 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -733,7 +733,7 @@ SELECT * FROM parquetTable
 
 Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
 table, data are usually stored in different directories, with partitioning column values encoded in
-the path of each partition directory. The Parquet/ORC data sources are able to discover and infer
+the path of each partition directory. All built-in data sources are able to discover and infer
 partitioning information automatically. For example, we can store all our previously used
 population data into a partitioned table using the following directory structure, with two extra
 columns, `gender` and `country` as partitioning columns:
@@ -762,8 +762,8 @@ path
 
 {% endhighlight %}
 
-By passing `path/to/table` to either `SparkSession.read.parquet`, `SparkSession.read.orc`, or `SparkSession.read.load`,
-Spark SQL will automatically extract the partitioning information from the paths.
+By passing `path/to/table` to either `SparkSession.read.parquet` or `SparkSession.read.load`, Spark SQL
+will automatically extract the partitioning information from the paths.
 Now the schema of the returned DataFrame becomes:
 
 {% highlight text %}
@@ -784,7 +784,7 @@ can be configured by `spark.sql.sources.partitionColumnTypeInference.enabled`, w
 
 Starting from Spark 1.6.0, partition discovery only finds partitions under the given paths
 by default. For the above example, if users pass `path/to/table/gender=male` to either
-`SparkSession.read.parquet`, `SparkSession.read.orc`, or `SparkSession.read.load`, `gender` will not be considered as a
+`SparkSession.read.parquet` or `SparkSession.read.load`, `gender` will not be considered as a
 partitioning column. If users need to specify the base path that partition discovery
 should start with, they can set `basePath` in the data source options. For example,
 when `path/to/table/gender=male` is the path of the data and

From 128c7790a79392a048a6808ccc7412d3fd4d1a5d Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun <dongjoon@apache.org>
Date: Tue, 5 Sep 2017 13:31:54 -0700
Subject: [PATCH 3/4] Address comments.

---
 docs/sql-programming-guide.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index dfac53c2b37b4..39e088c7212a2 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -733,8 +733,9 @@ SELECT * FROM parquetTable
 
 Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
 table, data are usually stored in different directories, with partitioning column values encoded in
-the path of each partition directory. All built-in data sources are able to discover and infer
-partitioning information automatically. For example, we can store all our previously used
+the path of each partition directory. All built-in data sources (including TEXT/CSV/JSON/ORC/Parquet)
+are able to discover and infer partitioning information automatically.
+For example, we can store all our previously used
 population data into a partitioned table using the following directory structure, with two extra
 columns, `gender` and `country` as partitioning columns:
 

From 018fdb381f7f9bbaed099086dd954f8ee1be2ecb Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun <dongjoon@apache.org>
Date: Tue, 5 Sep 2017 14:09:37 -0700
Subject: [PATCH 4/4] Address comments.

---
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 39e088c7212a2..032073bfc40dd 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -733,7 +733,7 @@ SELECT * FROM parquetTable
 
 Table partitioning is a common optimization approach used in systems like Hive. In a partitioned
 table, data are usually stored in different directories, with partitioning column values encoded in
-the path of each partition directory. All built-in data sources (including TEXT/CSV/JSON/ORC/Parquet)
+the path of each partition directory. All built-in file sources (including Text/CSV/JSON/ORC/Parquet)
 are able to discover and infer partitioning information automatically.
 For example, we can store all our previously used
 population data into a partitioned table using the following directory structure, with two extra