[SPARK-52598][DOCS] Reorganize Spark Connect programming guide #51305
nchammas wants to merge 2 commits into
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect, like in this example:
Can we add more instructions on how to use it with SPARK_HOME?
Sure, I'll take a crack at that.
Just FYI, these instructions were moved from the existing Spark Connect Overview page.
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect. If you already have Spark installed and `SPARK_HOME` defined, you can use that too.

```bash
cd spark/
./sbin/start-connect-server.sh

# alternately
"$SPARK_HOME/sbin/start-connect-server.sh"
```
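As a rough sketch of how the two invocations above could be folded into one, the following helper picks the script location based on whether `SPARK_HOME` is set. The function names `resolve_connect_script` and `start_connect_server` are hypothetical and not part of Spark; the fallback assumes the current directory is the extracted `spark` folder.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Spark): print the expected path of
# start-connect-server.sh.
resolve_connect_script() {
  # Prefer SPARK_HOME when it is set and non-empty; otherwise assume the
  # current directory is the extracted Spark folder.
  local spark_dir="${SPARK_HOME:-.}"
  # Strip one trailing slash so the joined path has no double slash.
  printf '%s\n' "${spark_dir%/}/sbin/start-connect-server.sh"
}

# Hypothetical wrapper: run the script only if it exists and is executable,
# passing any extra arguments through unchanged.
start_connect_server() {
  local script
  script="$(resolve_connect_script)"
  if [ -x "$script" ]; then
    "$script" "$@"
  else
    echo "start-connect-server.sh not found at: $script" >&2
    return 1
  fi
}
```

Calling `start_connect_server` from any directory would then work when `SPARK_HOME` is defined, and from the `spark` folder otherwise.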
@allisonwang-db - Is this what you were looking for?
yea not sure if we can point to this doc: https://spark.apache.org/docs/latest/api/python/getting_started/install.html#manually-downloading
I don't mind adding that as a link, but I think it's a bit confusing to jump at this point from the main narrative documentation at the root of the site to this parallel set of documentation under api/python/. I know this is a larger problem with the documentation that we have discussed in the past.
Referencing the PySpark doc from the main spark doc is indeed less ideal. Do we have any other installation docs we can link here?
I believe the general installation instructions are on this page.
@grundprinzip - After this reorg of the existing Connect documentation is merged in, I am planning to add a new page with narrative documentation for tools that Spark Connect clients would find useful, as we briefly discussed here.
@grundprinzip / @allisonwang-db - Are you interested in this minor refactoring or shall I abandon it? This change is by itself pretty minor, but it's meant to enable us to add and organize more narrative documentation for Spark Connect.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR reorganizes the narrative Spark Connect documentation into a guide that matches the pattern we are already using elsewhere in the docs for the DataFrame API, Structured Streaming, and so forth.
It adds a new entry in the "Programming Guides" dropdown for Spark Connect, and reorganizes the existing two Spark Connect pages into three:
- spark-connect-overview.html
- spark-connect-setup.html
- spark-connect-server-libs.html

This is what the reorganized guide looks like:
Why are the changes needed?
The prose currently in Application Development with Spark Connect partly repeats what's in the overview, and the overview itself is a bit longer than necessary because it mixes a genuine introduction to Spark Connect with a technical guide on how to set it up.
With this information reorganized, the guide should be clearer to map out and follow, and it becomes easier to add more narrative Spark Connect documentation since we now have a dedicated guide with its own left sidebar.
In a future PR, I intend to add a new page dedicated to the client side of working with Spark Connect, which will mirror the existing page we have for the server side.
Does this PR introduce any user-facing change?
Documentation only.
How was this patch tested?
I built the docs locally and reviewed them in my browser.
Was this patch authored or co-authored using generative AI tooling?
No.