Skip to content

[SPARK-15994] [MESOS] Allow enabling Mesos fetch cache in coarse executor backend#13713

Closed
drcrallen wants to merge 10 commits into
apache:masterfrom
metamx:SPARK15994
Closed

[SPARK-15994] [MESOS] Allow enabling Mesos fetch cache in coarse executor backend#13713
drcrallen wants to merge 10 commits into
apache:masterfrom
metamx:SPARK15994

Conversation

@drcrallen

@drcrallen drcrallen commented Jun 16, 2016

Copy link
Copy Markdown
Contributor

Mesos 0.23.0 introduces a Fetch Cache feature http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of resources specified in command URIs.

This patch:

  • Updates the Mesos shaded protobuf dependency to 0.23.0
  • Allows setting spark.mesos.fetcherCache.enable to enable the fetch cache for all specified URIs. (URIs must be specified for the setting to have any affect)
  • Updates documentation for Mesos configuration with the new setting.

This patch does NOT:

  • Allow for per-URI caching configuration. The cache setting is global to ALL URIs for the command.

Mesos 0.23.0 introduces a Fetch Cache feature http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of resources specified in command URIs.

This patch:

* Updates the Mesos shaded protobuf dependency to 0.23.0
* Allows setting `spark.mesos.fetchCache.enable` to enable the fetch cache for all specified URIs. (URIs must be specified for the setting to have any affect)
* Updates documentation for Mesos configuration with the new setting.

This patch does NOT:

* Allow for per-URI caching configuration. The cache setting is global to ALL URIs for the command.
@tnachen

tnachen commented Jun 22, 2016

Copy link
Copy Markdown
Contributor

ok to test

@SparkQA

SparkQA commented Jun 22, 2016

Copy link
Copy Markdown

Test build #60998 has finished for PR 13713 at commit 1ebfaf6.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@drcrallen

Copy link
Copy Markdown
Contributor Author

@tnachen any comments?

@SparkQA

SparkQA commented Jul 27, 2016

Copy link
Copy Markdown

Test build #62931 has finished for PR 13713 at commit a9b735c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen

tnachen commented Jul 27, 2016

Copy link
Copy Markdown
Contributor

We also fetch URIs for running drivers in cluster mode (MesosClusterScheduler.scala). I'm thinking we should also allow this configuration to effect that too.

@SparkQA

SparkQA commented Jul 30, 2016

Copy link
Copy Markdown

Test build #63032 has finished for PR 13713 at commit 974245f.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@drcrallen

Copy link
Copy Markdown
Contributor Author

OoooOOO master updated to 1.0.0

Fixed merge conflicts

@drcrallen

Copy link
Copy Markdown
Contributor Author

@tnachen this test is a little larger than I originally anticipated. Let me see if I can add some unit tests

@SparkQA

SparkQA commented Jul 30, 2016

Copy link
Copy Markdown

Test build #63046 has finished for PR 13713 at commit bad9d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Serializable

@tnachen

tnachen commented Oct 8, 2016

Copy link
Copy Markdown
Contributor

@drcrallen Are you still planning to update this? It's quite a useful feature, so hoping this can get in. Also since Fine grain mode is depcreated I don't think we need to update it too.

@drcrallen

Copy link
Copy Markdown
Contributor Author

I can update this today.

@drcrallen

Copy link
Copy Markdown
Contributor Author

I'm still adding a basic test for this.

@SparkQA

SparkQA commented Oct 10, 2016

Copy link
Copy Markdown

Test build #66688 has finished for PR 13713 at commit 456d28f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@drcrallen

Copy link
Copy Markdown
Contributor Author

Tested with mvn test -Pmesos -pl mesos -Dtest=MesosCoarseGrainedSchedulerBackendSuite

@drcrallen

Copy link
Copy Markdown
Contributor Author

@tnachen let me know if there's any outstanding issues here.

assert(launchedTasks.head.getCommand.getUrisList.asScala(0).getValue == url)
}

test("mesos supports setting fetcher") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/supports setting fetcher/supports setting fetcher cache/g

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread docs/running-on-mesos.md Outdated
<td><code>spark.mesos.fetchCache.enable</code></td>
<td><code>false</code></td>
<td>
If set to `true`, all URIs in `spark.mesos.uris` will be eligible for caching by the [Mesos fetch cache](http://mesos.apache.org/documentation/latest/fetcher/)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implementation you actually set all downloadable URIs (like spark.executor.uri, jarUrl, etc) to be fetcher cachable. I think we need to be more explicit here that it's more than just spark.mesos.uris

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more info.

@tnachen

tnachen commented Oct 10, 2016

Copy link
Copy Markdown
Contributor

Other than the 2 comments, the changes LGTM. @mgummelt @srowen

@SparkQA

SparkQA commented Oct 10, 2016

Copy link
Copy Markdown

Test build #66690 has finished for PR 13713 at commit 97c0d30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 11, 2016

Copy link
Copy Markdown

Test build #66703 has finished for PR 13713 at commit 54ae86b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Comment thread docs/running-on-mesos.md Outdated
<td><code>spark.mesos.fetchCache.enable</code></td>
<td><code>false</code></td>
<td>
If set to `true`, all URIs (example: `spark.executor.uri`, `spark.mesos.uris`) will be eligible for caching by the [Mesos fetch cache](http://mesos.apache.org/documentation/latest/fetcher/)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fetch/fetcher

s/eligible for caching/cached (they're not just eligible, they are in fact cached)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed both

Comment thread docs/running-on-mesos.md Outdated


<tr>
<td><code>spark.mesos.fetchCache.enable</code></td>

@mgummelt mgummelt Oct 11, 2016

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fetchCache/fetcherCache (et al)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

private val queuedCapacity = conf.getInt("spark.mesos.maxDrivers", 200)
private val retainedDrivers = conf.getInt("spark.mesos.retainedDrivers", 200)
private val maxRetryWaitTime = conf.getInt("spark.mesos.cluster.retry.wait.max", 60) // 1 minute
private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", false)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to factor out all these conf vars like YARN does at some point

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea but outside scope of this PR

), false)
val offers = List(Resources(backend.executorMemory(sc), 1))
offerResources(offers)
// Don't crash!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it originally didn't do much except check that it didn't fail. I have since added proper checking of the task info for if the fetcher cache was enabled. Will remove


private[mesos] val mesosExecutorCores = sc.conf.getDouble("spark.mesos.mesosExecutor.cores", 1)

private val useFetchCache = sc.conf.getBoolean("spark.mesos.fetchCache.enable", false)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove changes to the fine grained scheduler. It's deprecated, and we shouldn't be adding any new features to it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@SparkQA

SparkQA commented Oct 12, 2016

Copy link
Copy Markdown

Test build #66850 has finished for PR 13713 at commit cc3901e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 12, 2016

Copy link
Copy Markdown

Test build #66852 has finished for PR 13713 at commit fc0802c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

useFetchCache: Boolean = false): Unit = {
uris.split(",").foreach { uri =>
builder.addUris(CommandInfo.URI.newBuilder().setValue(uri.trim()))
builder.addUris(CommandInfo.URI.newBuilder().setValue(uri.trim()).setCache(useFetchCache))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/fetch/fetcher

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@mgummelt

Copy link
Copy Markdown

LGTM @srowen

@SparkQA

SparkQA commented Oct 13, 2016

Copy link
Copy Markdown

Test build #66909 has finished for PR 13713 at commit 7195c6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@drcrallen

Copy link
Copy Markdown
Contributor Author

@mgummelt do you know what still needs to be done to get this in?

@mgummelt

Copy link
Copy Markdown

We need to get @srowen or one of the other committers to merge it.

@srowen

srowen commented Nov 1, 2016

Copy link
Copy Markdown
Member

Merged to master

@asfgit asfgit closed this in e34b4e1 Nov 1, 2016
@drcrallen drcrallen deleted the SPARK15994 branch November 1, 2016 18:03
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…tor backend

Mesos 0.23.0 introduces a Fetch Cache feature http://mesos.apache.org/documentation/latest/fetcher/ which allows caching of resources specified in command URIs.

This patch:
- Updates the Mesos shaded protobuf dependency to 0.23.0
- Allows setting `spark.mesos.fetcherCache.enable` to enable the fetch cache for all specified URIs. (URIs must be specified for the setting to have any affect)
- Updates documentation for Mesos configuration with the new setting.

This patch does NOT:
- Allow for per-URI caching configuration. The cache setting is global to ALL URIs for the command.

Author: Charles Allen <charles@allen-net.com>

Closes apache#13713 from drcrallen/SPARK15994.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants