
[SPARK-6053][MLLIB] support save/load in PySpark's ALS #4811

Closed
mengxr wants to merge 4 commits into apache:master from mengxr:SPARK-5991

Conversation

mengxr commented Feb 27, 2015

Contributor

A simple wrapper to save/load MatrixFactorizationModel in Python. @jkbradley

SparkQA commented Feb 27, 2015

Test build #28056 has started for PR 4811 at commit 282ec8d.

  • This patch merges cleanly.

SparkQA commented Feb 27, 2015

Test build #28056 has finished for PR 4811 at commit 282ec8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, Saveable, JavaLoader):
    • class Saveable(object):
    • class Loader(object):
    • class JavaLoader(Loader):
    • java_class = ".".join([java_package, cls.__name__])
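The last entry in the list above is a line of code rather than a class. It suggests that the Python wrapper finds its JVM counterpart by joining a Java package name with the Python class name. A minimal sketch of that idea follows, assuming the Java package is obtained by swapping the `pyspark` prefix of the Python module for `org.apache.spark`. The helper name `java_class_name` and that prefix rule are assumptions made for illustration; only the `".".join(...)` expression comes from the build report.

```python
def java_class_name(py_module: str, py_class: str) -> str:
    """Build a fully qualified JVM class name from a Python module and class.

    Assumption: the Python package layout mirrors the JVM one, so only the
    leading 'pyspark' prefix has to be swapped for 'org.apache.spark'.
    """
    java_package = py_module.replace("pyspark", "org.apache.spark", 1)
    # Same expression as reported by the build bot.
    return ".".join([java_package, py_class])


name = java_class_name("pyspark.mllib.recommendation", "MatrixFactorizationModel")
print(name)
```

With a naming rule like this, a generic loader would not need a per-model mapping table, as long as the Python and JVM class names stay aligned.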

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28056/

@jkbradley

Member

I messed up not passing sc to save/load. Is this patch going into 1.3? If not, then I'll submit a separate patch fixing the documentation (which will conflict a little).

mengxr commented Feb 27, 2015
Contributor Author

If we have a couple of days before RC2, this would be nice to have. We use the same API as in Scala/Java, and this PR contains no real implementation, only a wrapper. Having save/load would benefit many users.

SparkQA commented Feb 27, 2015

Test build #28088 has started for PR 4811 at commit 06140a4.

  • This patch merges cleanly.

Comment thread docs/mllib-collaborative-filtering.md Outdated

Member


Add sc to save call.
Also import MatrixFactorizationModel
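
For readers following the review note, here is a hedged sketch of what a corrected documentation example could look like: `sc` is passed to both `save` and `load`, and `MatrixFactorizationModel` is imported alongside `ALS`. The function name `train_save_load`, the toy ratings, and the parameter values are assumptions for illustration and are not taken from the patch. The import sits inside the function so the snippet can be read and defined without a Spark installation.

```python
def train_save_load(sc, path):
    """Train a tiny ALS model, save it, and load it back.

    `sc` is an existing SparkContext; `path` is an output directory that
    is assumed not to exist yet.
    """
    # Imported lazily so that defining this function does not require PySpark.
    from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

    # Toy data, made up for illustration: (user, product, rating).
    ratings = sc.parallelize([
        Rating(1, 1, 5.0),
        Rating(1, 2, 1.0),
        Rating(2, 1, 4.0),
        Rating(2, 2, 2.0),
    ])
    model = ALS.train(ratings, rank=2, iterations=5)

    # Both calls take the SparkContext as their first argument.
    model.save(sc, path)
    return MatrixFactorizationModel.load(sc, path)
```

A caller would invoke it as `same_model = train_save_load(sc, "myModelPath")`, where the path is a placeholder.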

SparkQA commented Feb 27, 2015

Test build #28088 has finished for PR 4811 at commit 06140a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, Saveable, JavaLoader):
    • class Saveable(object):
    • class Loader(object):
    • class JavaLoader(Loader):
    • java_class = ".".join([java_package, cls.__name__])

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28088/

@jkbradley

Member

LGTM. I ran into a bug running the example, but it seems to be coming from elsewhere. It happens when calling train, and only intermittently:

java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
    at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 4)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
    at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

I'll make a separate JIRA for it.

@jkbradley

Member

Made JIRA: https://issues.apache.org/jira/browse/SPARK-6071

SparkQA commented Mar 1, 2015

Test build #28151 has started for PR 4811 at commit f135dac.

  • This patch merges cleanly.

mengxr changed the title from "[SPARK-5991][MLLIB] support save/load in PySpark's ALS" to "[SPARK-6053][MLLIB] support save/load in PySpark's ALS" on Mar 1, 2015

SparkQA commented Mar 1, 2015

Test build #28151 has finished for PR 4811 at commit f135dac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MatrixFactorizationModel(JavaModelWrapper, Saveable, JavaLoader):
    • class Saveable(object):
    • class Loader(object):
    • class JavaLoader(Loader):
    • java_class = ".".join([java_package, cls.__name__])

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28151/

@jkbradley

Member

LGTM

asfgit pushed a commit that referenced this pull request Mar 2, 2015
A simple wrapper to save/load `MatrixFactorizationModel` in Python. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #4811 from mengxr/SPARK-5991 and squashes the following commits:

f135dac [Xiangrui Meng] update save doc
57e5200 [Xiangrui Meng] address comments
06140a4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5991
282ec8d [Xiangrui Meng] support save/load in PySpark's ALS

(cherry picked from commit aedbbaa)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr commented Mar 2, 2015
Contributor Author

Merged into master and branch-1.3. Thanks!

@asfgit asfgit closed this in aedbbaa Mar 2, 2015