[SPARK-2550][MLLIB][APACHE SPARK] Support regularization and intercept in pyspark's linear methods. #1624
miccagiann wants to merge 10 commits into
…regression method.
Can one of the admins verify this patch?
Jenkins, test this please.
Instead of adding new methods, we can add optional parameters to the original train method, for example regType and regParam. Users can set regType to l1, l2, or none (the default).
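As a rough illustration of this suggestion, the sketch below shows a single train function with optional regType, regParam, and intercept keyword arguments. It is a toy pure-Python gradient descent on (x, y) pairs, not pyspark's API: the function body, the default values, and the one-feature model are all assumptions made for illustration.

```python
def train(data, iterations=100, step=0.1, regParam=0.0,
          regType="none", intercept=False):
    """Toy full-batch gradient descent for one-feature linear regression.

    Hypothetical sketch, not pyspark's implementation.
    data is a list of (x, y) pairs; regType is "l1", "l2", or "none".
    """
    if regType not in ("l1", "l2", "none"):
        raise ValueError("regType must be 'l1', 'l2', or 'none'")
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(iterations):
        # Gradients of the mean squared error (with the usual 1/2 factor).
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        if regType == "l2":
            grad_w += regParam * w                        # gradient of (regParam / 2) * w^2
        elif regType == "l1":
            grad_w += regParam * ((w > 0) - (w < 0))      # subgradient of regParam * |w|
        w -= step * grad_w
        if intercept:
            b -= step * grad_b                            # intercept is never regularized
    return w, b
```

Callers that pass only data keep working, because every new parameter has a default.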
OK! I am working on this issue!
QA tests have started for PR 1624. This patch merges cleanly.
QA results for PR 1624:
…in only one function.
I used an enumeration-like object to distinguish the update methods (regularizers) the user can choose when training the model. I tried to extend this object from Enumeration, but from what I have seen it relies heavily on reflection and does not work well with serialized objects or with py4j...
Using strings with clear documentation should be sufficient. You can then map the string to L1Updater or SquaredL2Updater inside PythonMLLibAPI.
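One way to do this kind of mapping, sketched in Python rather than Scala, is a dictionary from the regType string to an update function. The functions below are hypothetical stand-ins for MLlib's updater classes, written only to make the dispatch concrete; they are not MLlib's implementations, and all names here are assumptions.

```python
def simple_update(w, grad, step, reg_param):
    """Plain gradient step, no regularization (reg_param is ignored)."""
    return w - step * grad


def l2_update(w, grad, step, reg_param):
    """Gradient step with the L2 penalty's gradient added."""
    return w - step * (grad + reg_param * w)


def l1_update(w, grad, step, reg_param):
    """Gradient step followed by soft-thresholding toward zero."""
    w_new = w - step * grad
    shrink = step * reg_param
    sign = (w_new > 0) - (w_new < 0)
    return sign * max(0.0, abs(w_new) - shrink)


# Hypothetical lookup table: documented string values -> update rule.
UPDATERS = {"l1": l1_update, "l2": l2_update, "none": simple_update}


def updater_for(reg_type):
    """Return the update function for reg_type, rejecting unknown values."""
    try:
        return UPDATERS[reg_type]
    except KeyError:
        raise ValueError("Unknown regType %r; use 'l1', 'l2', or 'none'"
                         % (reg_type,))
```

A lookup table keeps the accepted strings in one place, so the documentation and the error message can list exactly the same values.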
OK! I will use strings in both Python and Scala.
Jenkins, test this please.
…ues of 'regType' parameter.
No longer using enumerations for the regType parameter; switched to string values.
@miccagiann It may be easier if you send the string directly to PythonMLLibAPI().trainLinearRegressionModelWithSGD and implement the logic there.
In the current version, all branches in the if-else block are essentially the same.
Yes! I fixed it in the regression.py file, where I was calling the same function again and again. As for PythonMLLibAPI().trainLinearRegressionModelWithSGD, I implemented the logic there as well... I am building right now and will commit right away.
@miccagiann For
We usually put . at the beginning of the line:
lrAlg.optimizer
  .setNumIterations(numIterations)
  .setRegParam(regParam)
  .setStepSize(stepSize)
Not at all! I am going to change them! Thanks!
In Python, the line width for docs should be less than 80 (or 78 to be safe).
Btw, you can use
I have applied the suggested changes! Please let me know if any more modifications are needed. Thanks for all your help, Xiangrui.
It is safer to add
else if (regType != "none")
  throw new IllegalArgumentException("...")
Since the exception is now thrown from the Scala code, I am going to remove the ValueError raised in the Python code.
Jenkins, add to whitelist.
Jenkins, test this please.
QA tests have started for PR 1624. This patch DID NOT merge cleanly!
Xiangrui, after the tests are finished, should I merge my local branch with upstream/master so that this patch merges cleanly?
Yes, you need to merge the latest master and resolve conflicts first.
I have done it. Thanks for all your help! Now, I suppose that I need to call Jenkins again, right?
QA tests have started for PR 1624. This patch merges cleanly.
LGTM. Waiting for Jenkins ....
I added you to the whitelist. Jenkins should be triggered automatically for changes from you.
Nice! Thanks for everything! Tomorrow I am going to search for new issues.
Great! Do you mind adding regularization type and intercept to other linear methods? For example,
Yes! I can do this. Is there an issue created in JIRA, or would it be part of the same PR?
It should be part of the same JIRA. But let's do that in a separate PR.
OK!
QA results for PR 1624:
QA results for PR 1624:
Merged into master. Thanks!
Alright, I was fixing my branches so that my new commits are included correctly in the new PR I am going to create.
Xiangrui, I see that the JIRA issue is closed. Should we create a new one for the
I re-opened the JIRA. Please use the same JIRA number for your new PR. Thanks!
…t in pyspark's linear methods.
Related to issue: [SPARK-2550](https://issues.apache.org/jira/browse/SPARK-2550?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20priority%20%3D%20Major%20ORDER%20BY%20key%20DESC).
Author: Michael Giannakopoulos <miccagiann@gmail.com>
Closes apache#1624 from miccagiann/new-branch and squashes the following commits:
c02e5f5 [Michael Giannakopoulos] Merge cleanly with upstream/master.
8dcb888 [Michael Giannakopoulos] Putting the if/else if statements in brackets.
fed8eaa [Michael Giannakopoulos] Adding a space in the message related to the IllegalArgumentException.
44e6ff0 [Michael Giannakopoulos] Adding a blank line before python class LinearRegressionWithSGD.
8eba9c5 [Michael Giannakopoulos] Change function signatures. Exception is thrown from the scala component and not from the python one.
638be47 [Michael Giannakopoulos] Modified code to comply with code standards.
ec50ee9 [Michael Giannakopoulos] Shorten the if-elif-else statement in regression.py file
b962744 [Michael Giannakopoulos] Replaced the enum classes, with strings-keywords for defining the values of 'regType' parameter.
78853ec [Michael Giannakopoulos] Providing intercept and regualizer functionallity for linear methods in only one function.
3ac8874 [Michael Giannakopoulos] Added support for regularizer and intercection parameters for linear regression method.
Upgrade callhomeservice to 0.2.20 Co-authored-by: Ling Yuan <lingyun_yuan@apple.com>
Related to issue: SPARK-2550.