Skip to content

#386 LogisticRegression.predict_proba should return (n, 2) for binary#760

Merged
TomAugspurger merged 3 commits into
dask:mainfrom
rikturr:predict-proba-shape
Apr 17, 2021
Merged

#386 LogisticRegression.predict_proba should return (n, 2) for binary#760
TomAugspurger merged 3 commits into
dask:mainfrom
rikturr:predict-proba-shape

Conversation

@rikturr

@rikturr rikturr commented Nov 25, 2020

Copy link
Copy Markdown
Contributor

This PR addresses #386 by making sure LogisticRegression.predict_proba() returns an array of shape (n, 2) to match what scikit-learn does. There are some things currently hard-coded for binary classification only, but it is clear that multi-class is currently not supported so that can be changed when multi-class is implemented.

There is some risk here that if users have downstream code that depends on the current functionality (return shape of (n,)), the changes from this PR could break their code.

@rikturr

rikturr commented Nov 25, 2020

Copy link
Copy Markdown
Contributor Author

I'm not quite sure why some of the checks are failing. It seems one of them is due to:

##[warning]numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject

but I'm not quite sure what I can change to make the check pass?

@TomAugspurger

TomAugspurger commented Nov 26, 2020 via email

Copy link
Copy Markdown
Member

Base automatically changed from master to main February 2, 2021 03:43
@rikturr

rikturr commented Feb 3, 2021

Copy link
Copy Markdown
Contributor Author

Looks like test failures have been addressed, besides linting (thanks @jsignell 🎉 ). I think this should be ready to merge if the implementation looks good

@jsignell

jsignell commented Feb 5, 2021

Copy link
Copy Markdown
Member

Looks like test failures have been addressed

Might not be quite done yet, but I'm chipping away at it.

@jsignell

Copy link
Copy Markdown
Member

Seems like only the keras test is broken and that'll be fixed by #794, so good to merge?

@rikturr

rikturr commented Apr 15, 2021

Copy link
Copy Markdown
Contributor Author

Triggered the tests to run again, so I think all is good now! Can this be merged?

@TomAugspurger TomAugspurger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This has dragged on long enough (sorry) so I'm going to merge. But if you want to make a cleanup PR you could simplify things to just call np.vstack on the array. Thanks to NEP-18 it works on Dask Arrays too.

return X.assign(intercept=1)[["intercept"] + list(columns)]


@dispatch(np.ndarray) # noqa: F811

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can actually just use np.vstack now, since Dask implemented NEP 18.

@TomAugspurger TomAugspurger merged commit 090dd27 into dask:main Apr 17, 2021
@rikturr rikturr deleted the predict-proba-shape branch April 19, 2021 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants