Remove unnecessary assert blocking direct usage of CSC for LSI#1622
Remove unnecessary assert blocking direct usage of CSC for LSI#1622menshikh-iv merged 1 commit intopiskvorky:developfrom
Conversation
|
@isamaru IIRC, the code path that handles CSC was there for the distributed version only. The master worker prepares batches for workers as CSC matrices; the workers process the batches and send the results back to master over the network. The code was never meant to be used in this way, as the "main" worker, it's a different use case. I actually wouldn't expect it to work so easily (do all the necessary model setup etc, as if it was the master itself). But if it does, that's great. |
menshikh-iv
left a comment
There was a problem hiding this comment.
Please check that distributed onepass works properly, look at this tutorial: https://radimrehurek.com/gensim/dist_lsi.html
|
@radim Thanks for the explanation! Makes much more sense now. Distributed algorithm with not work on CSC directly as written now: the dispatcher code is only enabled in the top branch ( To sum up: serial (non-distributed) algorithm will work directly on CSC. Distributed will not work, failing on the very same assert than before ('must be in serial mode to receive jobs'). I could rewrite that assert message to be more meaningful to the user: "Distributed mode not supported for CSC matrices directly. Use csc2corpus." |
|
I see, thanks. I think that's fine as-is, no changes needed -- nobody really uses the (ancient) distributed mode, AFAIK. It's one of the candidate features for removal. |
|
@piskvorky ok, I close this PR |
|
Why? We definitely want this PR. |
|
@piskvorky as I understand, |
|
No, it means we don't need to rewrite the assert message. |
|
Thanks @isamaru 🥇 |
@piskvorky @janpom @menshikh-iv
Removes the assert which is blocking usage of CSC directly as corpus for LsiModel with multiple power iterations.
I could not figure out why the assert was there in the first place: it doesn't make sense to me: The code path handles CSC exclusively but I don't see why CSC would require a single pass only. It works just fine with multiple power iterations.