Make uniqued() lazy by default#71
Conversation
There was a problem hiding this comment.
This all looks good to me.
My only question is how much folks are attached to uniqued() returning an Array. I think that we probably want to take uniqued() in this lazy direction (as opposed to, say, adding lazy.uniqued()), because it's more consistent with the rest of Algorithms and stdlib, but whenever I see this in the wild it's returning an Array.
|
|
||
| The `uniqued` methods are O(_n_) in both time and space complexity. | ||
| The eager `uniqued(on:)` method is O(_n_) in both time and space complexity. | ||
| The lazy versions are O(_1_). |
There was a problem hiding this comment.
I think it's a very salient consideration mentioned by @kylemacomber that most uses seem to expect an Array, in which case a .lazy.uniqued design would make more sense.
Additionally, this discrepancy here, where uniqued(on:) isn't lazy by default but uniqued would be, seems like it invites confusion. I certainly would not expect that supplying a custom predicate would change the complexity or behavior so fundamentally, and I can see that issue arising when people make changes to their code and encounter this difference. Therefore, if it makes sense to make uniqued lazy, then I think the same change should be applied to uniqued(on:).
There was a problem hiding this comment.
Do people expect uniqued() to return an array, or do implementations usually simply return an array because it's easier to implement that way? I genuinely don't know the answer to this.
I share your concern about the discrepancy between uniqued() and uniqued(on:), but I do think that viewing them in isolation, this change is consistent with other sequence operations in the standard library. Operations that can be lazy without having to compromise (except perhaps on the return type) typically are, even when called on a sequence not conforming to LazySequenceProtocol. Collection's joined() and reversed() fall into this category, and I argue that uniqued() does as well. uniqued(on:) does not, because making it lazy would require the closure to be escaping and non-throwing.
There was a problem hiding this comment.
Do any other algorithms in this library or the standard library differ in laziness depending on the presence of a custom predicate?
There was a problem hiding this comment.
I believe the pair of operations that comes closest is joined() being lazy and flatMap { $0 } being eager while effectively doing the same thing. Of course they don't have similar names like uniqued() and uniqued(on:) do.
This is a fairly unique situation because other potential pairs lack one of the variants. chunked(by: ==) and compactMap { $0 } have no corresponding closure-less version, and operations like zip, reversed, and combinations have no versions that do take a closure.
There was a problem hiding this comment.
Ya I think the joined()/flatMap { $0 } and compacted()/compactMap { $0 } (see #112) serve as good precedent of similar of lazy/eager pairs.
To try to articulate the philosophy:
- Anything that can efficiently be lazy should be lazy, because (i) it can avoid extra work (e.g. unnecessary computation or allocation) and (ii) it's easy to go from lazy to eager by constructing an Array, but it's impossible to go the other way.
- Require an explicit
.lazyfor algorithms that take a closure to emphasize that (i) the closure will not run immediately and (ii) may run more than once per element in the collection, which can introduce a surprising vector for error. - An algorithm shouldn't be lazy if its being lazy would pessimize its runtime complexity. For example, a lazy
reversedadapter over a plainCollectionwould be absurd because looping over all of its indices would be O(n²).
There was a problem hiding this comment.
Now that we have OrderedSet via the Swift Collections packages, I think it's even clearer to me that uniqued should be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating an OrderedSet
There was a problem hiding this comment.
Now that we have
OrderedSetvia the Swift Collections packages, I think it's even clearer to me thatuniquedshould be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating anOrderedSet
+1
|
@swift-ci please test |
|
|
Adds the
Uniquedsequence that lazily produces the unique elements of a sequence.uniqued()andlazy.uniqued(on:)now return aUniquedinstance, the eageruniqued(on:)still produces an array.Checklist