Define length(::Flatten) for some collections of arrays#47353
mcabbott wants to merge 1 commit into JuliaLang:master
LilithHafner left a comment:
For flattened tuples of AbstractArrays this seems quite sensible.
How is the performance for long vectors of short vectors?
_flatten_iteratorsize(sz, ::HasEltype, ::Type{<:Tuple{Vararg{AbstractArray}}}) = HasLength()
_flatten_iteratorsize(sz, ::HasEltype, ::Type{<:AbstractArray{<:AbstractArray}}) = HasLength()
I don't see how HasEltype is relevant here, would this work too?
- _flatten_iteratorsize(sz, ::HasEltype, ::Type{<:Tuple{Vararg{AbstractArray}}}) = HasLength()
- _flatten_iteratorsize(sz, ::HasEltype, ::Type{<:AbstractArray{<:AbstractArray}}) = HasLength()
+ _flatten_iteratorsize(sz, _, ::Type{<:Tuple{Vararg{AbstractArray}}}) = HasLength()
+ _flatten_iteratorsize(sz, _, ::Type{<:AbstractArray{<:AbstractArray}}) = HasLength()
For a large enough array of tiny arrays,
julia> let vs = [rand(2) for _ in 1:10^6]
           @btime sum(length, $vs)
           @btime sum(first, $vs)  # a cheap operation visiting all arrays
           @btime collect(Iterators.flatten($vs))
       end;
  2.020 ms (0 allocations: 0 bytes)  # length
  2.259 ms (0 allocations: 0 bytes)
  17.394 ms (15 allocations: 18.90 MiB)  # flatten, master
  6.333 ms (2 allocations: 15.26 MiB)  # flatten, this PR
StaticArrays provides a length based on type, in src/flatten.jl. In its present state this PR dispatches earlier & breaks that:
julia> using StaticArrays
julia> let vs = [@SVector(rand(2)) for _ in 1:10^6]
           @btime sum(length, $vs)
           @btime sum(first, $vs)
           @btime collect(Iterators.flatten($vs))
       end;
  5.535 μs (0 allocations: 0 bytes)  # special length(flatten(vs)) takes 2.5ns
  674.875 μs (0 allocations: 0 bytes)
  2.688 ms (2 allocations: 15.26 MiB)  # flatten, master
  3.119 ms (2 allocations: 15.26 MiB)  # flatten, this PR
To fix the interaction with StaticArrays it would work to define the changes here at the
Either way, we should probably define
Bump on this. From a user perspective, it seems really weird that length(::Flatten) doesn't work when flattening any iterators that have known length. I think it shouldn't be limited to arrays and tuples though, e.g. UnitRange and CartesianIndices should work as well. From my perspective, it's about semantics, not just efficiency for things like map.
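To make the request concrete, here is a minimal sketch of what "any inner iterator with a known length" could mean. The helper `flat_length` is hypothetical (it is not in Base and is not what this PR implements); it assumes that consulting `Base.IteratorSize` on each inner element is an acceptable way to decide whether a length is known, and that the outer collection can be iterated more than once.

```julia
# Hypothetical helper (not in Base): total length of a flattened collection
# whose elements all advertise a known length via the IteratorSize trait.
function flat_length(outer)
    total = 0
    for inner in outer
        if !(Base.IteratorSize(inner) isa Union{Base.HasLength, Base.HasShape})
            throw(ArgumentError("element of type $(typeof(inner)) has no known length"))
        end
        total += length(inner)
    end
    return total
end

# A tuple mixing a UnitRange, CartesianIndices and a Vector.
mixed = (1:3, CartesianIndices((2, 2)), [10, 20])
n = flat_length(mixed)
m = length(collect(Iterators.flatten(mixed)))  # reference value obtained by collecting
```

Here `n` and `m` should agree, while an inner `Iterators.filter` would be rejected because its size is unknown.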
length(f::Flatten{<:Tuple{Vararg{AbstractArray}}}) = sum(length, f.it, init=0)
length(f::Flatten{<:AbstractArray{<:AbstractArray}}) = sum(length, f.it, init=0)
I think this is not in general type stable when f.it is empty. Is there even a way to make it type stable?
Because length(x) isa Int is not necessarily true.
I'm willing to accept non-type-stable length(::Flatten) for inputs that have lengths that are neither Int nor smaller integers that promote to Int when added to 0.
length almost always returns Int.
julia> length(big(1):big(2)) isa Int
false
UnitRange{BigInt} does not seem that exceptional, for what that is worth. Both UnitRange and BigInt are exported from Base and commonly used.
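The type-stability concern can be reproduced without touching `Flatten` at all, since the proposed methods reduce to `sum(length, f.it, init=0)`. The sketch below applies that same expression to a vector of `UnitRange{BigInt}`, once non-empty and once empty; the variable names are illustrative only.

```julia
ranges = [big(1):big(2), big(1):big(3)]  # Vector{UnitRange{BigInt}}
empty_ranges = UnitRange{BigInt}[]       # same element type, no elements

# Same reduction as in the proposed length methods.
nonempty_total = sum(length, ranges; init=0)        # Int init plus BigInt lengths
empty_total    = sum(length, empty_ranges; init=0)  # only `init` is returned

result_types = (typeof(nonempty_total), typeof(empty_total))
```

The two results have different types even though the inputs have the same type, which is the instability being discussed.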
length(f::Flatten{I}) where {I} = flatten_length(f, eltype(I))
length(f::Flatten{Tuple{}}) = 0
length(f::Flatten{<:Tuple{Vararg{AbstractArray}}}) = sum(length, f.it, init=0)
length(f::Flatten{<:AbstractArray{<:AbstractArray}}) = sum(length, f.it, init=0)
This method relies for correctness on the assumption that any subtype of AbstractArray is a stateless iterator. I am not completely sure if that assumption is justified. A stateful AbstractArray would certainly be uncommon and weird, but perhaps still valid?
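To illustrate why statefulness would matter, the sketch below wraps the outer collection in `Iterators.Stateful`. That wrapper is not an `AbstractArray`, so the proposed methods would not apply to it; it is used here only as a stand-in for a hypothetical stateful outer container, to show what happens when lengths are summed in a first pass and the elements are visited in a second pass.

```julia
# Stand-in for a stateful outer container (Stateful is NOT an AbstractArray).
outer = Iterators.Stateful([[1, 2], [3]])

# First pass: summing the inner lengths consumes the outer iterator.
total = sum(length, outer; init=0)

# Second pass: nothing is left to flatten.
remaining = collect(Iterators.flatten(outer))
```

The reported total no longer matches the number of elements that a later iteration would yield, which is the failure mode a stateful array would trigger.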
As suggested at #23431, this defines length (and the IteratorSize trait) for a few more Iterators.flatten objects: those flattening an array of arrays, or a tuple of arrays.
Computing the total length involves adding the lengths of the constituent arrays. At present length(::Flatten) never does this. For these cases it seems to be very cheap.
Knowing the length speeds up collect ∘ Iterators.flatten, although there is still room for improvement:
Some other cases with already known length remain quite slow:
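As a rough sketch of why a known total length helps `collect` in general, the hypothetical helper `collect_flat` below (not part of Base or of this PR) sums the inner lengths first and then fills a single preallocated vector, instead of growing the output as elements arrive. It assumes a vector of vectors with a common element type and a stateless outer container.

```julia
# Hypothetical helper: flatten a vector of vectors into one preallocated Vector.
function collect_flat(vs::AbstractVector{<:AbstractVector{T}}) where {T}
    out = Vector{T}(undef, sum(length, vs; init=0))  # one allocation, exact size
    i = 0
    for v in vs, x in v
        i += 1
        out[i] = x
    end
    return out
end

vs = [[1, 2], Int[], [3]]
flat = collect_flat(vs)
reference = collect(Iterators.flatten(vs))  # Base's result, for comparison
```

Both results should contain the same elements in the same order; only the allocation strategy differs.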