Add vectorized "in" (.∈) and "notin" (.∉)#12406
alyst wants to merge 1 commit into JuliaLang:master
|
When you benchmark, it is better to put the code in a function. This avoids spurious results from evaluation in global scope. Compare:
x = collect(1:1000);
@time for i in 1:1000 Bool[(xx ∈ 1:100) for xx in x] end
# 0.306228 seconds (5.96 M allocations: 152.985 MB, 6.32% gc time)
function f()
x = collect(1:1000)
for i in 1:1000 Bool[(xx ∈ 1:100) for xx in x] end
end
@time f()
# 0.002487 seconds (1.01 k allocations: 1.045 MB) |
|
@KristofferC Wow, that made a huge difference!
function bench_comprehension(n::Int)
x = collect(1:1000);
for i in 1:n Bool[(xx ∈ 1:100) for xx in x] end
end
@time bench_comprehension(10^6)
# 2.858 seconds (1000 k allocations: 1038 MB, 7.97% gc time)
function bench_functor(n::Int)
x = collect(1:1000);
for i in 1:n x .∈ 1:100 end
end
@time bench_functor(10^6)
# 3.460 seconds (1000 k allocations: 1038 MB, 6.39% gc time) |
These should produce BitArrays; this should also do broadcasting the way other vectorized ops do.
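To make the request concrete, here is a minimal sketch of elementwise membership that returns a BitArray and follows ordinary broadcasting rules. It relies on the dot-broadcast syntax and `Ref` from Julia 1.x, which postdate this discussion, so it illustrates the desired behavior and is not the implementation in this PR. The names `x`, `s`, `sets`, `mask`, and `grid` are made up for the example.

```julia
# Sketch only: uses Julia 1.x broadcasting, not the code proposed in this PR.
x = collect(1:10)
s = 1:3

# Ref(s) marks the set as a scalar, so only x is traversed elementwise.
mask = in.(x, Ref(s))

# A Bool-valued broadcast yields a BitArray rather than an Array{Bool}.
@assert mask isa BitVector
@assert x[mask] == [1, 2, 3]

# Shapes extend the way they do for other dotted operators:
# a column of elements against a row of sets gives a matrix.
sets = [1:3, 2:5]
grid = in.(x, permutedims(sets))
@assert grid isa BitMatrix
@assert size(grid) == (10, 2)
```

The resulting mask can be used directly for filtering, as in `x[mask]`.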
|
@StefanKarpinski It definitely makes sense, but I need some guidance to make it right. Unlike the common broadcasting case (e.g. Ideally, one would like to have Actually, even the proposed PR cannot discriminate "vector of elements vs single set" and "set vs single collection of sets" cases. |
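The two cases that are hard to tell apart can be written out explicitly. The sketch below assumes Julia 1.x, where wrapping an argument in `Ref` tells broadcasting to treat it as a single value. That is one way to resolve the ambiguity and is not what this PR implements. All variable names are hypothetical.

```julia
# Sketch only: Julia 1.x semantics, shown to illustrate the ambiguity.

# Case 1: a vector of elements against a single set.
x = [1, 2, 7]
s = Set([1, 2, 3])
elementwise = x .∈ Ref(s)
@assert elementwise == [true, true, false]

# Case 2: a single set against several collections of sets.
a = Set([1, 2])
coll1 = [Set([1, 2]), Set([3])]
coll2 = [Set([4])]
per_collection = Ref(a) .∈ [coll1, coll2]
@assert per_collection == [true, false]

# With no Ref on either side, the arguments are paired position by position.
paired = [1, 2, 3] .∈ [1:2, 2:3, 5:6]
@assert paired == [true, true, false]
```

Without an explicit marker such as `Ref`, the operator has no way to know which side is meant to be the scalar, which is the difficulty described above.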
|
I'm not a big fan of this. It's a bit confusing to vectorize operations that are already collection operations, and I'd rather not encourage more special-case vector notation (#8450). |
|
@JeffBezanson If #8450 leads to new syntax for easy vectorization, that would be fantastic. |
|
So the reaction to |
|
See old issue filed at #5212. |
Add vectorized "in" (.∈) and "not in" (.∉) operators to be on par with R. This is quite handy in combination with DataFrames for dataset filtering.