Skip to content

Rename allunique to anyduplicated#16152

Closed
tkelman wants to merge 2 commits intomasterfrom
tk/hasduplicates
Closed

Rename allunique to anyduplicated#16152
tkelman wants to merge 2 commits intomasterfrom
tk/hasduplicates

Conversation

@tkelman
Copy link
Copy Markdown
Contributor

@tkelman tkelman commented May 1, 2016

ref #15803 and #15914 - allunique seems to me like it implies composition all(unique(x)) which isn't what this does, the anyduplicated name has precedent in R (where it is doing composition, we don't have a duplicated function in Julia right now but we conceivably could).

(I don't think this has been on master under its current name for long enough to warrant a deprecation)

tkelman added 2 commits May 1, 2016 04:14
this isn't directly related to all and isn't a composition,
especially for collections of collections
R also has a `duplicated` function where `anyduplicated` is
equivalent to the composition `any(duplicated(x))`
@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

I also tried to point out that there is a further issue with #15914:

>> r = linspace(1.0, nextfloat(1.0), 10)
>> allunique(r)
true
>> allunique(collect(r))
false

Is this not a problem?

@tkelman
Copy link
Copy Markdown
Contributor Author

tkelman commented May 1, 2016

That might be an issue. Any suggestions for how to fix it? That doesn't really have a bearing on what to name it though.

@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

You're right about that last point, but I see you did some code changes too, I thought I might bring it up.

I don't know if this is iron clad, since I haven't thought of many tests, but:

function hasduplicates{T<:AbstractFloat}(r::Range{T})
    if step(r) == 0
        return length(r) > 1
    elseif step(r) > 0
        return last(r) < nextfloat(first(r), length(r) - 1)
    else
        return last(r) > nextfloat(first(r), -length(r) + 1)
    end
end

I'm not sure what happens when you have weird steps like for

>> r = linspace(1,2,0)
linspace(1.0,2.0,0)
>> step(r)
NaN

Furthermore, I wonder whether it shouldn't be
hasduplicates{T}(r::Range{T}) = (step(r) == zero(T)) && (length(r) > 1)
instead of length(r) > one(T) since length always return an Int, right?

@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

Hmm, maybe this should cover it for floats?

function hasduplicates{T<:AbstractFloat}(r::Range{T})
    length(r) <= 1 && return false
    step(r) == 0 && return true
    r_ = nextfloat(first(r), flipsign(length(r)-1, step(r)))
    return flipsign(last(r) - r_, step(r)) < 0
end

@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

Tests should include linspaces, and could be something along these lines:

!hasduplicates(linspace(1.0, nextfloat(1.0), 2))
hasduplicates(linspace(1.0, nextfloat(1.0, 1), 3))
!hasduplicates(linspace(1.0, nextfloat(1.0, -10), 11))
hasduplicates(linspace(1.0, nextfloat(1.0, -10), 12))
!hasduplicates(linspace(1.0, 1.0, 0))

The last one has step() = NaN, which might conceivably trip up some implementations.

@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

Here's another one that fails with the current code, but works with the one I suggested:
hasduplicates(range(2.0, eps(1.0), 4))

@DNF2
Copy link
Copy Markdown

DNF2 commented May 1, 2016

Hmmm. This one fails, though:
hasduplicates(range(prevfloat(typemax(Float64)), eps(1.0), 10))

There are some pretty hairy edge cases with LinSpace and FloatRange. It might require some deeper digging into their definitions.

@nalimilan
Copy link
Copy Markdown
Member

Why not, if that would be a more efficient version of a (possible) any(duplicated(x)).

@okvs I think you should open a separate issue, that's a complex enough problem.

@DNF2
Copy link
Copy Markdown

DNF2 commented May 2, 2016

It seemed like a pretty easy fix, when the code was being changed anyway. But now it looks like a rabbit hole, I think it even touches on some problems with linspace and range.

@tkelman tkelman mentioned this pull request May 3, 2016
@ararslan
Copy link
Copy Markdown
Member

ararslan commented May 4, 2016

With this proposal, to determine whether a collection consists entirely of unique elements you need !anyduplicated, whereas currently you use allunique, so this isn't really a renaming so much as swapping one function for another, right?

Personally I prefer allunique; more often I think, "does this have entirely unique elements?" rather than "are any of the elements duplicated?". IMO. ¯_(ツ)_/¯

@tkelman
Copy link
Copy Markdown
Contributor Author

tkelman commented May 6, 2016

Yes this would be flipping the sense in addition to the different name. Sounds like there are 3 against, 2 or maybe 3 for this. I dislike having that "all" in the name, but not enough people spoke up here with a straight +1 or -1 for it to be conclusive one way or the other.

@tkelman tkelman closed this May 27, 2016
@tkelman tkelman deleted the tk/hasduplicates branch May 27, 2016 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants