Skip to content

Update to use NullableArrays and CategoricalArrays#145

Merged
randy3k merged 9 commits intomasterfrom
dataframes
Sep 29, 2016
Merged

Update to use NullableArrays and CategoricalArrays#145
randy3k merged 9 commits intomasterfrom
dataframes

Conversation

@randy3k
Copy link
Copy Markdown
Member

@randy3k randy3k commented Sep 26, 2016

TODO

  • Handle ordered factor
  • Optionally convert scalar NA to Nullable instead of NullableArray

Copy link
Copy Markdown
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

import DataStructures: OrderedDict

import Base: eltype, show, convert, isascii,
import Base: eltype, show, convert, isascii, isnull,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need import for this, merely using AFAICT.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is because RCall also has a isnull method.

@test rcopy(Nullable, RObject(1)).value == 1
@test rcopy(Nullable, RObject("abc")).value == "abc"
@test rcopy(RObject(Nullable(1))) == 1
@test rcopy(Nullable, RObject(Nullable(1, true))).isnull
Copy link
Copy Markdown
Member

@nalimilan nalimilan Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should really avoid accessing the isnull field directly, as it's likely to go away soon (JuliaLang/julia#18510). Use the accessor instead. (This also applies in other places.)

@test rcopy(DataArray,"c(NA,'NA')").na == @data([NA,"NA"]).na
@test_throws ErrorException rcopy(DataArray,"as.factor(c('a','a','c'))")
@test rcopy(PooledDataArray,"as.factor(c('a','a','c'))").pool == ["a","c"]
@test rcopy(NullableArray,"c(NA,TRUE)").isnull == NullableArray([true,true], [true,false]).isnull
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here for NullableArray. You can use isequal to test for equality of both contents and missingness, which is even better.

Copy link
Copy Markdown
Member Author

@randy3k randy3k Sep 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any method to compare the following withoout values?

isequal(NullableArray([1,2,3]), collect(1:3)) # true
isequal(NullableArray([1,2,3]).values, collect(1:3)) # false

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you need to do NullableArray(1:3).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isequal(NullableArray(1:3), 1:3) or isequal(NullableArray(1:3), collect(1:3)) both return false..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant isequal(rcopy(NullableArray,"c(NA,TRUE)"), NullableArray([true,true], [true,false])and so on (i.e. convert both sides to aNullableArray`).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops.. i see

function sexp{T<:Compat.String,N,R<:Integer}(v::$typ{T,N,R})
rv = protect(sexp(v.refs))
try
for (i,isna) = enumerate(v.refs .== 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v.refs .== 0 will allocate a vector. Better do == 0 inside the loop.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching it

rv[i] = naeltype(eltype(rv))
end
end
setattrib!(rv, Const.LevelsSymbol, sexp(v.pool.index))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use CategoricalArrays.index(v), ordered(v) and levels(v). Private fields should really not be accessed directly as they could change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalArrays.index does not work?

julia> v = CategoricalArray(repeat(["a", "b"], inner = 5));

julia> CategoricalArrays.levels(v)
2-element Array{String,1}:
 "a"
 "b"

julia> CategoricalArrays.index(v)
ERROR: MethodError: no method matching index(::CategoricalArrays.CategoricalArray{String,1,UInt32})
Closest candidates are:
  index(::CategoricalArrays.CategoricalPool{T,R<:Integer,V}) at /Users/Randy/.julia/v0.5/CategoricalArrays/src/pool.jl:161

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed, at first I didn't intend it to be used outside of the package. For now the best approach is CategoricalArrays.index(v.pool). I'll try to see if I can improve the API in next releases.

@randy3k randy3k merged commit 303e97b into master Sep 29, 2016
@randy3k randy3k deleted the dataframes branch September 29, 2016 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants