
[C++] CSV string column category to dictionary/indices? #5927

Description

@ntfshard

Hello

I'm a newcomer and not quite sure how to use the library. I tried to find documentation on this but couldn't.

I have a dataset in a CSV file where one column (let's call it colour) is a string category. I'd like to get integer indices instead of the text values so I can pass them to an algorithm.
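For reference, "indices instead of text" is what dictionary encoding produces: each distinct string is stored once in a dictionary, and every row holds an integer index into it. The sketch below uses plain C++17 with no Arrow dependency, so it only illustrates the idea and is not the library's API. The names `EncodedColumn` and `DictionaryEncodeStrings` are hypothetical. Numbering values in first-seen order is an assumption about the desired result; as far as I know it matches how Arrow builds its own dictionaries, but treat that as unverified.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type (not part of Arrow): a dictionary-encoded string column.
struct EncodedColumn {
    std::vector<std::string> dictionary;  // distinct values, in first-seen order
    std::vector<int32_t> indices;         // one index into `dictionary` per input row
};

// Hypothetical helper: give each distinct string the next free int32 index
// the first time it appears, and reuse that index for later occurrences.
EncodedColumn DictionaryEncodeStrings(const std::vector<std::string>& values) {
    EncodedColumn out;
    std::unordered_map<std::string, int32_t> index_of;
    out.indices.reserve(values.size());
    for (const auto& v : values) {
        // try_emplace inserts only if `v` is new; `it` points at its index either way.
        auto [it, inserted] =
            index_of.try_emplace(v, static_cast<int32_t>(out.dictionary.size()));
        if (inserted) {
            out.dictionary.push_back(v);
        }
        out.indices.push_back(it->second);
    }
    return out;
}
```

For example, the rows red, green, red, blue, green encode to the dictionary [red, green, blue] and the indices [0, 1, 0, 2, 1].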
I tried setting column_types in ConvertOptions to
{{"colour", arrow::dictionary(std::make_shared<arrow::Int32Type>(), arrow::utf8()) }}, but that seems to be the wrong way to use the API. At run time I get this error:
NotImplemented: CSV conversion to dictionary<values=string, indices=int32, ordered=0> is not supported
I also found a merged PR, #5785, but I'm not sure it applies to my case.

So my question is: can I get the indices of a category column using only the library API? And if so, what am I doing wrong? :)

In other words, I'd like something like this pandas code:
df[column] = df[column].cat.codes # if str(column_data_type) == "category"
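In pandas, when the categories are inferred from the data they are the sorted distinct values, so cat.codes gives each row the position of its value in that sorted list. The following C++17 sketch reproduces that numbering with the standard library only; `SortedCategoryCodes` is a hypothetical helper, not an Arrow or pandas function, and it does not handle missing values (pandas codes those as -1).

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper mirroring pandas' default category numbering:
// categories are the sorted distinct values, and each row's code is the
// position of its value in that sorted list. Missing values are not handled.
std::vector<int32_t> SortedCategoryCodes(const std::vector<std::string>& values,
                                         std::vector<std::string>* categories) {
    // Build the sorted list of distinct values.
    std::vector<std::string> cats(values.begin(), values.end());
    std::sort(cats.begin(), cats.end());
    cats.erase(std::unique(cats.begin(), cats.end()), cats.end());

    std::vector<int32_t> codes;
    codes.reserve(values.size());
    for (const auto& v : values) {
        // Every value is in `cats`, so lower_bound lands exactly on it.
        auto it = std::lower_bound(cats.begin(), cats.end(), v);
        codes.push_back(static_cast<int32_t>(it - cats.begin()));
    }
    if (categories != nullptr) {
        *categories = std::move(cats);
    }
    return codes;
}
```

For the rows red, green, red, blue, green the categories are [blue, green, red] and the codes are [2, 1, 2, 0, 1]. If the algorithm only needs distinct integers per value, either numbering works; if it must match pandas exactly, the sorted order matters.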

Thank you!
