Incorrect frequencies when nil in first column #1135

@jrrrp

When finding the frequencies of a dataframe I seem to get incorrect counts when the first column in the grouping has a nil. To demonstrate:

  1. Working example with no nils:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, 2])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)

As expected:

#Explorer.DataFrame<
  Polars[2 x 3]
  a string ["a", "b"]
  b s64 [1, 2]
  counts u32 [2, 1]
>
  2. Still working: switching the 2 to a nil but keeping the column order the same:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:a, :b], stable: true)

As expected:

#Explorer.DataFrame<
  Polars[2 x 3]
  a string ["a", "b"]
  b s64 [1, nil]
  counts u32 [2, 1]
>
  3. Different than expected: switching the order of the columns so that the column with nil is first:
df = Explorer.DataFrame.new(a: ["a", "a", "b"], b: [1, 1, nil])
Explorer.DataFrame.frequencies(df, [:b, :a], stable: true)

Actual result:

#Explorer.DataFrame<
  Polars[2 x 3]
  b s64 [1, nil]
  a string ["a", "b"]
  counts u32 [2, 0]
>

The first combination still has a count of 2, as before, but the second combination (b: nil, a: "b") now has a count of 0 instead of the expected 1, even though only the order of the grouping columns changed.
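
For reference, the expected counts can be cross-checked without going through Explorer at all. The following is only a minimal sketch in plain Elixir, assuming that a nil in a grouping column should be treated as an ordinary value that forms its own group (as the second example above suggests). The variable names `b`, `a` and `expected` are illustrative and simply mirror the columns of the dataframe.

```elixir
# Plain-Elixir cross-check (no Explorer calls): count each {b, a} row pair.
# Assumption: nil is treated as a regular value and gets its own group.
b = [1, 1, nil]
a = ["a", "a", "b"]

expected =
  b
  # Pair the columns row by row: [{1, "a"}, {1, "a"}, {nil, "b"}]
  |> Enum.zip(a)
  # Build a map of row tuple => number of occurrences
  |> Enum.frequencies()

IO.inspect(expected)
```

Under that assumption the pair `{1, "a"}` is counted twice and the pair `{nil, "b"}` once, which matches the counts returned for the `[:a, :b]` ordering and differs from the 0 reported for the `[:b, :a]` ordering.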

First time reporting a bug - extra info if needed:

Explorer 0.11.1
Elixir 1.19.3
OTP release 28
