
Incorrect point filtering logic in load_valid_labels() #156

@ry-immr

Describe the bug

The load_valid_labels() function in yolo/tools/data_loader.py currently filters coordinates element-wise, as shown below:

valid_points = points[(points >= 0) & (points <= 1)].reshape(-1, 2)

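Indexing with a 2D boolean mask returns a flattened 1D array of the surviving values. When only one coordinate of a point is out of range, reshape(-1, 2) re-pairs the remaining values, so x and y values from different points get mixed. If an odd number of values survives, the reshape raises a ValueError.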
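A minimal sketch of the odd-count case, using made-up points rather than real labels: the element-wise mask leaves five values, which cannot be split back into (x, y) pairs.

```python
import numpy as np

# Illustrative points (not from the dataset); only the second point
# has an out-of-range x coordinate.
points = np.array([[0.1, 0.1], [1.1, 0.1], [0.1, 0.8]])

mask = (points >= 0) & (points <= 1)
flat = points[mask]  # 2D boolean mask indexing yields a flattened 1D array

print(flat.shape)  # (5,)

try:
    flat.reshape(-1, 2)
    reshape_failed = False
except ValueError:
    # Five values cannot be regrouped into (x, y) pairs.
    reshape_failed = True

print(reshape_failed)  # True
```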

A more appropriate approach would be:

valid_points = points[np.all((points >= 0) & (points <= 1), axis=1)]

This checks each coordinate pair (row) as a unit and keeps only the points whose x and y both satisfy 0 <= coord <= 1.

To Reproduce

Consider the following points:

points = np.array([[0.1, 0.1], [1.1, 0.1], [1.1, 0.8], [0.1, 0.8]])

With the current logic:

>>> (points >= 0) & (points <= 1)
array([[ True,  True],
       [False,  True],
       [False,  True],
       [ True,  True]])
>>> points[(points >= 0) & (points <= 1)].reshape(-1, 2)
array([[0.1, 0.1],
       [0.1, 0.8],
       [0.1, 0.8]])

With the suggested fix, the output would be:

>>> np.all((points >= 0) & (points <= 1), axis=1)
array([ True, False, False,  True])
>>> points[np.all((points >= 0) & (points <= 1), axis=1)]
array([[0.1, 0.1],
       [0.1, 0.8]])

This fix preserves each pair of coordinates, but if a bounding box has points outside [0, 1], those points are filtered out. This can lead to incomplete bounding boxes composed only of the remaining points, which no longer meaningfully represent the original bounding box.

Expected behavior

I believe that unless all points lie outside [0, 1], we should preserve the bounding box by clipping any out-of-range coordinates to [0, 1]. This retains as much of the original bounding box as possible. What are your thoughts?
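A rough sketch of this clipping idea, not code from the repository. The helper name `clip_or_drop_points` is hypothetical. It reads "all points lie outside" as "no point has both coordinates in [0, 1]", which is an assumption: a box that fully covers the image would also be dropped under this reading, so the real check may need to differ.

```python
import numpy as np

def clip_or_drop_points(points: np.ndarray) -> np.ndarray:
    """Hypothetical helper: clip points to [0, 1], or return an empty
    (0, 2) array when no point lies inside the unit square."""
    inside = np.all((points >= 0) & (points <= 1), axis=1)
    if not inside.any():
        # Assumed rule: drop the box when every point is out of range.
        return np.empty((0, 2), dtype=points.dtype)
    # Otherwise keep every point and clamp its coordinates to [0, 1].
    return np.clip(points, 0.0, 1.0)

points = np.array([[0.1, 0.1], [1.1, 0.1], [1.1, 0.8], [0.1, 0.8]])
clipped = clip_or_drop_points(points)
print(clipped)
```

For the points from the reproduction case, all four corners are kept and the x value 1.1 is clamped to 1.0, so the box keeps its rectangular shape instead of collapsing to two points.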

Screenshots

Visualization of head bounding boxes from the CrowdHuman dataset.

Current filtering logic:
Image

Fixed filtering logic:
Image

Labels: bug
