Suggest replacing np.ma.array with boolean indexing for faster summary stats when mask is one-off #1

@SaFE-APIOpt

Description


Hi 👋 I came across this section of the code:

masked_array = np.ma.array(self.original_array, mask=~mask)
if masked_array.count() > 0:
    mean_val = np.mean(masked_array)
    min_val = np.min(masked_array)
    max_val = np.max(masked_array)

Suggested replacement:

valid_values = self.original_array[mask]
if valid_values.size > 0:
    mean_val = np.mean(valid_values)
    min_val = np.min(valid_values)
    max_val = np.max(valid_values)

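np.ma.array(...) builds a MaskedArray object, which carries extra overhead and is designed for pipelines that keep the mask around across many operations.

When you only need one-off summary statistics, direct boolean indexing (e.g. array[mask]) is typically faster (roughly 1.5–3×) and more memory efficient. It returns a flat 1-D array of the selected values, which is all that mean/min/max over the whole array need.

valid_values.size > 0 is functionally equivalent to masked_array.count() > 0 here, assuming self.original_array is a plain ndarray (not already masked) and mask is a boolean array of the same shape.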
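
To make the comparison easy to reproduce, here is a rough sketch that checks both variants return the same statistics and times them. The array shape, the threshold used to build `mask`, and the helper names `stats_masked` / `stats_indexed` are made up for illustration and are not from the project; the actual speedup will depend on array size, dtype, and how many values the mask keeps.

```python
import timeit

import numpy as np

# Hypothetical stand-ins for self.original_array and mask (True = keep).
rng = np.random.default_rng(0)
original_array = rng.normal(size=(512, 512))
mask = original_array > 0.5


def stats_masked(arr, mask):
    """Current approach: wrap in a MaskedArray, then reduce."""
    masked_array = np.ma.array(arr, mask=~mask)
    if masked_array.count() > 0:
        return (
            float(np.mean(masked_array)),
            float(np.min(masked_array)),
            float(np.max(masked_array)),
        )
    return None


def stats_indexed(arr, mask):
    """Suggested approach: select valid values once with boolean indexing."""
    valid_values = arr[mask]
    if valid_values.size > 0:
        return (
            float(np.mean(valid_values)),
            float(np.min(valid_values)),
            float(np.max(valid_values)),
        )
    return None


# Both variants should agree (mean only up to floating-point rounding).
assert np.allclose(stats_masked(original_array, mask),
                   stats_indexed(original_array, mask))

# Timings are machine dependent; run this locally to see the ratio.
t_masked = timeit.timeit(lambda: stats_masked(original_array, mask), number=20)
t_indexed = timeit.timeit(lambda: stats_indexed(original_array, mask), number=20)
print(f"np.ma.array: {t_masked:.4f}s  boolean indexing: {t_indexed:.4f}s")
```

One caveat worth checking before switching: if self.original_array can itself be a MaskedArray, boolean indexing keeps its existing mask, so the two size/count checks would no longer be interchangeable.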
