Pulse is an anomaly aware downsampling algorithm. While most downsampling algorithms smooth over features in data, Pulse captures and preserves them.
Given two arrays, x and y, Pulse downsamples the dataset while preserving any anomalies residing in y.
from pulse import Pulse
pulse = new Pulse()
downsampled_x, downsampled_y = pulse.downsample(x, y, n)
Downsamples uniform data, retaining sparse data like outliers or asymptotes
Args:
x (list): A list of x values
y (list): A list of y values
n (int): The desired quantity of downsampled records
threshold (float): **default 90** The desired threshold for changes in data to be considered outliers, as a percentile
proportion (float): **default 5** Ratio of uniform records to outliers that should be sampled and returned
Returns:
tuple: A tuple containing two numpy arrays
- x (NDArray): A downsampled array of x values
- y (NDArray): A downsampled array of y values
The below example has 500,000 records, 10 of which are said to have importance. Pulse identifies these outliers and proceeds to evenly downsample the remaining dataset.
