⚡ Optimize data processing loops in Fisher LDA #26

Open
Raman369AI wants to merge 1 commit into main from performance-fisher-lda-loops-optimization-17372230985130963295
@Raman369AI

Owner

💡 What: Consolidated multiple .reduce() calls into single-pass loops for computing class statistics (means and within-class scatter) and projection statistics (means and standard deviations).

🎯 Why: The original code iterated over the same datasets up to 5 times per class, causing unnecessary CPU overhead and potential GC pressure from redundant intermediate calculations.

📊 Measured Improvement: Achieved a ~2.3x performance boost in data processing logic (Baseline: 244.61ms -> Optimized: 107.45ms for 200,000 iterations).
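As a rough illustration of the consolidation described above: the pre-change code is not shown in this conversation, so the sketch below only contrasts the general pattern. `meanVarMultiPass` and `meanVarSinglePass` are hypothetical names, not functions from FisherLDA.tsx.

```typescript
// Hypothetical illustration of the pattern; not the code from FisherLDA.tsx.

// Two passes: one .reduce() for the mean, a second for the squared deviations.
function meanVarMultiPass(vals: number[]): { mu: number; variance: number } {
  const n = vals.length;
  const mu = vals.reduce((acc, v) => acc + v, 0) / n;
  const variance = vals.reduce((acc, v) => acc + (v - mu) * (v - mu), 0) / n;
  return { mu, variance };
}

// One pass: accumulate the sum and the sum of squares together,
// then derive the (population) variance from them.
function meanVarSinglePass(vals: number[]): { mu: number; variance: number } {
  const n = vals.length;
  let s = 0, ss = 0;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v;
    ss += v * v;
  }
  const mu = s / n;
  // Clamp at 0: rounding can push the difference slightly below zero.
  return { mu, variance: Math.max(0, ss / n - mu * mu) };
}
```

Both functions assume a non-empty array; for `[1, 2, 3, 4]` each yields a mean of 2.5 and a variance of 1.25.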


PR created automatically by Jules for task 17372230985130963295 started by @Raman369AI

Consolidated multiple .reduce() calls into single-pass loops using the
one-pass variance formula. This reduces iterations over the dataset
from 5 to 1 per class for class statistics and from 2 to 1 for
projection statistics.

Benchmark results (200,000 iterations):
Baseline: 244.61ms
Optimized: 107.45ms (~2.3x faster)

Co-authored-by: Raman369AI <110351568+Raman369AI@users.noreply.github.com>
@google-labs-jules

Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the statistical calculations in FisherLDA.tsx by introducing getStats and get1DStats helper functions, replacing several inline reduce operations with more efficient single-loop implementations. The review feedback recommends adding guards for empty arrays in both functions to prevent potential division-by-zero errors and suggests simplifying the variance and standard deviation calculations by reusing the computed means.

Comment on lines +37 to +52
function getStats(pts: Pt[]) {
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  const n = pts.length;
  for (let i = 0; i < n; i++) {
    const p = pts[i];
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return {
    m: { x: mx, y: my },
    sxx: (sxx - (sx * sx) / n) / n,
    syy: (syy - (sy * sy) / n) / n,
    sxy: (sxy - (sx * sy) / n) / n,
  };
}


medium

The getStats function lacks a guard for empty arrays, which would lead to division by zero and NaN values. Additionally, the variance and covariance calculations can be simplified for better readability by reusing the computed means.

Suggested change
- function getStats(pts: Pt[]) {
-   let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
-   const n = pts.length;
-   for (let i = 0; i < n; i++) {
-     const p = pts[i];
-     sx += p.x; sy += p.y;
-     sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
-   }
-   const mx = sx / n, my = sy / n;
-   return {
-     m: { x: mx, y: my },
-     sxx: (sxx - (sx * sx) / n) / n,
-     syy: (syy - (sy * sy) / n) / n,
-     sxy: (sxy - (sx * sy) / n) / n,
-   };
- }
+ function getStats(pts: Pt[]) {
+   const n = pts.length;
+   if (n === 0) return { m: { x: 0, y: 0 }, sxx: 0, syy: 0, sxy: 0 };
+   let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
+   for (let i = 0; i < n; i++) {
+     const p = pts[i];
+     sx += p.x; sy += p.y;
+     sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
+   }
+   const mx = sx / n, my = sy / n;
+   return {
+     m: { x: mx, y: my },
+     sxx: sxx / n - mx * mx,
+     syy: syy / n - my * my,
+     sxy: sxy / n - mx * my,
+   };
+ }
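To make the empty-array concern concrete, here is a minimal sketch comparing the two variants. The `Pt` shape is assumed from how `p.x` and `p.y` are used in the review, and `sums`, `getStatsUnguarded` and `getStatsGuarded` are hypothetical names introduced only for this comparison.

```typescript
// Assumed shape of Pt, inferred from the p.x / p.y accesses in the review.
type Pt = { x: number; y: number };
type Stats = { m: Pt; sxx: number; syy: number; sxy: number };

// Hypothetical helper: accumulate the five running sums in one pass.
function sums(pts: Pt[]) {
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (const p of pts) {
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  return { sx, sy, sxx, syy, sxy };
}

// Without a guard: divides by n even when n is 0.
function getStatsUnguarded(pts: Pt[]): Stats {
  const n = pts.length;
  const { sx, sy, sxx, syy, sxy } = sums(pts);
  return {
    m: { x: sx / n, y: sy / n },
    sxx: (sxx - (sx * sx) / n) / n,
    syy: (syy - (sy * sy) / n) / n,
    sxy: (sxy - (sx * sy) / n) / n,
  };
}

// With the suggested guard and the mean-based form.
function getStatsGuarded(pts: Pt[]): Stats {
  const n = pts.length;
  if (n === 0) return { m: { x: 0, y: 0 }, sxx: 0, syy: 0, sxy: 0 };
  const { sx, sy, sxx, syy, sxy } = sums(pts);
  const mx = sx / n, my = sy / n;
  return {
    m: { x: mx, y: my },
    sxx: sxx / n - mx * mx,
    syy: syy / n - my * my,
    sxy: sxy / n - mx * my,
  };
}

// In JavaScript 0 / 0 evaluates to NaN rather than throwing,
// so the unguarded variant silently returns NaN fields for [].
const unguardedEmpty = getStatsUnguarded([]);
const guardedEmpty = getStatsGuarded([]);
```

On a non-empty input such as `[{ x: 1, y: 2 }, { x: 3, y: 6 }]` both variants give a mean of (2, 4) with `sxx = 1`, `syy = 4` and `sxy = 2`. Whether all-zero statistics are the right answer for an empty class is a choice for the caller; the guard only keeps NaN from spreading into later calculations.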

Comment on lines +73 to +83
function get1DStats(vals: number[]) {
  let s = 0, ss = 0;
  const n = vals.length;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v; ss += v * v;
  }
  const mu = s / n;
  const sig = Math.sqrt(Math.max(0, (ss - (s * s) / n) / n));
  return { mu, sig };
}


medium

The get1DStats function should include a guard for empty arrays to prevent division by zero. The standard deviation calculation can also be slightly simplified by using the already computed mean.

Suggested change
- function get1DStats(vals: number[]) {
-   let s = 0, ss = 0;
-   const n = vals.length;
-   for (let i = 0; i < n; i++) {
-     const v = vals[i];
-     s += v; ss += v * v;
-   }
-   const mu = s / n;
-   const sig = Math.sqrt(Math.max(0, (ss - (s * s) / n) / n));
-   return { mu, sig };
- }
+ function get1DStats(vals: number[]) {
+   const n = vals.length;
+   if (n === 0) return { mu: 0, sig: 0 };
+   let s = 0, ss = 0;
+   for (let i = 0; i < n; i++) {
+     const v = vals[i];
+     s += v; ss += v * v;
+   }
+   const mu = s / n;
+   const sig = Math.sqrt(Math.max(0, ss / n - mu * mu));
+   return { mu, sig };
+ }
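For reference, the two expressions for the variance in this suggestion are algebraically the same. Writing s for the sum of the values and ss for the sum of their squares (the accumulators in the loop), the identity can be spelled out as follows; the index notation is added here for illustration only.

```latex
\[
\sigma^2
  = \frac{1}{n}\sum_{i=1}^{n} (v_i - \mu)^2
  = \frac{1}{n}\left(\sum_{i=1}^{n} v_i^2 - \frac{\bigl(\sum_{i=1}^{n} v_i\bigr)^2}{n}\right)
  = \frac{1}{n}\sum_{i=1}^{n} v_i^2 - \mu^2,
\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} v_i .
\]
```

In terms of the code, the middle form is `(ss - (s * s) / n) / n` and the last form is `ss / n - mu * mu`. Both subtract two numbers that can be nearly equal, so floating-point rounding may leave a tiny negative result when the values are almost constant; the `Math.max(0, ...)` clamp keeps `Math.sqrt` from returning NaN in that case.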
