⚡ Optimize data processing loops in Fisher LDA #26
Consolidated multiple .reduce() calls into single-pass loops using the one-pass variance formula. This reduces the number of passes over the data from 5 to 1 per class for class statistics, and from 2 to 1 for projection statistics.

Benchmark results (200,000 iterations):
Baseline: 244.61ms
Optimized: 107.45ms (~2.3x faster)

Co-authored-by: Raman369AI <110351568+Raman369AI@users.noreply.github.com>
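As a minimal sketch of the change for the 1-D case: the multi-pass version below is a hypothetical reconstruction (the PR does not show the original .reduce() code), and multiPass1D / singlePass1D are illustrative names, not functions from FisherLDA.tsx. The single-pass version accumulates the sum and the sum of squares in one loop and then applies the one-pass identity Var = E[x²] − (E[x])².

```typescript
// Hypothetical "before": one pass for the mean, a second for the squared deviations.
function multiPass1D(vals: number[]): { mu: number; sig: number } {
  const n = vals.length;
  const mu = vals.reduce((acc, v) => acc + v, 0) / n; // pass 1
  const variance = vals.reduce((acc, v) => acc + (v - mu) * (v - mu), 0) / n; // pass 2
  return { mu, sig: Math.sqrt(variance) };
}

// "After": a single loop accumulating the sum (s) and the sum of squares (ss).
function singlePass1D(vals: number[]): { mu: number; sig: number } {
  const n = vals.length;
  let s = 0, ss = 0;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v;
    ss += v * v;
  }
  const mu = s / n;
  // Population variance via E[x^2] - mu^2, clamped at 0 because rounding
  // can push the difference slightly below zero.
  const sig = Math.sqrt(Math.max(0, ss / n - mu * mu));
  return { mu, sig };
}

const data = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(multiPass1D(data)); // { mu: 5, sig: 2 }
console.log(singlePass1D(data)); // { mu: 5, sig: 2 }
```

Both versions divide by n (population statistics), so they agree on this input; the single-pass version simply touches each element once.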
Code Review
This pull request refactors the statistical calculations in FisherLDA.tsx by introducing getStats and get1DStats helper functions, replacing several inline reduce operations with more efficient single-loop implementations. The review feedback recommends adding guards for empty arrays in both functions to prevent potential division-by-zero errors and suggests simplifying the variance and standard deviation calculations by reusing the computed means.
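The suggested simplification rests on a standard identity. Writing $n$ for the number of points, $S_x = \sum_i x_i$, $Q_{xx} = \sum_i x_i^2$, $Q_{xy} = \sum_i x_i y_i$, and $\mu_x = S_x / n$ (similarly for $y$), the original and suggested expressions are algebraically equal:

$$
\frac{1}{n}\left(Q_{xx} - \frac{S_x^2}{n}\right) = \frac{Q_{xx}}{n} - \mu_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_x)^2,
\qquad
\frac{1}{n}\left(Q_{xy} - \frac{S_x S_y}{n}\right) = \frac{Q_{xy}}{n} - \mu_x \mu_y = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y).
$$

So the suggestion improves readability rather than changing the result: both forms are population statistics (dividing by $n$), and they can differ only in the last few bits because the floating-point operations are rounded in a different order.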
function getStats(pts: Pt[]) {
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  const n = pts.length;
  for (let i = 0; i < n; i++) {
    const p = pts[i];
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return {
    m: { x: mx, y: my },
    sxx: (sxx - (sx * sx) / n) / n,
    syy: (syy - (sy * sy) / n) / n,
    sxy: (sxy - (sx * sy) / n) / n,
  };
}
The getStats function lacks a guard for empty arrays, which would lead to division by zero and NaN values. Additionally, the variance and covariance calculations can be simplified for better readability by reusing the computed means.
Suggested change:
function getStats(pts: Pt[]) {
  const n = pts.length;
  if (n === 0) return { m: { x: 0, y: 0 }, sxx: 0, syy: 0, sxy: 0 };
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    const p = pts[i];
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return {
    m: { x: mx, y: my },
    sxx: sxx / n - mx * mx,
    syy: syy / n - my * my,
    sxy: sxy / n - mx * my,
  };
}
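A quick way to check both review points, as a sketch assuming Pt is { x: number; y: number } (the diff does not show its definition): the version from the diff returns NaN for an empty array because JavaScript evaluates 0 / 0 to NaN, the suggested version returns zeros, and on non-empty input the two formulas agree up to rounding. getStatsOriginal and getStatsSuggested are renamed copies of the two versions above.

```typescript
type Pt = { x: number; y: number }; // assumed shape of the component's Pt type

// Version from the diff (no empty-array guard).
function getStatsOriginal(pts: Pt[]) {
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  const n = pts.length;
  for (let i = 0; i < n; i++) {
    const p = pts[i];
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return {
    m: { x: mx, y: my },
    sxx: (sxx - (sx * sx) / n) / n,
    syy: (syy - (sy * sy) / n) / n,
    sxy: (sxy - (sx * sy) / n) / n,
  };
}

// Version from the review suggestion (empty guard + means reused).
function getStatsSuggested(pts: Pt[]) {
  const n = pts.length;
  if (n === 0) return { m: { x: 0, y: 0 }, sxx: 0, syy: 0, sxy: 0 };
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    const p = pts[i];
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return { m: { x: mx, y: my }, sxx: sxx / n - mx * mx, syy: syy / n - my * my, sxy: sxy / n - mx * my };
}

console.log(getStatsOriginal([]).sxx); // NaN (0 / 0)
console.log(getStatsSuggested([]).sxx); // 0

// On non-empty input the two versions agree up to floating-point rounding.
const pts: Pt[] = [{ x: 1, y: 2 }, { x: 3, y: 5 }, { x: 4, y: 1 }];
const before = getStatsOriginal(pts);
const after = getStatsSuggested(pts);
console.log(Math.abs(before.sxx - after.sxx) < 1e-12); // true
```

Returning zeros for an empty class is a convention, not a statistic; callers that divide by the scatter (for example when inverting a matrix) may still need their own check.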
function get1DStats(vals: number[]) {
  let s = 0, ss = 0;
  const n = vals.length;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v; ss += v * v;
  }
  const mu = s / n;
  const sig = Math.sqrt(Math.max(0, (ss - (s * s) / n) / n));
  return { mu, sig };
}
The get1DStats function should include a guard for empty arrays to prevent division by zero. The standard deviation calculation can also be slightly simplified by using the already computed mean.
Suggested change:
function get1DStats(vals: number[]) {
  const n = vals.length;
  if (n === 0) return { mu: 0, sig: 0 };
  let s = 0, ss = 0;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v; ss += v * v;
  }
  const mu = s / n;
  const sig = Math.sqrt(Math.max(0, ss / n - mu * mu));
  return { mu, sig };
}
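The Math.max(0, …) clamp in both versions hints at a known weakness of the one-pass formula: when values are large relative to their spread, ss / n and mu * mu are nearly equal and their difference loses most of its significant digits (catastrophic cancellation). If that ever matters for the data in this component, Welford's online algorithm is a standard single-pass alternative. The sketch below is a swapped-in technique, not what the PR implements, and welford1D is a hypothetical name.

```typescript
// Welford's online algorithm (swapped-in technique, not the PR's code):
// still a single pass, but it updates the running mean and the running sum of
// squared deviations (m2) instead of subtracting two large accumulated sums.
function welford1D(vals: number[]): { mu: number; sig: number } {
  let n = 0, mean = 0, m2 = 0;
  for (const v of vals) {
    n += 1;
    const delta = v - mean;
    mean += delta / n;
    m2 += delta * (v - mean); // second factor uses the updated mean
  }
  if (n === 0) return { mu: 0, sig: 0 }; // same empty-input convention as the suggestion
  return { mu: mean, sig: Math.sqrt(m2 / n) }; // population std, like get1DStats
}

// The suggested get1DStats, copied for comparison.
function get1DStats(vals: number[]) {
  const n = vals.length;
  if (n === 0) return { mu: 0, sig: 0 };
  let s = 0, ss = 0;
  for (let i = 0; i < n; i++) {
    const v = vals[i];
    s += v; ss += v * v;
  }
  const mu = s / n;
  const sig = Math.sqrt(Math.max(0, ss / n - mu * mu));
  return { mu, sig };
}

// Values 4, 7, 13, 16 (population variance 22.5) shifted by 1e9.
const shifted = [4, 7, 13, 16].map((v) => v + 1e9);
console.log(welford1D(shifted).sig); // Math.sqrt(22.5), about 4.743
// Near 1e18 adjacent doubles are 128 apart, so ss / n - mu * mu cannot resolve 22.5.
console.log(get1DStats(shifted).sig);
```

For coordinates in a typical 2-D visualization the naive formula is usually fine; the trade-off is one extra division per element in exchange for stability.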
💡 What: Consolidated multiple .reduce() calls into single-pass loops for computing class statistics (means and within-class scatter) and projection statistics (means and standard deviations).

🎯 Why: The original code iterated over the same data up to 5 times per class, causing unnecessary CPU overhead and potential GC pressure from redundant intermediate calculations.
📊 Measured Improvement: Achieved a ~2.3x performance boost in data processing logic (Baseline: 244.61ms -> Optimized: 107.45ms for 200,000 iterations).
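For context on what these statistics feed into, a minimal sketch of a two-class Fisher LDA built on single-pass helpers: the class means and the pooled within-class scatter give the direction w ∝ S_w⁻¹(m_A − m_B), and each class is then projected onto w and summarized by its mean and standard deviation. fisherDirection and project are hypothetical helpers, Pt is assumed to be { x: number; y: number }, and pooling the per-class covariances weighted by class size is an assumption; the PR does not show how FisherLDA.tsx combines the statistics.

```typescript
type Pt = { x: number; y: number }; // assumed shape of the component's Pt type

// Single-pass class statistics (guarded version from the review suggestion).
function getStats(pts: Pt[]) {
  const n = pts.length;
  if (n === 0) return { m: { x: 0, y: 0 }, sxx: 0, syy: 0, sxy: 0 };
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (const p of pts) {
    sx += p.x; sy += p.y;
    sxx += p.x * p.x; syy += p.y * p.y; sxy += p.x * p.y;
  }
  const mx = sx / n, my = sy / n;
  return { m: { x: mx, y: my }, sxx: sxx / n - mx * mx, syy: syy / n - my * my, sxy: sxy / n - mx * my };
}

// Single-pass projection statistics (guarded version from the review suggestion).
function get1DStats(vals: number[]) {
  const n = vals.length;
  if (n === 0) return { mu: 0, sig: 0 };
  let s = 0, ss = 0;
  for (const v of vals) { s += v; ss += v * v; }
  const mu = s / n;
  return { mu, sig: Math.sqrt(Math.max(0, ss / n - mu * mu)) };
}

// Hypothetical helper: unit Fisher direction w ∝ Sw^{-1} (mA - mB), where
// Sw = nA * covA + nB * covB is the pooled within-class scatter matrix.
function fisherDirection(a: Pt[], b: Pt[]): Pt {
  const A = getStats(a), B = getStats(b);
  const sxx = a.length * A.sxx + b.length * B.sxx;
  const syy = a.length * A.syy + b.length * B.syy;
  const sxy = a.length * A.sxy + b.length * B.sxy;
  const det = sxx * syy - sxy * sxy;
  if (Math.abs(det) < 1e-12) throw new Error("within-class scatter is singular");
  const dx = A.m.x - B.m.x, dy = A.m.y - B.m.y;
  // Inverse of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
  const wx = (syy * dx - sxy * dy) / det;
  const wy = (sxx * dy - sxy * dx) / det;
  const len = Math.hypot(wx, wy);
  return { x: wx / len, y: wy / len };
}

// Hypothetical helper: scalar projection of each point onto w.
const project = (pts: Pt[], w: Pt): number[] => pts.map((p) => p.x * w.x + p.y * w.y);

const classA: Pt[] = [{ x: 0, y: 0 }, { x: 2, y: 0 }, { x: 0, y: 2 }, { x: 2, y: 2 }];
const classB: Pt[] = [{ x: 4, y: 0 }, { x: 6, y: 0 }, { x: 4, y: 2 }, { x: 6, y: 2 }];
const w = fisherDirection(classA, classB);
const projA = get1DStats(project(classA, w));
const projB = get1DStats(project(classB, w));
console.log(w, projA, projB);
```

For these two squares separated along x, w comes out as (−1, 0) and the projected classes have means −1 and −5 with unit spread. The sign and length of w are arbitrary for classification; only its direction matters.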
PR created automatically by Jules for task 17372230985130963295 started by @Raman369AI