
Efficiencies #3

Open
nasiryahm wants to merge 30 commits into main from efficiencies

Conversation

@nasiryahm
Collaborator

Summary of the changes in this pull request:

  • Performance changes: vectorized core computations, replaced the loop-based batched_pdist with NumPy triu_indices vectorization, vectorized B matrix construction, and more
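
The triu_indices vectorization mentioned above can be sketched roughly as follows. This is an illustrative stand-in, not the repository's actual batched_pdist implementation: it computes all upper-triangle pairwise distances in one vectorized pass instead of a Python loop over pairs.

```python
import numpy as np

def pdist_triu(X: np.ndarray) -> np.ndarray:
    """Condensed pairwise Euclidean distances via triu_indices (sketch)."""
    # Indices of all unordered pairs (i < j) in the upper triangle.
    i, j = np.triu_indices(X.shape[0], k=1)
    # One vectorized norm over all pairs, no Python-level loop.
    return np.linalg.norm(X[i] - X[j], axis=1)

X = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
d = pdist_triu(X)  # distances for pairs (0,1), (0,2), (1,2)
```

The memory cost grows with the number of pairs, which is presumably why the PR combines this with batching.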

  • Parallelization: unified parallelism on joblib, replacing raw multiprocess / shared_memory usage in lpca, kpca, best, and post-processing with joblib.Parallel and eliminating the shared-memory boilerplate. Also parallelized post-processing and refactored _postprocess into a standalone _postprocess_worker with batched evaluation
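
The joblib pattern described above looks roughly like this. The worker function and batch splitting here are illustrative (the real _postprocess_worker does the package's post-processing, not a sum); the point is that joblib.Parallel handles process management and data transfer, replacing manual shared-memory setup.

```python
import numpy as np
from joblib import Parallel, delayed

def _worker(batch: np.ndarray) -> float:
    # Stand-in for per-batch post-processing work.
    return float(batch.sum())

data = np.arange(12.0)
batches = np.array_split(data, 4)          # batched evaluation
results = Parallel(n_jobs=2)(delayed(_worker)(b) for b in batches)
total = sum(results)                        # same result as a serial loop
```

Because the worker is a standalone top-level function, it pickles cleanly for joblib's process backend, which is likely one motivation for the _postprocess refactor.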

  • Dynamic memory allocation: added _get_available_memory() to auto-detect available RAM (via psutil or PYRATS_MEMORY_LIMIT env var) and batch work accordingly, preventing OOM crashes on large datasets

  • sklearn-style API rename: renamed parameters to follow sklearn conventions (d→n_components, k→n_neighbors, eta_min→min_cluster_size, max_iter→n_iter, kpca_kernel→kernel, etc.) with conversions for backwards compatibility

  • Progress bars: added tqdm progress bars (via verbose flag) to PCA, KPCA, intermediate views, and refinement loops; removed raw print statements
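
Gating tqdm behind a verbose flag, as described above, is typically a one-line wrapper around the loop's iterable. Function and flag names here are illustrative:

```python
from tqdm import tqdm

def refine(items, verbose: bool = False):
    """Run a refinement loop, showing a progress bar only if verbose (sketch)."""
    iterator = tqdm(items, desc="refining") if verbose else items
    return [x * x for x in iterator]

quiet = refine([1, 2, 3])                # silent, no bar
loud = refine([1, 2, 3], verbose=True)   # shows a tqdm bar on stderr
```

Since tqdm writes to stderr and degrades gracefully in non-TTY environments, this replaces the raw print statements without polluting captured stdout.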

  • CI benchmark workflow: added .github/workflows/benchmark.yml and tests/scripts/benchmark.py for speed testing across platforms
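
A cross-platform speed-test script of the kind the bullet describes usually times a few representative workloads and emits machine-readable output for the CI workflow to collect. This is a generic sketch, not the contents of tests/scripts/benchmark.py:

```python
import json
import platform
import time

def bench(fn, *args, repeats: int = 3) -> float:
    """Return the best wall-clock time over several repeats (sketch)."""
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return min(times)  # best-of-N is less noisy on shared CI runners

result = {
    "platform": platform.system(),
    "sum_1e5_s": bench(sum, range(100_000)),  # stand-in workload
}
print(json.dumps(result))  # JSON so the workflow can compare platforms
```

Reporting best-of-N rather than the mean reduces noise from CI runner contention, which matters when comparing timings across platforms.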

  • Cleanup: removed duplicate imports, unused procrustes code, and multiprocess dependency; updated pyproject.toml to add tqdm; refreshed example notebooks and README

