Summary of the changes in this pull request:
Performance: vectorized core computations, replaced the loop-based batched_pdist with NumPy triu_indices vectorization, and vectorized construction of the B matrix, among other optimizations
Parallelization: unified all parallelism on joblib, replacing raw multiprocess/shared_memory usage in lpca, kpca, best, and post-processing with joblib.Parallel and eliminating shared-memory boilerplate. Also parallelized post-processing and refactored _postprocess into a standalone _postprocess_worker with batched evaluation
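The triu_indices approach can be sketched as follows. This is a minimal illustration of the technique, not the actual batched_pdist replacement from the PR; the function name and shapes are assumptions:

```python
import numpy as np

def pdist_vectorized(X):
    # All upper-triangle index pairs (i < j) at once, replacing a
    # Python-level loop over point pairs.
    i, j = np.triu_indices(X.shape[0], k=1)
    # One vectorized norm over every pair simultaneously.
    return np.linalg.norm(X[i] - X[j], axis=1)
```

The same `triu_indices` arrays can be reused across batches, so the index computation is paid once per batch size rather than once per pair.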
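The worker refactor described above might look roughly like this. The worker body and batch size here are placeholders, not the PR's actual _postprocess logic:

```python
import numpy as np
from joblib import Parallel, delayed

def _postprocess_worker(batch):
    # Placeholder per-batch evaluation; stands in for the real
    # post-processing computation.
    return [x.sum() for x in batch]

def postprocess(arrays, n_jobs=-1, batch_size=4):
    # Split work into batches and fan them out with joblib.Parallel,
    # which handles pickling/processes without shared-memory boilerplate.
    batches = [arrays[i:i + batch_size]
               for i in range(0, len(arrays), batch_size)]
    results = Parallel(n_jobs=n_jobs)(
        delayed(_postprocess_worker)(b) for b in batches
    )
    # Flatten per-batch results back into one list.
    return [r for batch in results for r in batch]
```

joblib also memory-maps large NumPy inputs to workers automatically, which is what makes the manual shared_memory plumbing removable.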
Dynamic memory allocation: added _get_available_memory() to auto-detect available RAM (via psutil or PYRATS_MEMORY_LIMIT env var) and batch work accordingly, preventing OOM crashes on large datasets
sklearn-style API rename: renamed parameters to follow sklearn conventions (d→n_components, k→n_neighbors, eta_min→min_cluster_size, max_iter→n_iter, kpca_kernel→kernel, etc.) with conversions for backwards compatibility
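One common pattern for this kind of backwards-compatible rename is a deprecation shim in `__init__`. The class name and defaults below are assumptions for illustration; only the parameter mapping comes from the PR description:

```python
import warnings

class LPCA:
    # Old parameter name -> new sklearn-style name.
    _RENAMED = {"d": "n_components", "k": "n_neighbors",
                "eta_min": "min_cluster_size", "max_iter": "n_iter",
                "kpca_kernel": "kernel"}

    def __init__(self, n_components=2, n_neighbors=10,
                 min_cluster_size=5, n_iter=100, kernel="rbf", **legacy):
        params = dict(n_components=n_components, n_neighbors=n_neighbors,
                      min_cluster_size=min_cluster_size, n_iter=n_iter,
                      kernel=kernel)
        # Accept old names, warn, and map them onto the new ones.
        for old, new in self._RENAMED.items():
            if old in legacy:
                warnings.warn(f"'{old}' is deprecated; use '{new}'",
                              DeprecationWarning)
                params[new] = legacy.pop(old)
        if legacy:
            raise TypeError(f"unexpected arguments: {sorted(legacy)}")
        for name, value in params.items():
            setattr(self, name, value)
```

Keeping all attributes under their new names means the rest of the codebase only ever sees the sklearn-style API.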
Progress bars: added tqdm progress bars (enabled via the verbose flag) to PCA, KPCA, intermediate views, and the refinement loops; removed raw print statements
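The verbose-gated progress bar pattern can be sketched like this; the function and its loop body are hypothetical stand-ins for the actual refinement loop:

```python
try:
    from tqdm import tqdm
except ImportError:
    # Degrade gracefully when tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable

def refine(views, n_iter=10, verbose=False):
    # Wrap the loop in tqdm only when verbose; otherwise iterate silently,
    # replacing the raw print statements.
    iterator = range(n_iter)
    if verbose:
        iterator = tqdm(iterator, desc="refinement")
    total = 0.0
    for _ in iterator:
        total += sum(views)  # placeholder for the real refinement step
    return total
```

Because tqdm writes to stderr and cleans up its own line, this stays quiet in logs and CI when `verbose=False`.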
CI benchmark workflow: added .github/workflows/benchmark.yml and tests/scripts/benchmark.py for speed testing across platforms
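A benchmark script of this kind typically reduces to a best-of-N timing harness. This is a generic sketch, not the contents of tests/scripts/benchmark.py:

```python
import time
import numpy as np

def benchmark(fn, repeats=5):
    # Best-of-N wall-clock time; min is less noisy than mean on shared CI runners.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
i, j = np.triu_indices(len(X), k=1)
elapsed = benchmark(lambda: np.linalg.norm(X[i] - X[j], axis=1))
```

Running the same harness on each platform in the workflow matrix gives comparable numbers across operating systems.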
Cleanup: removed duplicate imports, unused procrustes code, and the multiprocess dependency; updated pyproject.toml to add tqdm; refreshed the example notebooks and README