Summary of the changes in this pull request:
Performance: vectorized core computations, replaced the loop-based batched_pdist with NumPy triu_indices vectorization, and vectorized construction of the B matrix, among other optimizations
Parallelization: unified all parallelism on joblib, replacing raw multiprocess/shared_memory usage in lpca, kpca, best, and post-processing with joblib.Parallel and eliminating shared-memory boilerplate. Also parallelized post-processing and refactored _postprocess into a standalone _postprocess_worker with batched evaluation
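The triu_indices approach can be sketched as follows. This is a minimal illustration of the technique, not the actual batched_pdist replacement from the PR; the function name and shapes are assumptions:

```python
import numpy as np

def pdist_vectorized(X):
    # All upper-triangle index pairs (i < j) at once, replacing a
    # Python-level loop over point pairs.
    i, j = np.triu_indices(X.shape[0], k=1)
    # One vectorized norm over every pair simultaneously.
    return np.linalg.norm(X[i] - X[j], axis=1)
```

The same `triu_indices` arrays can be reused across batches, so the index computation is paid once per batch size rather than once per pair.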
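The worker refactor described above might look roughly like this. The worker body and batch size here are placeholders, not the PR's actual _postprocess logic:

```python
import numpy as np
from joblib import Parallel, delayed

def _postprocess_worker(batch):
    # Placeholder per-batch evaluation; stands in for the real
    # post-processing computation.
    return [x.sum() for x in batch]

def postprocess(arrays, n_jobs=-1, batch_size=4):
    # Split work into batches and fan them out with joblib.Parallel,
    # which handles pickling/processes without shared-memory boilerplate.
    batches = [arrays[i:i + batch_size]
               for i in range(0, len(arrays), batch_size)]
    results = Parallel(n_jobs=n_jobs)(
        delayed(_postprocess_worker)(b) for b in batches
    )
    # Flatten per-batch results back into one list.
    return [r for batch in results for r in batch]
```

joblib also memory-maps large NumPy inputs to workers automatically, which is what makes the manual shared_memory plumbing removable.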
Dynamic memory allocation: added _get_available_memory() to auto-detect available RAM (via psutil or PYRATS_MEMORY_LIMIT env var) and batch work accordingly, preventing OOM crashes on large datasets
sklearn-style API rename: renamed parameters to follow sklearn conventions (d→n_components, k→n_neighbors, eta_min→min_cluster_size, max_iter→n_iter, kpca_kernel→kernel, etc.) with conversions for backwards compatibility
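One common pattern for this kind of backwards-compatible rename is a deprecation shim in `__init__`. The class name and defaults below are assumptions for illustration; only the parameter mapping comes from the PR description:

```python
import warnings

class LPCA:
    # Old parameter name -> new sklearn-style name.
    _RENAMED = {"d": "n_components", "k": "n_neighbors",
                "eta_min": "min_cluster_size", "max_iter": "n_iter",
                "kpca_kernel": "kernel"}

    def __init__(self, n_components=2, n_neighbors=10,
                 min_cluster_size=5, n_iter=100, kernel="rbf", **legacy):
        params = dict(n_components=n_components, n_neighbors=n_neighbors,
                      min_cluster_size=min_cluster_size, n_iter=n_iter,
                      kernel=kernel)
        # Accept old names, warn, and map them onto the new ones.
        for old, new in self._RENAMED.items():
            if old in legacy:
                warnings.warn(f"'{old}' is deprecated; use '{new}'",
                              DeprecationWarning)
                params[new] = legacy.pop(old)
        if legacy:
            raise TypeError(f"unexpected arguments: {sorted(legacy)}")
        for name, value in params.items():
            setattr(self, name, value)
```

Keeping all attributes under their new names means the rest of the codebase only ever sees the sklearn-style API.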
Progress bars: added tqdm progress bars (enabled via the verbose flag) to PCA, KPCA, intermediate views, and the refinement loops; removed raw print statements
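The verbose-gated progress bar pattern can be sketched like this; the function and its loop body are hypothetical stand-ins for the actual refinement loop:

```python
try:
    from tqdm import tqdm
except ImportError:
    # Degrade gracefully when tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable

def refine(views, n_iter=10, verbose=False):
    # Wrap the loop in tqdm only when verbose; otherwise iterate silently,
    # replacing the raw print statements.
    iterator = range(n_iter)
    if verbose:
        iterator = tqdm(iterator, desc="refinement")
    total = 0.0
    for _ in iterator:
        total += sum(views)  # placeholder for the real refinement step
    return total
```

Because tqdm writes to stderr and cleans up its own line, this stays quiet in logs and CI when `verbose=False`.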
CI benchmark workflow: added .github/workflows/benchmark.yml and tests/scripts/benchmark.py for speed testing across platforms
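A benchmark script of this kind typically reduces to a best-of-N timing harness. This is a generic sketch, not the contents of tests/scripts/benchmark.py:

```python
import time
import numpy as np

def benchmark(fn, repeats=5):
    # Best-of-N wall-clock time; min is less noisy than mean on shared CI runners.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
i, j = np.triu_indices(len(X), k=1)
elapsed = benchmark(lambda: np.linalg.norm(X[i] - X[j], axis=1))
```

Running the same harness on each platform in the workflow matrix gives comparable numbers across operating systems.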
Cleanup: removed duplicate imports, unused procrustes code, and the multiprocess dependency; updated pyproject.toml to add tqdm; refreshed the example notebooks and README