Description
Motivation
WHY NEEDED
Principal Component Analysis (PCA) is a standard tool for dimensionality reduction in scientific computing (e.g., climate pattern analysis, modal decomposition in fluid dynamics, machine learning preprocessing). Currently, users must manually combine stdlib_stats_cov, stdlib_stats_mean, and stdlib_linalg_svd to perform this common task. A dedicated PCA module would lower the barrier to entry for data science workflows in Fortran and bring stdlib closer to ecosystems like scikit-learn or MATLAB.
PLAN
I have analyzed the current capabilities of stdlib_stats and stdlib_linalg and found that all the prerequisites are in place. I propose implementing a pca derived type or a set of routines that offers:
Dual Algorithms:
Eigendecomposition of Covariance Matrix: Leveraging existing stdlib_stats_cov and stdlib_linalg_eig.
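SVD Method: Performing SVD directly on the centered data matrix using stdlib_linalg_svd. This is often more numerically stable because it avoids explicitly forming the covariance matrix, which squares the condition number of the data.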
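The two methods give the same components in exact arithmetic. A short derivation, assuming a centered data matrix X_c with n samples in rows and the usual 1/(n-1) covariance normalization (the normalization is an assumption here, not something the proposal fixes):

```latex
X_c = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{n-1} X_c^{\top} X_c = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top},
\qquad
\lambda_i = \frac{\sigma_i^{2}}{n-1},
\qquad
r_i = \frac{\lambda_i}{\sum_j \lambda_j} = \frac{\sigma_i^{2}}{\sum_j \sigma_j^{2}}
```

Here the columns of V are the principal components, the \lambda_i are the eigenvalues of the covariance matrix C, the \sigma_i are the singular values of X_c, and r_i is the explained variance ratio of component i.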
API Design:
pca_fit(X, n_components): Computes the principal components (eigenvectors), singular values, and explained variance ratios.
pca_transform(X, pca_model): Projects new data into the reduced-dimensional space, using the mean and components stored in pca_model.
pca_inverse_transform(): Reconstructs the original data from the reduced space. The reconstruction is approximate whenever fewer components are kept than there are features.
Integration: This module would sit naturally in stdlib_stats, acting as a bridge between the statistics and linear algebra modules.
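To make the proposed behaviour concrete, here is a minimal reference sketch in Python/NumPy (not Fortran, and not the stdlib API). It mirrors the three proposed routine names, but the `method` argument, the dictionary-based model, its field names, and the 1/(n-1) normalization are all assumptions made for illustration.

```python
import numpy as np


def pca_fit(X, n_components, method="svd"):
    """Fit PCA on X (n_samples x n_features). Returns a dict used as the model."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean  # center each feature (column)

    if method == "svd":
        # Thin SVD of the centered data: Xc = U @ diag(s) @ Vt
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt  # rows are principal directions
        variances = s**2 / (n_samples - 1)
    elif method == "eig":
        # Eigendecomposition of the symmetric covariance matrix
        cov = Xc.T @ Xc / (n_samples - 1)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]  # re-sort to descending
        variances = np.clip(eigvals[order], 0.0, None)  # drop tiny negatives
        components = eigvecs[:, order].T
        s = np.sqrt(variances * (n_samples - 1))
    else:
        raise ValueError("method must be 'svd' or 'eig'")

    return {
        "mean": mean,
        "components": components[:n_components],
        "singular_values": s[:n_components],
        "explained_variance_ratio": variances[:n_components] / variances.sum(),
    }


def pca_transform(X, model):
    """Project X onto the retained components, centering with the fitted mean."""
    return (np.asarray(X, dtype=float) - model["mean"]) @ model["components"].T


def pca_inverse_transform(Z, model):
    """Map reduced coordinates Z back to feature space (approximate if truncated)."""
    return np.asarray(Z, dtype=float) @ model["components"] + model["mean"]


# Usage: reduce 4 synthetic features to 2 and reconstruct.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
model = pca_fit(X, n_components=2)
Z = pca_transform(X, model)
X_approx = pca_inverse_transform(Z, model)
```

Both methods should return the same components up to the sign of each row, since eigenvectors and singular vectors are only defined up to sign. A Fortran implementation would probably want to settle on a sign convention so that results are reproducible across methods.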
Prior Art
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
https://www.mathworks.com/help/stats/pca.html