Implement Principal Component Analysis (PCA) module #1080

@JAi-SATHVIK

Motivation

WHY NEEDED

Principal Component Analysis (PCA) is a workhorse of dimensionality reduction in scientific computing (e.g., climate pattern analysis, modal decomposition in fluid dynamics, machine learning preprocessing). Currently, users must manually combine stdlib_stats_cov, stdlib_stats_mean, and stdlib_linalg_svd to perform this common task. A dedicated PCA module would lower the barrier to entry for data science workflows in Fortran and bring stdlib closer to ecosystems like scikit-learn or MATLAB.

PLAN

I have analyzed the current capabilities of stdlib_stats and stdlib_linalg and found we have all the prerequisites. I propose implementing a pca derived type or set of routines that offers:

1. Dual Algorithms:
   - Eigendecomposition of Covariance Matrix: Leveraging existing stdlib_stats_cov and stdlib_linalg_eig.
   - SVD Method: Performing SVD directly on the centered data matrix (often more numerically stable) using stdlib_linalg_svd.
2. API Design:
   - pca_fit(X, n_components): Computes the principal components (eigenvectors), singular values, and explained variance ratios.
   - pca_transform(X, pca_model): Projects new data into the reduced dimensional space.
   - pca_inverse_transform(): Reconstructs original data from the reduced space.
3. Integration: This module would sit naturally in stdlib_stats, acting as a bridge between the statistics and linear algebra modules.
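
To make the proposed semantics concrete, here is a minimal reference sketch of the two algorithms and the three routines, written in NumPy and not against stdlib. The routine names follow the plan above. The `method` argument, the dictionary layout of the model, and the samples-in-rows convention are assumptions made for illustration and are not part of the proposal.

```python
import numpy as np


def pca_fit(X, n_components, method="svd"):
    """Fit a PCA model. Rows of X are samples, columns are features (assumed)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean  # center each feature

    if method == "svd":
        # Thin SVD of the centered data: Xc = U @ diag(s) @ Vt.
        # Rows of Vt are the principal directions, already sorted by s.
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        components = Vt
        variances = s**2 / (n_samples - 1)
    elif method == "eig":
        # Sample covariance (n - 1 normalization), then symmetric eigensolver.
        cov = Xc.T @ Xc / (n_samples - 1)
        evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        order = np.argsort(evals)[::-1]     # reorder to descending
        variances = np.clip(evals[order], 0.0, None)  # guard tiny negatives
        components = evecs[:, order].T
        s = np.sqrt(variances * (n_samples - 1))
    else:
        raise ValueError("method must be 'svd' or 'eig'")

    ratio = variances / variances.sum()
    k = n_components
    return {
        "mean": mean,
        "components": components[:k],          # shape (k, n_features)
        "singular_values": s[:k],
        "explained_variance_ratio": ratio[:k],
    }


def pca_transform(X, pca_model):
    """Project data onto the retained principal components."""
    X = np.asarray(X, dtype=float)
    return (X - pca_model["mean"]) @ pca_model["components"].T


def pca_inverse_transform(Z, pca_model):
    """Map reduced coordinates back to the original feature space."""
    Z = np.asarray(Z, dtype=float)
    return Z @ pca_model["components"] + pca_model["mean"]
```

Both methods should agree on the explained variance ratios, and on the components up to a sign flip per component, since the direction of each eigenvector or singular vector is only defined up to sign. A Fortran implementation would likely need to document (or fix) a sign convention so that results are reproducible across the two code paths.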

Prior Art

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
https://www.mathworks.com/help/stats/pca.html

Labels: enhancement, idea, topic: linalg, topic: mathematics, topic: statistics