@JAi-SATHVIK
Contributor

@JAi-SATHVIK JAi-SATHVIK commented Jan 6, 2026

@JAi-SATHVIK JAi-SATHVIK marked this pull request as draft January 6, 2026 02:37
@codecov

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.66%. Comparing base (d2fdd50) to head (75db887).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1086      +/-   ##
==========================================
- Coverage   68.69%   68.66%   -0.04%     
==========================================
  Files         393      393              
  Lines       12720    12720              
  Branches     1376     1376              
==========================================
- Hits         8738     8734       -4     
- Misses       3982     3986       +4     


end interface moment


#! Note: PCA uses SVD and EIGH which rely on LAPACK. LAPACK backends do not support extended (xdp) or
Contributor

This is not accurate. stdlib's modernized BLAS/LAPACK supports all kinds. The only caveat is that optimized libraries such as OpenBLAS or MKL only provide single and double precision, so when linking against them the internal stdlib version will still be used for extended and quadruple precision.

x_centered = x
end if

res = matmul(x_centered, transpose(components))
Contributor

I would advise switching to gemm instead, to avoid temporary array allocations and to ensure that the optimized backend is used when linking against one.
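
A minimal sketch of what this suggestion could look like, assuming stdlib's generic `gemm` interface from `stdlib_linalg_blas` and the `ilp` integer kind from `stdlib_linalg_constants`. The module name, the helper name `project_gemm`, and the array shapes are illustrative assumptions, not the code in this PR:

```fortran
module pca_gemm_sketch
    use stdlib_kinds, only: dp
    use stdlib_linalg_constants, only: ilp
    use stdlib_linalg_blas, only: gemm
    implicit none
contains
    ! Hypothetical helper: res = x_centered * transpose(components).
    ! Assumed shapes: x_centered is (n, p), components is (nc, p), res is (n, nc).
    subroutine project_gemm(x_centered, components, res)
        real(dp), intent(in)  :: x_centered(:,:), components(:,:)
        real(dp), intent(out) :: res(:,:)
        integer(ilp) :: n, p, nc

        n  = size(x_centered, 1, kind=ilp)
        p  = size(x_centered, 2, kind=ilp)
        nc = size(components, 1, kind=ilp)

        ! 'N' keeps x_centered as is, 'T' uses components transposed;
        ! the leading dimensions are the row counts of the arrays as stored.
        call gemm('N', 'T', n, nc, p, 1.0_dp, x_centered, n, &
                  components, nc, 0.0_dp, res, n)
    end subroutine project_gemm
end module pca_gemm_sketch
```

Because the caller supplies `res`, the product is written straight into existing storage rather than into a temporary that is then copied.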



#:for k1, t1 in REAL_KINDS_TYPES
module function pca_transform_${k1}$(x, components, x_mean) result(res)
Contributor

Generally speaking, please prefer subroutines when returning arrays. Subroutines give more control and predictability over memory allocation and reuse. Functions (depending on the compiler options) may or may not create temporary arrays or reallocate the left-hand side of the assignment. You can always put a function on top of the subroutine to offer a functional style, but for performance and safety subroutines are preferable.
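
As a rough illustration of this pattern (not the PR's actual interface), the sketch below implements a column-centering step as a subroutine and layers an optional function wrapper on top. The names `center_sub` and `center_fun` and the local `dp` parameter are assumptions made for the example:

```fortran
module transform_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    ! Subroutine form: the caller owns res, so nothing is allocated here.
    subroutine center_sub(x, mu, res)
        real(dp), intent(in)  :: x(:,:), mu(:)
        real(dp), intent(out) :: res(:,:)
        integer :: j
        do j = 1, size(x, 2)
            res(:, j) = x(:, j) - mu(j)
        end do
    end subroutine center_sub

    ! Optional functional wrapper built on top of the subroutine.
    function center_fun(x, mu) result(res)
        real(dp), intent(in) :: x(:,:), mu(:)
        real(dp) :: res(size(x, 1), size(x, 2))
        call center_sub(x, mu, res)
    end function center_fun
end module transform_sketch
```

Performance-sensitive callers can use `center_sub` with a preallocated buffer, while `center_fun` remains available where the convenience of an expression matters more.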


integer(ilp) :: i, n
n = size(x_reduced, 1, kind=ilp)
res = matmul(x_reduced, components)
Contributor

Also here, please prefer gemm.

@jalvesz jalvesz linked an issue Jan 6, 2026 that may be closed by this pull request
mu = mean(x, dim=1)
if (present(x_mean)) x_mean = mu

err0 = linalg_state_type("pca", LINALG_SUCCESS)
Member

It is not necessary to initialize the state handler; doing so adds unnecessary overhead.

Suggested change
- err0 = linalg_state_type("pca", LINALG_SUCCESS)

Comment on lines 93 to 95
do j = 1, p
    idx(j) = j
end do
Member

Use array operations where straightforward
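
For instance, the index initialization could be written with an implied-do array constructor. This is only a sketch: the helper name `identity_index` is hypothetical, and it uses default integers where the PR's code may use another kind:

```fortran
module index_sketch
    implicit none
contains
    ! Hypothetical helper: returns [1, 2, ..., p] without an explicit do loop.
    pure function identity_index(p) result(idx)
        integer, intent(in) :: p
        integer :: idx(p)
        integer :: j
        idx = [(j, j = 1, p)]
    end function identity_index
end module index_sketch
```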

Comment on lines 96 to 105
! Simple bubble sort
do i = 1, p-1
    do j = i+1, p
        if (lambda(idx(i)) < lambda(idx(j))) then
            m = idx(i)
            idx(i) = idx(j)
            idx(j) = m
        end if
    end do
end do
Member

Use stdlib functions from the sorting module where already provided
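
One way this might look, assuming `sort_index` and the `int_index` kind from `stdlib_sorting` are used. The wrapper name `order_descending` is hypothetical. Note that `sort_index` also sorts its input array in place, which the surrounding code would need to account for:

```fortran
module eig_order_sketch
    use stdlib_kinds, only: dp
    use stdlib_sorting, only: sort_index, int_index
    implicit none
contains
    ! Hypothetical helper: on return, lambda is sorted in descending order
    ! and idx(k) holds the original position of the k-th largest value.
    subroutine order_descending(lambda, idx)
        real(dp), intent(inout) :: lambda(:)
        integer(int_index), intent(out) :: idx(:)
        call sort_index(lambda, idx, reverse=.true.)
    end subroutine order_descending
end module eig_order_sketch
```

Since `sort_index` fills the index array itself, the separate loop that initializes `idx` would no longer be needed in this variant.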

err0 = linalg_state_type("pca", LINALG_ERROR, "Unknown method: "//method_)
end if

if (present(err)) err = err0
Member

Please fix the state handling: the code should error stop when an error occurs and the optional err argument is missing.

Suggested change
- if (present(err)) err = err0
+ call err0%handle(err)
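
The behavior requested here can be illustrated with a self-contained toy type. This is a hedged sketch of the pattern only: `sketch_state`, its constants, and its `handle` procedure are hypothetical stand-ins and do not reproduce stdlib's actual state type or its implementation:

```fortran
module state_sketch
    use, intrinsic :: iso_fortran_env, only: error_unit
    implicit none
    integer, parameter :: SKETCH_SUCCESS = 0, SKETCH_ERROR = 1

    ! Hypothetical stand-in for a state type; not stdlib's definition.
    type :: sketch_state
        integer :: code = SKETCH_SUCCESS
        character(len=64) :: message = ''
    contains
        procedure :: handle
    end type sketch_state
contains
    ! If the caller passed err, hand the state back to the caller.
    ! Otherwise stop the program when the state signals an error.
    subroutine handle(this, err)
        class(sketch_state), intent(in) :: this
        type(sketch_state), intent(out), optional :: err
        if (present(err)) then
            err = this
        else if (this%code /= SKETCH_SUCCESS) then
            write(error_unit, '(a)') trim(this%message)
            error stop 1
        end if
    end subroutine handle
end module state_sketch
```

With the original `if (present(err)) err = err0`, an error raised while `err` is absent would pass silently; routing every exit through a single handler closes that gap.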

@jalvesz
Contributor

jalvesz commented Jan 7, 2026

@JAi-SATHVIK the build failures happen because the CMakeLists.txt for the stats module needs something like target_link_libraries(stats PUBLIC blas lapack).

Also, in my previous comment regarding kind support, what I meant was: please enable all kinds, since stdlib provides the backends. Linking against optimized libraries is optional and would increase performance for single and double precision, but all kinds are supported.

@JAi-SATHVIK
Contributor Author

Thank you @jalvesz for the clarification! I've made the following updates:

CMakeLists.txt: target_link_libraries(stats PUBLIC blas lapack) is already in place in src/stats/CMakeLists.txt, which links the stats module against stdlib's internal BLAS/LAPACK targets.

Documentation: Updated the inline comments in both stdlib_stats_pca.fypp and stdlib_stats.fypp to accurately reflect that all real kinds (sp, dp, xdp, qp) are supported by stdlib's internal BLAS/LAPACK backends.

@jalvesz
Contributor

jalvesz commented Jan 7, 2026

New failures are happening because of the dependency on the sorting module. This exposes an issue with the modularization of the library. The sorting module, being at the root of src, is not visible to the stats module within its own independent folder... we might need to reconsider next steps.

One idea would be to make this library (pca) not a submodule of stats but a module in itself at the root of src, so that it can easily use any other module such as stats, blas/lapack and sorting. Or there might be another approach to think about.

Cc @jvdp1 @perazz

@jvdp1
Member

jvdp1 commented Jan 7, 2026

> New failures are happening because of the dependency on the sorting module. This exposes an issue with the modularization of the library. The sorting module, being at the root of src, is not visible to the stats module within its own independent folder...

In this case, adding ../stdlib_sorting.fypp to the CMakeLists.txt should be enough (similar to stdlib_string_type.fypp, which is already present in CMakeLists.txt).

> we might need to reconsider next steps.

However, I agree with that. When I was working on #1081, I started to get "faked" circular dependencies.

> One idea would be to make this library (pca) not a submodule of stats but a module in itself at the root of src, so that it can easily use any other module such as stats, blas/lapack and sorting. Or there might be another approach to think about.

If the CMake file is correctly written, a submodule "pca" should not be a problem.
However, in terms of efficiency for the stats module (and also for other modules, like stdlib_linalg), it might be good to use modules instead of submodules. Since fpm compiles only what is needed, if a user only needs mean, then the blas and lapack modules are currently not compiled (if I am correct). However, if the pca procedures are added as submodules of the stats module, then the blas and lapack modules will be compiled even if the user only calls mean. Similar "issues" can easily happen for stdlib_linalg.

@JAi-SATHVIK
Contributor Author

Thank you @jalvesz @jvdp1 for the insights.

Regarding the immediate build failure, I will try adding ../stdlib_sorting.fypp to the src/stats/CMakeLists.txt as suggested to resolve the visibility issue with the sorting module.

Regarding the structural change: I'm open to moving PCA to a standalone module in src/ if that aligns better with the library's goals for modularity and reducing compilation overhead. Should I proceed with the CMake fix first to verify the current logic, or would you prefer I start refactoring it into its own module now?

@jalvesz
Contributor

jalvesz commented Jan 8, 2026

I suggest going step by step: first try to fix it "as is", then let's continue the discussion on what the best strategy would be.

Development

Successfully merging this pull request may close these issues.

Implement Principal Component Analysis (PCA) module

4 participants