converted to C++ BLAS/LAPACK interface#237
Conversation
…inear algebra bindings
…valeev/feature/linalgpp
…compatible with ninja
…/LAPACK++) properly
# Conflicts: # .gitlab-ci.yml # external/versions.cmake
wavefunction91
left a comment
There was a problem hiding this comment.
Overall I think it looks good, but I'm a bit confused as to why you chose to only move blas over to blaspp and not lapack in e.g. potrf, trtri, etc?
| std::cbegin(left.range().extent()), std::cbegin(right.range().extent()))); | ||
|
|
||
| // Compute gemm dimensions | ||
| using integer = TiledArray::math::blas::integer; |
There was a problem hiding this comment.
Not sure if this type of logic is used elsewhere, but use of blaspp/lapackpp generally precludes the need to explicitly specify integer type: all of the interfaces are int64_t. So you can pretty much move over all integer types to be 64-bit and the internal conversions to the appropriate BLAS integer are handled in the BLAS/LAPACK wrappers internally (they do introspection to make sure things are correct)
There was a problem hiding this comment.
@wavefunction91 TA::math::blas::integer alias is used elsewhere (MPQC). We could deprecate them but I think it's relatively harmless.
|
|
||
| /// converts Op to ints in manner useful for bit manipulations | ||
| /// NoTranspose -> 0, Transpose->1, ConjTranspose->2 | ||
| inline auto to_int(Op op) { |
There was a problem hiding this comment.
If you're using these for bit manipulation, you should probably impose 32/64-bit and signed-ness here as opposed to auto. I know ISO dictates this is int32_t here, but might be good to enforce it if only for readability
|
|
||
| inline cublasOperation_t to_cublas_op( | ||
| madness::cblas::CBLAS_TRANSPOSE cblas_op) { | ||
| inline cublasOperation_t to_cublas_op(math::blas::Op cblas_op) { |
There was a problem hiding this comment.
Don't the blaspp cuBLAS bindings handle this?
There was a problem hiding this comment.
Indeed.
This points out the cuBLAS in BLAS++ issue. Having looked at it I don't think there is a way to fully control how the work gets assigned to streams through BLAS++ API, so switching to their device interface and removing all CUDA stuff is not an option as it will break TA contractions. Without looking more deeply I don't see how to work around this.
I propose we defer device BLAS work to another PR.
There was a problem hiding this comment.
I agree that this can get deferred, but FWIW they do have stream management
It still needs a bit of work (i.e. they dont allow for attaching a queue onto existing streams, which is something that would be beneficial in this use case)
| #include <blas/dot.hh> | ||
| #include <blas/gemm.hh> | ||
| #include <blas/scal.hh> | ||
| #include <blas/util.hh> |
There was a problem hiding this comment.
I get the point, but not sure this really changes the coupling. Might just be verbosity with no net benefit, we should look into this
There was a problem hiding this comment.
this is along the iwyu philosophy, which I mostly agree with.
| static constexpr auto ConjTranspose = madness::cblas::ConjTrans; | ||
| /// the integer type used by C++ BLAS/LAPACK interface, same as that used by | ||
| /// BLAS++/LAPACK++ | ||
| using integer = int64_t; |
There was a problem hiding this comment.
See previous comment on integer types
| using TiledArray::math::linalg::cholesky_solve; | ||
| using TiledArray::math::linalg::cholesky_lsolve; | ||
| } | ||
| using TiledArray::math::linalg::cholesky; |
There was a problem hiding this comment.
Don't these aliases defeat the purpose of nesting the math::linalg entirely?
| auto* a = A.data(); | ||
| lapack_int lda = n; | ||
| integer lda = n; | ||
| TA_LAPACK(potrf, uplo, n, a, lda); |
There was a problem hiding this comment.
Shouldn't we just be using lapack::potrf here?
There was a problem hiding this comment.
TA_LAPACK(potrf,... resolves to that, but also checks the return and if needed throws exception with embedded __FILE__ and __LINE__
# Conflicts: # INSTALL.md # external/versions.cmake
|
Yes I saw that, it appears to be pretty simple, i.e. there is no way to
specify which one of the "parallel" streams to launch into. That's
showstopper for us.
…On Thu, Dec 31, 2020, 4:19 PM David Williams-Young ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/TiledArray/cuda/cublas.h
<#237 (comment)>
:
> @@ -76,17 +76,16 @@ class cuBLASHandlePool {
};
// thread_local cublasHandle_t *cuBLASHandlePool::handle_;
-inline cublasOperation_t to_cublas_op(
- madness::cblas::CBLAS_TRANSPOSE cblas_op) {
+inline cublasOperation_t to_cublas_op(math::blas::Op cblas_op) {
I agree that this can get deferred, but FWIW they do have stream management
https://bitbucket.org/icl/blaspp/src/4cae6e956825496d13e087c4620dd67fe40d75bb/include/blas/device.hh#lines-79
It still needs a bit of work (i.e. they dont allow for attaching a queue
onto existing streams, which is something that would be beneficial in this
use case)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#237 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQXIZZRU62ABZHSHBWCG4TSXTTGHANCNFSM4VGEUCNQ>
.
|
no longer depend on MADNESS linear algebra bindings