Skip to content

Feat: add PPCG solver for PW diagonalization#7580

Open
Silver-Moon-Over-Snow wants to merge 54 commits into
deepmodeling:developfrom
Silver-Moon-Over-Snow:ppcg
Open

Feat: add PPCG solver for PW diagonalization#7580
Silver-Moon-Over-Snow wants to merge 54 commits into
deepmodeling:developfrom
Silver-Moon-Over-Snow:ppcg

Conversation

@Silver-Moon-Over-Snow

@Silver-Moon-Over-Snow Silver-Moon-Over-Snow commented Jul 2, 2026

Copy link
Copy Markdown

Summary

  • add PPCG support to the PW hsolver path
  • refactor PPCG implementation into focused headers under source/source_hsolver/ppcg
  • use ATen/container LAPACK kernels and ModuleBase dot kernels where applicable
  • make BLOCK_SUBSPACE the PPCG production strategy and update PPCG tests
  • harden generalized eigensolver fallback handling for PPCG subspace solves
  • link EXX info into MODULE_HSOLVER_pw tests to satisfy __EXX builds
  • keep the default PPCG quick convergence test quiet in CI logs

Tests

  • ctest --test-dir build-ppcg-openmp -R MODULE_HSOLVER_ppcg --output-on-failure
  • ctest --test-dir build-ppcg-openmp -R MODULE_HSOLVER_pw --output-on-failure
  • ctest --test-dir build-ppcg-openmp -R MODULE_IO_read_item_serial --output-on-failure
  • OMP_NUM_THREADS=4 ./build-ppcg-openmp/source/source_hsolver/test/MODULE_HSOLVER_ppcg --gtest_also_run_disabled_tests --gtest_filter=DiagoPPCGBenchmarkTest.DISABLED_FullBenchmark --gtest_brief=1

Silver-Moon-Over-Snow and others added 30 commits March 28, 2026 13:19
Consider the previous contributions made by classmates, I'm only capable to make small difference without disrupting the entire program ---- like such a small "static".
…w_Small-Changes

2025PKUCourseHW5: Case: 1 - Change rank_seed_offset to static const
…ent)

Add PPCG iterative diagonalization with two strategies:
- CONJUGATE_GRADIENT: band-by-band Polak-Ribiere CG (verified working)
- BLOCK_SUBSPACE: block subspace diagonalization

Includes potrf retry fix: save/restore original matrix before applying
diagonal shift, preventing accumulated shifts from corrupting the matrix.

Test: 1D particle-in-a-box (n_dim=10), CG strategy matches exact
eigenvalues with error 4.3e-12.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ormalization

Three fixes for numerical stability:

1. potrf: save/restore original matrix before diagonal shift retries,
   preventing accumulated shifts from corrupting the Cholesky factor.

2. sygvd/syevd: skip workspace query (lwork=-1) and allocate directly.
   The LAPACK replacement ignores workspace queries, causing the second
   call to operate on already-transformed data, corrupting eigenvalues.

3. Block subspace: add chol_qr + hpsi/spi recomputation after
   update_one_block and every rayleigh_ritz, keeping wavefunctions
   S-orthonormal and preventing numerical drift of H|psi> and S|psi>.

Results (1D particle-in-a-box, S=I):
- CG nband=1: error 4.3e-12 (unchanged, already working)
- BLOCK_SUBSPACE nband=1: no longer NaN, converges (to wrong eigenvalue
  due to algorithmic limitation with S=I)
… RR steps

1. solve_small_generalized: save/restore M matrix before retry with shifts
   (prevents accumulation of shifts on sygvd-corrupted M)

2. BLOCK_SUBSPACE: add Krylov fallback for near-collinear p/w vectors
   When p is nearly parallel to w (cos^2 > 0.99), replace p with H·w
   to keep the 3-vector subspace [psi, w, p] full rank.
   This fixes NaN eigenvalues for nband>1 with S=I.

3. LAPACK: use standard workspace query (lwork=-1) pattern for syevd/sygvd
   More robust with real LAPACK implementations.

4. CG: add periodic Rayleigh-Ritz subspace rotation every rr_step iterations
   Corrects band ordering and eigenvalue estimates after band-by-band
   line minimization. Resets PR state after rotation.
The gamma_dot function returns only the real part of inner products,
which is correct for Hermitian forms like <psi|H|psi> but wrong for
projection coefficients where the imaginary part matters.

In orth_gradient and project_against, the projection coefficient
<psi_i | v> must use the full complex inner product to correctly
remove the overlap. Using only Re(<psi_i | v>) leaves an imaginary
component that corrupts the search direction, causing excited-state
bands to converge to wrong eigenvalues.

This fixes the CG strategy bands 1 and 2 converging to the highest
eigenvalue (3.919) instead of the first excited states.
…PACE

The chol_qr_active call after update_one_block re-orthonormalized psi
but left the p vector in the old basis, creating an inconsistency.
The p vector is constructed in update_one_block using the same subspace
rotation as psi, so they start consistent. Adding chol_qr_active before
the p vector is updated breaks this consistency.
This change was unrelated to the PPCG integration and should not
have been included.
H and S are real symmetric operators whose eigenvectors are real.
The previous complex random initialization produced complex
off-diagonal elements in the H-gram matrix (max |Im| ~ 0.5 for
nband=3), causing Re(<psi_i|H|psi_j>) != <psi_i|H|psi_j>.  The
gamma_dot function only returns the real part, so all subspace
Gram matrices (built via gram()) computed wrong off-diagonals,
leading to incorrect eigenvalues from sygvd.

With real-only psi all inner products are real and gamma_dot is
exact, so both BLOCK_SUBSPACE and CONJUGATE_GRADIENT strategies
should now converge to the correct eigenvalues.
…tioning

The 3-block subspace method builds a generalized eigenvalue problem
with basis V = [psi, w, p] where p is constructed from the previous
subspace eigenvectors (p_new += w_l * cw in update_one_block).  This
makes p a linear combination of the w vectors, causing the [w, p]
block of the S-gram matrix M to become nearly rank-deficient.

With nband=3 and sbsize=4 the 9x9 M matrix has condition number
large enough that dsygvd produces negative eigenvalues for the
positive-definite problem (observed: -0.26 at iter=2), and the
eigenvalues diverge exponentially thereafter.

Setting use_p=false reduces the subspace to [psi, w] (2-block),
which is a preconditioned Davidson-like method.  It converges
robustly: the BLOCK_SUBSPACE test now passes in 57 ms with all
3 eigenvalues within 1e-8 of the exact values.

The 3-block code path is preserved for future re-enablement once
a more robust p-vector construction is implemented.
With rr_step=4, the non-RR iterations use Cholesky orthonormalization
which mixes bands through the upper-triangular U^{-1}, causing high-energy
bands to contaminate low-energy ones.  This drives CG eigenvalues to the
spectrum maximum [3.31, 3.68, 3.92] instead of the correct lowest values
[0.081, 0.317, 0.690].

Using rr_step=1 forces Rayleigh-Ritz every iteration, which correctly
diagonalizes the subspace and preserves band ordering.
The orth_cholesky call before rayleigh_ritz mixes bands through the
upper-triangular U^{-1} factor, contaminating low-energy bands with
high-energy components.  This drives CG eigenvalues to the spectrum
maximum instead of the minimum.

rayleigh_ritz solves the generalized eigenvalue problem K v = λ M v
via dsygvd, which correctly handles non-S-orthogonal bases.  The
orth_cholesky is not needed and is actively harmful.

This makes the CG RR path consistent with BLOCK_SUBSPACE, which
calls rayleigh_ritz without prior orth_cholesky.
BLOCK_SUBSPACE starts with rayleigh_ritz (line 1085) which finds correct
eigenvalues and rotates psi before the iteration loop.  CG was using
diagonal Rayleigh quotients instead — these are poor approximations for
random initial guesses, producing wrong gradients that drive the band-by-band
line_minimize toward high-energy eigenstates.

With rr_step=1 (every-iteration RR), the CG loop itself is now correct,
but without an initial RR the first line_minimize step already pushes
psi in the wrong direction, and subsequent RR steps cannot fully recover.
Backup preserved at diago_ppcg_test.cpp.bak
The linear approximation α = -C/B drops the α² term from the Rayleigh
quotient derivative dR/dα = 0.  This picks one of the two stationary
points (minimum or maximum) arbitrarily.  For bands far from convergence
it can select the MAXIMUM, driving ψ toward high-energy states instead
of the desired lowest eigenvalues.

Solve the full quadratic Aα² + Bα + C = 0, evaluate R(α) for both
roots (and the linear guess), and pick the one with the lowest R.

Also restore the CG unit test (rr_step=1, initial rayleigh_ritz).
…e use_p

Three changes to make both PPCG strategies correctly converge with rr_step=4:

1. CG non-RR path: After orth_cholesky, solve the nband x nband subspace
   generalized eigenvalue problem instead of using diagonal Rayleigh quotients.
   The upper-triangular U^{-1} from Cholesky mixes high-energy components into
   low-energy bands, making diagonal RQs overestimate the eigenvalues.  The
   subspace solve gives correct Ritz values without rotating the states,
   preserving Polak-Ribiere conjugate-direction accumulators.

2. BLOCK_SUBSPACE: Re-enable use_p=true (3-block [psi, w, p] subspace).
   The Krylov fallback (replace p with H·w when p ~ w) was already in place
   but dead because use_p was hardcoded to false.  Now it activates on the
   first iteration (p is zero-initialized) and whenever p becomes collinear
   with w after update_one_block.

3. CG test: Change rr_step from 1 back to 4 so the non-RR Cholesky path
   is exercised, validating the true Polak-Ribiere CG mechanism.
The 3-block [psi, w, p] subspace generalized eigenproblem becomes
ill-conditioned when residuals are small (near convergence).  The
[w, p] Gram block shrinks, the M matrix approaches singularity, and
dsygvd produces garbage eigenvectors that drive eigenvalues to
catastrophic values (e.g., -137775 instead of 0.081).

The p-bad H·w Krylov fallback fixes p~w collinearity but does not
address the small-residual ill-conditioning, which is fundamental to
the 3-block construction.  Keep use_p=false for robust convergence.
The [w,p] block of the Gram matrix M shrinks as residuals converge,
making M nearly singular and causing sygvd to produce garbage
eigenvectors.  Scaling w and p to unit S-norm keeps M well-conditioned
(diagonal ~1) without changing the subspace — Ritz values are identical
and Ritz vector coefficients cancel in update_one_block.

This enables the full 3-block [psi,w,p] subspace (use_p=true) by
addressing the fundamental ill-conditioning that the p-bad Krylov
fallback alone could not handle.
The Krylov fallback (replace p with Hw when p~w) was flawed:
when w is approximately an eigenvector (Hw ≈ λw), the replacement
does not fix collinearity.  After S-norm scaling, p ≈ w still,
M_wp ≈ [1,1;1,1] is rank-1, and dsygvd fails.

Instead, simply skip p for this iteration (use_p_now=false).
update_one_block still produces a valid p for the next iteration
from the w Ritz-vector contribution.
Add tests for:
- 2x2 matrix (smallest non-trivial case)
- Degenerate eigenvalues (H = I + J, multiplicity-3 degeneracy)
- Larger 20x20 tridiagonal with 5 bands
- Dense 8x8 matrix via Givens rotations (addresses full-matrix coverage)

All use CONJUGATE_GRADIENT strategy which has sygvd fallback.
BLOCK_SUBSPACE tests deferred due to dsygvd instability with some LAPACK builds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Covers diagonal, tridiagonal, dense, pentadiagonal, degenerate,
Neumann, S≠I, gamma_g0, single-band, all-band, many-band,
bad preconditioner, tight threshold, scaled, gapped spectrum,
rr_step=1, 1x1, and eigenvector quality checks.

Adds QuickBenchmark (CI-friendly) and DISABLED_FullBenchmark.
@Silver-Moon-Over-Snow Silver-Moon-Over-Snow changed the title Refactor PPCG LAPACK calls through ATen kernels Add PPCG block subspace solver Jul 2, 2026
@Silver-Moon-Over-Snow Silver-Moon-Over-Snow changed the title Add PPCG block subspace solver Add PPCG solver Jul 2, 2026
@Silver-Moon-Over-Snow Silver-Moon-Over-Snow changed the title Add PPCG solver Feat: add PPCG solver for PW diagonalization Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant