Generate CUDA stubs dynamically #2884

Merged
wanghan-iapcm merged 20 commits into
deepmodeling:devel from
njzjz:implib.so
Oct 3, 2023

@njzjz njzjz commented Sep 30, 2023

Member

This PR follows google/tsl@726288e and shares that commit's motivation. Instead of storing many versions of stub files in the repository, the assembly-language stubs for libcudart.so are now generated dynamically by implib.so during CMake execution.

Implib.so is vendored in the source/3rdparty directory (156 KB) and slightly modified to allow for the case where no CUDA library is found.
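
For readers unfamiliar with what such a stub does, here is a minimal Python sketch of the lazy-loading idea only. It is not how implib.so works internally (the PR describes generated assembly stubs); the `LazyLibrary` class and its `loader` hook are hypothetical names used for illustration. The point it shows is that nothing is loaded when the stub is created, so the program still starts when the real library is absent, and the library is resolved on the first call.

```python
import ctypes
import ctypes.util


class LazyLibrary:
    """Resolve a shared library only when one of its symbols is first used.

    Illustrative only: this mimics the lazy-binding behaviour of a stub
    library in Python; it is not the code generated by implib.so.
    """

    def __init__(self, name, loader=ctypes.CDLL):
        self._name = name
        self._loader = loader  # hypothetical hook so a fake loader can be injected
        self._handle = None    # nothing is loaded at construction time

    def __getattr__(self, symbol):
        # Python calls __getattr__ only for names not found normally,
        # i.e. for symbols that must come from the wrapped library.
        if self._handle is None:
            self._handle = self._loader(self._name)
        return getattr(self._handle, symbol)


if __name__ == "__main__":
    # Demonstration with the C math library, if the platform provides one.
    path = ctypes.util.find_library("m")
    if path is not None:
        libm = LazyLibrary(path)  # no dlopen yet
        libm.cos.restype = ctypes.c_double
        libm.cos.argtypes = [ctypes.c_double]
        print(libm.cos(0.0))      # the library is loaded on this first use
```

With a real stub the same effect is achieved at link time: the binary links against the generated stub instead of libcudart.so, so a missing CUDA runtime only matters once a CUDA function is actually called.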

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@njzjz njzjz added the Test CUDA Trigger test CUDA workflow label Sep 30, 2023
@github-actions github-actions Bot removed the Test CUDA Trigger test CUDA workflow label Sep 30, 2023
njzjz commented Sep 30, 2023

Member Author

@caic99 Could you take a look at the NVIDIA machine... It worked fine in #2841 but appears to have apt errors in this PR.

You might want to run 'apt --fix-broken install' to correct these.
  The following packages have unmet dependencies:
   nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (>= 535.104.05) but 535.86.10-0ubuntu1 is to be installed
   nvidia-driver-535 : Depends: nvidia-kernel-common-535 (>= 535.104.05) but 535.86.10-0ubuntu1 is to be installed
                       Recommends: libnvidia-compute-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
                       Recommends: libnvidia-decode-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
                       Recommends: libnvidia-encode-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
                       Recommends: libnvidia-fbc1-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
                       Recommends: libnvidia-gl-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
  E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

codecov Bot commented Sep 30, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (cf61140) 75.46% compared to head (c08537d) 75.46%.
Report is 5 commits behind head on devel.

Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #2884      +/-   ##
==========================================
- Coverage   75.46%   75.46%   -0.01%     
==========================================
  Files         244      244              
  Lines       24518    24522       +4     
  Branches     1580     1580              
==========================================
+ Hits        18503    18505       +2     
- Misses       5084     5086       +2     
  Partials      931      931              

see 3 files with indirect coverage changes


This reverts commit 1e1a041.
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
…move "abort"

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@njzjz njzjz marked this pull request as ready for review September 30, 2023 20:02
@njzjz njzjz requested a review from wanghan-iapcm September 30, 2023 20:02
njzjz commented Sep 30, 2023

Member Author

Since the CUDA test machine is broken, I've tested this PR on my local machine.

@wanghan-iapcm wanghan-iapcm enabled auto-merge (squash) October 3, 2023 03:17
@wanghan-iapcm wanghan-iapcm merged commit f256dff into deepmodeling:devel Oct 3, 2023
caic99 commented Oct 3, 2023

Member

> @caic99 Could you take a look at the NVIDIA machine... It worked fine in #2841 but appears to have apt errors in this PR.
>
> You might want to run 'apt --fix-broken install' to correct these.
>   The following packages have unmet dependencies:
>    nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (>= 535.104.05) but 535.86.10-0ubuntu1 is to be installed
>    nvidia-driver-535 : Depends: nvidia-kernel-common-535 (>= 535.104.05) but 535.86.10-0ubuntu1 is to be installed
>                        Recommends: libnvidia-compute-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
>                        Recommends: libnvidia-decode-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
>                        Recommends: libnvidia-encode-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
>                        Recommends: libnvidia-fbc1-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
>                        Recommends: libnvidia-gl-535:i386 (= 535.104.05-0ubuntu0.22.04.4)
>   E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Hi @njzjz,
Sorry for the late reply; I just came back from vacation.
Please note that the environment of a self-hosted runner is persistent, so the apt-get commands may conflict with other workflows executed on this runner.
I've manually fixed the broken packages. I would strongly suggest running this workflow in a Docker container.

njzjz commented Oct 3, 2023

Member Author

> I would strongly suggest running this workflow in a docker container.

I'll have a try.

njzjz added a commit to njzjz/deepmd-kit that referenced this pull request Oct 3, 2023
See deepmodeling#2884 (comment)

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@njzjz njzjz mentioned this pull request Oct 3, 2023
wanghan-iapcm pushed a commit that referenced this pull request Oct 5, 2023
See
#2884 (comment)

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>