Convert DG-RePlAce algorithm to Kokkos by kamilrakoczy · Pull Request #5352 · The-OpenROAD-Project/OpenROAD

kamilrakoczy · 2024-07-08T09:41:44Z

This PR converts the DG-RePlAce algorithm, originally written for CUDA, to Kokkos.

Kokkos provides an abstraction for writing parallel code that can be compiled for several backends, including CUDA, OpenMP and C++ threads.
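To illustrate the idea in plain C++ (this is not Kokkos's API; `SerialBackend`, `ThreadsBackend`, `parallelFor` and `scaleInPlace` are hypothetical names), the loop body is written once as a lambda and a backend tag chosen by the caller decides how the iterations are executed. In Kokkos itself this role is played by `Kokkos::parallel_for` with an execution policy such as `Kokkos::RangePolicy` over an execution space.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical backend tags for illustration; these are NOT Kokkos types.
struct SerialBackend {};
struct ThreadsBackend {
  unsigned numThreads = 4;
};

// Serial backend: run body(i) for every i in [0, n) on the calling thread.
template <typename Body>
void parallelFor(const SerialBackend&, std::size_t n, const Body& body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

// Threads backend: split [0, n) into contiguous chunks, one per std::thread.
template <typename Body>
void parallelFor(const ThreadsBackend& be, std::size_t n, const Body& body) {
  assert(be.numThreads > 0 && "need at least one worker thread");
  const std::size_t chunk = (n + be.numThreads - 1) / be.numThreads;
  std::vector<std::thread> workers;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(n, begin + chunk);
    workers.emplace_back([begin, end, &body] {
      for (std::size_t i = begin; i < end; ++i) body(i);
    });
  }
  for (std::thread& t : workers) t.join();
}

// The "kernel" is written once; only the backend argument changes.
template <typename Backend>
void scaleInPlace(const Backend& backend, std::vector<float>& v, float s) {
  parallelFor(backend, v.size(), [&v, s](std::size_t i) { v[i] *= s; });
}
```

Because each index writes only its own element, both backends produce identical output here; reductions over floats are where backends can start to disagree.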

Tested on a single run with an RTX 3090 and an i7-8700 CPU @ 3.20GHz using the ariane133 design.

	original placer	CUDA implementation	Kokkos (CUDA backend)	Kokkos (OpenMP backend)	Kokkos (Threads backend)
ariane133 global place time [m:ss.ss]	11:27.39	0:57.70	1:33.49	3:24.12	6:08.94
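The table entries are in m:ss.ss format, so comparing columns means converting them to seconds first. A minimal sketch in plain C++ (`toSeconds` and `speedup` are hypothetical helpers, not part of OpenROAD):

```cpp
#include <cassert>
#include <string>

// Convert an "m:ss.ss" duration (the format used in the table) to seconds.
double toSeconds(const std::string& mmss) {
  const std::string::size_type colon = mmss.find(':');
  assert(colon != std::string::npos && "expected m:ss.ss");
  const int minutes = std::stoi(mmss.substr(0, colon));
  const double seconds = std::stod(mmss.substr(colon + 1));
  return minutes * 60.0 + seconds;
}

// How many times faster `candidate` is than `baseline`.
double speedup(const std::string& baseline, const std::string& candidate) {
  return toSeconds(baseline) / toSeconds(candidate);
}
```

For example, `speedup("11:27.39", "0:57.70")` is about 11.9 and `speedup("11:27.39", "1:33.49")` is about 7.4 relative to the original placer.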

github-actions

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 52. Check the log or trigger a new build to see more.

github-actions · 2024-07-08T09:49:01Z

warning: 'gpl2/MakeDgReplace.h' file not found [clang-diagnostic-error]

#include "gpl2/MakeDgReplace.h" ^

github-actions · 2024-07-08T09:49:01Z

warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]

#include <Kokkos_Core.hpp> ^

github-actions · 2024-07-08T09:49:01Z

warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]

#include <Kokkos_Core.hpp> ^

github-actions · 2024-07-08T09:49:01Z

warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

void dct_2d_fft(const int M,

void dct_2d_fft(int M,

github-actions · 2024-07-08T09:49:01Z

warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

const int N,

int N,
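The const is flagged only in declarations because top-level const on a by-value parameter is not part of the function's type: a declaration with `int M` and a definition with `const int M` refer to the same function, and the const only constrains the body. A small sketch, using a hypothetical `gridCells` function in place of `dct_2d_fft`:

```cpp
#include <cassert>
#include <type_traits>

// Top-level const on a by-value parameter is dropped from the function type,
// so these two spellings name the same type.
static_assert(std::is_same_v<void(const int), void(int)>);

// Header-style declaration: plain int, as clang-tidy suggests.
int gridCells(int M, int N);

// Definition: const here still stops the body from reassigning M or N.
int gridCells(const int M, const int N) {
  assert(M >= 0 && N >= 0);
  return M * N;
}
```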

github-actions · 2024-07-08T09:49:03Z

warning: call to 'ceil' promotes float to double [performance-type-promotion-in-math-fn]

src/gpl2/src/placerBase.cpp:40:

- #include <cstdio>
+ #include <cmath>
+ #include <cstdio>

Suggested change

binSizeX_ = ceil(static_cast<float>((ux_ - lx_)) / binCntX_);

binSizeX_ = std::ceil(static_cast<float>((ux_ - lx_)) / binCntX_);

github-actions · 2024-07-08T09:49:03Z

warning: call to 'ceil' promotes float to double [performance-type-promotion-in-math-fn]

Suggested change

binSizeY_ = ceil(static_cast<float>((uy_ - ly_)) / binCntY_);

binSizeY_ = std::ceil(static_cast<float>((uy_ - ly_)) / binCntY_);
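The check fires because C's `ceil` takes and returns `double`, so a `float` argument is promoted and the result converted back. The `<cmath>` overload set includes `std::ceil(float)`, which keeps the computation in single precision. A sketch with a hypothetical `binSize` helper standing in for the `binSizeX_`/`binSizeY_` assignments (the same reasoning applies to `round` versus `std::round`):

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// std::ceil has a float overload, so a float argument stays float.
static_assert(std::is_same_v<decltype(std::ceil(1.5f)), float>);

// Hypothetical helper mirroring the binSizeX_/binSizeY_ computation:
// the extent is divided by the bin count in float and rounded up.
int binSize(int lo, int hi, int binCnt) {
  assert(binCnt > 0);
  return static_cast<int>(std::ceil(static_cast<float>(hi - lo) / binCnt));
}
```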

github-actions · 2024-07-08T09:49:03Z

warning: 'db_sta/dbNetwork.hh' file not found [clang-diagnostic-error]

#include "db_sta/dbNetwork.hh" ^

github-actions · 2024-07-08T09:49:03Z

warning: call to 'round' promotes float to double [performance-type-promotion-in-math-fn]

src/gpl2/src/placerBase.h:38:

- #include <memory>
+ #include <cmath>
+ #include <memory>

Suggested change

+ static_cast<int64_t>(round(macroInstsArea_ * targetDensity_));

+ static_cast<int64_t>(std::round(macroInstsArea_ * targetDensity_));

github-actions · 2024-07-08T09:49:03Z

warning: member initializer for 'inst_' is redundant [modernize-use-default-member-init]

Suggested change

: inst_(nullptr),

: ,
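modernize-use-default-member-init asks for defaults such as `nullptr` or `false` to be written on the member declaration instead of in each constructor's initializer list. A minimal sketch of that shape; `Instance` and `Pin` here are hypothetical stand-ins, not the gpl2 classes:

```cpp
#include <cassert>

struct Instance {};  // hypothetical stand-in for a placer instance

class Pin {
 public:
  Pin() = default;                               // gets both defaults below
  explicit Pin(Instance* inst) : inst_(inst) {}  // overrides only inst_

  bool hasInstance() const { return inst_ != nullptr; }
  Instance& instance() const {
    assert(inst_ != nullptr && "pin is not attached to an instance");
    return *inst_;
  }
  bool isFixed() const { return isFixed_; }

 private:
  // Defaults live next to the declarations instead of ": inst_(nullptr), ...".
  Instance* inst_ = nullptr;
  bool isFixed_ = false;
};
```

A constructor that needs a different value still overrides the default in its own initializer list, as `Pin(Instance*)` does.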

maliberty · 2024-07-08T15:30:55Z

Earlier, the runtime difference was reported to be minimal, but 0:57.70 vs 1:33.49 is more substantial. Is this expected?

kamilrakoczy · 2024-07-09T09:03:40Z

> Earlier, the runtime difference was reported to be minimal, but 0:57.70 vs 1:33.49 is more substantial. Is this expected?

Earlier measurements were done when some parts were still using native CUDA, and on a different design (black-parrot).
These measurements are from a single run on a local machine that was also being used for other work, so they are not very accurate.

I'd expect it to be possible to achieve similar runtime using Kokkos. These results might suggest that there are some unnecessary memory copies between host and device, but this needs further investigation.

maliberty · 2024-07-09T17:30:25Z

Please try to get a more precise measure of the runtime difference as this is important in deciding whether Kokkos is a good alternative to direct CUDA coding.

Do all the various versions produce the same result? That is also important.

maliberty · 2024-07-09T17:33:28Z

What was the thinking behind making kokkos a dependency but kokkos-fft a submodule? It seems like they could both be build dependencies (and added to the DependencyInstaller with an option).

QuantamHD · 2024-07-09T18:31:57Z

> Please try to get a more precise measure of the runtime difference as this is important in deciding whether Kokkos is a good alternative to direct CUDA coding.

I think I would say direct CUDA coding isn't really a viable option. I would be personally opposed to its inclusion. I think Kokkos or something like it is the only viable path forward. The runtime differences don't look significant if you compare it to the overall speedup achieved.

We're going for a pragmatic path forward, and to me this meets my bar for the goals we set out.

> Do all the various versions produce the same result? That is also important.

Agree that this is important to check. We may need to order the floats to get identical/sufficiently similar results.
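One common reason parallel backends disagree is that floating-point addition is not associative, and each backend may combine partial sums in a different order. The sketch below (plain C++, hypothetical function names) shows the effect on three floats and one simple way to "order the floats": sort the values before summing, so any permutation of the same inputs is added in the same sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Naive left-to-right sum: the result depends on the order of v.
float sumInOrder(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}

// Canonical-order sum: sort a copy first, so every permutation of the same
// values is added in the same sequence and gives a bit-identical result.
float sumCanonical(std::vector<float> v) {
  // NaNs would break std::sort's strict weak ordering.
  assert(std::none_of(v.begin(), v.end(),
                      [](float x) { return std::isnan(x); }));
  std::sort(v.begin(), v.end());
  return sumInOrder(v);
}
```

With `{1e8f, 1.0f, -1e8f}` the naive sum is 0 (the 1.0f is lost when added to 1e8f), while `{1e8f, -1e8f, 1.0f}` gives 1; the canonical sum is the same for both. Sorting costs O(n log n) and does not map directly onto a GPU reduction; fixed-order tree reductions or integer/fixed-point accumulation are other common ways to get reproducible sums, each with its own trade-offs.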

maliberty · 2024-07-09T20:01:25Z

> I think I would say direct CUDA coding isn't really a viable option. I would be personally opposed to its inclusion.

You personally pushed for the inclusion of gpuSolver.cu and said it was valuable as a template for future development. Shall we delete it? I was never in favor.

A 50% overhead is worth exploring to at least understand if not eliminate.

QuantamHD · 2024-07-09T21:50:06Z

> You personally pushed for the inclusion of gpuSolver.cu and said it was valuable as a template for future development. Shall we delete it? I was never in favor.

I think that seems like the right move at this point. With more time and context I don't think it's viable for us to maintain two codebases.

> A 50% overhead is worth exploring to at least understand if not eliminate.

+1. I just want to point out that even if this is the fastest we could go, it seems fast enough to me.

kamilrakoczy · 2024-07-18T13:56:48Z

> Do all the various versions produce the same result? That is also important.

No, they don't, which was quite surprising, as I expected the original code and Kokkos with the CUDA backend to produce the same result.
We investigated this, and it turned out to be because Kokkos passes all files that depend on it through nvcc_wrapper. This wrapper converts host compiler (g++) options to nvcc options and uses nvcc to compile all Kokkos-dependent sources. This is done to allow device code to live in a single .cpp file instead of a separate .cu file.

NVCC should preprocess and compile the device code into a CUDA binary, and leave the host code to the host compiler.

We confirmed that when nvcc is used to compile InitialPlace, Eigen's solveWithGuess returns different results for exactly the same inputs compared to using g++ directly.

I suspect that this issue isn't only related to Eigen: when I disabled initial placement, the runtimes of Kokkos and the original code were almost the same, but the results were still different (I haven't investigated the reason for this).

> What was the thinking behind making kokkos a dependency but kokkos-fft a submodule? It seems like they could both be build dependencies (and added to the DependencyInstaller with an option).

kokkos-fft is a header-only interface library that dispatches FFT calls to the proper backend by detecting which backends are enabled in Kokkos. But I agree that, if preferred, both kokkos and kokkos-fft could be dependencies.

> A 50% overhead is worth exploring to at least understand if not eliminate.

I think this overhead is due to initial placement; when initial placement is disabled, the runtimes are very similar:

	CUDA implementation	Kokkos (CUDA backend)
ariane133 global place time without initial placement [m:ss.ss]	0:55.52	0:58.25

I also did more precise measurements (10 runs, ariane133 design) on a machine with an RTX 3080, an 8-vCPU i9-12900 @ 2.42 GHz and 32 GB of RAM:

	min [m:ss]	avg [m:ss]	median [m:ss]	max [m:ss]
CUDA implementation	0:45	0:48	0:47	0:53
Kokkos (CUDA backend)	1:53	1:57	1:57	2:00
Kokkos (OpenMP backend)	1:50	2:04	1:54	2:37
Kokkos (threads backend)	3:42	3:43	3:43	3:45
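Summaries like the table above (min/avg/median/max over 10 runs) can be computed with a small helper. A minimal sketch, assuming the run times have already been converted to seconds; `RunStats` and `summarize` are hypothetical names, and the median of an even number of runs is taken as the mean of the two middle values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct RunStats {
  double min, avg, median, max;
};

// Summarize a set of run times (in seconds).
RunStats summarize(std::vector<double> seconds) {
  assert(!seconds.empty());
  std::sort(seconds.begin(), seconds.end());
  const std::size_t n = seconds.size();
  const double sum = std::accumulate(seconds.begin(), seconds.end(), 0.0);
  // Odd count: the middle element. Even count: mean of the two middle ones.
  const double median = (n % 2 == 1)
                            ? seconds[n / 2]
                            : (seconds[n / 2 - 1] + seconds[n / 2]) / 2.0;
  return {seconds.front(), sum / n, median, seconds.back()};
}
```

With 10 runs the median is the mean of the 5th and 6th fastest runs, so it need not equal any single observed time; a median well below the mean (as in the OpenMP row) points to a few slow outliers.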

maliberty · 2024-07-18T15:10:28Z

Thanks for the analysis. It would be good to get to the bottom of the difference as it will make regression testing hard otherwise. Is nvcc calling g++ with different flags?

kamilrakoczy · 2024-07-19T07:33:04Z

> Is nvcc calling g++ with different flags?

The arguments passed to nvcc that nvcc should forward to g++ are the same.
I haven't yet investigated how (with what flags) g++ is invoked by nvcc.

maliberty · 2024-07-19T15:38:07Z

Another possibility is that it is invoking a different g++ binary from another path.

maliberty · 2024-10-14T04:42:11Z

Converted to a draft due to no progress.

jbylicki · 2025-01-07T13:17:36Z

I've rebased this branch onto latest master and started resolving the mentioned issues:

Eigen’s solveWithGuess() behaves differently on the Kokkos branch (it was suggested that this is caused by nvcc_wrapper, the part of Kokkos responsible for redirecting compilation that does not pertain to CUDA to the host compiler):

I've found that not to be the case. Early on, I recreated the same condition (where Eigen was running slowly) using clang++ as the Kokkos compiler, and confirmed that nvcc_wrapper was not used in that case. The problem was that Eigen, upon detecting that CUDA was available, was trying to use it. However, I saw no spike in GPU usage while initial_place was running, so I disabled it and saw the numbers return to baseline (the same as in the CUDA-native implementation).

What is the performance difference between Kokkos and CUDA-native implementations?

To prioritize merging GPU-accelerated placement, the focus was on getting the branch issue-free before optimizing. In my testing, the Kokkos-based algorithm on black-parrot spends about 10 seconds in libcuda.so, whereas the CUDA-native implementation spends around 5 seconds. All other timings are comparable, making the entire run about 5 seconds longer.

Future / subsequent work:

Make Kokkos a submodule: Due to varying conditions on host machines, most packaged Kokkos builds ship without either CUDA or OpenMP support. Having a dependency that has to be manually compiled and configured correctly to get a functioning and fast implementation might introduce complexity for the end user. Therefore, I suggest not migrating kokkos-fft to be a dependency, and instead using kokkos, which is already cloned as a submodule of kokkos-fft, as an in-tree library. The issue I'm currently facing is that internal CMake deprecations are triggered when Kokkos is compiled as a child project rather than as the parent project.
Optimize memory accesses and the Kokkos implementation itself: I've confirmed that memory copying is one of the causes of the algorithm being slower, and fixes are in development, waiting for the more pressing issues to be resolved.

jbylicki · 2025-01-09T17:11:51Z

I added a configuration option to etc/Build.sh, -use_gpl2, that includes the gpl2 subdirectory and builds kokkos via kokkos-fft in CMake. I also wired the build script's -gpu flag to enable the CUDA backend in Kokkos.

maliberty · 2025-01-09T17:37:09Z

I would prefer to see kokkos as part of the dependency installer rather than as a submodule. There should be no need to compile it for each workspace on a machine.

jbylicki · 2025-01-09T17:56:27Z

With the current setup, it would be possible to support both compilation schemes, with priority given to the DependencyInstaller: if a system-wide Kokkos installation is detected, it will be used during compilation. I would suggest keeping the option to use in-tree Kokkos and kokkos-fft (if kokkos-fft is also moved to be downloaded via the DependencyInstaller), as the script is tailored only to Ubuntu users. If a system-wide package is not detected, both dependencies can be fetched via FetchContent and built in-tree.

maliberty · 2025-01-09T18:01:00Z

If someone wants to put a local copy in-tree that's fine but I'd like to avoid having a submodule.

jbylicki · 2025-01-09T18:07:24Z

I'll add support for kokkos and kokkos-fft via the DependencyInstaller, then. The submodule could be deleted while keeping in-tree support: if no system-wide package is present, CMake would handle the download via the FetchContent directive, and the build would have conditionals in place to link correctly.

jbylicki · 2025-02-12T16:51:35Z

I've added nested parallelism to the most time-consuming kernel, computeBCPosNegKernel. After rebasing both branches onto the same base commit, the performance results for the black-parrot design with the CUDA backend are as follows:

CUDA-native: 24.606 seconds (total time: 114.50 s, skipped initial place: 94.49 s)
Kokkos: 23.614 seconds (total time: 114.42 s, skipped initial place: 95.07 s)

Additionally, a concern was raised regarding non-deterministic results returned by Kokkos depending on the compute device used for processing. To validate the flow, each variant was run from synthesis to the final step. While it's true that the results vary, they have minimal impact on the final metrics of the finished flow. Additionally, the results are deterministic on a per-device basis, even when the compute device is under heavy external load (which is especially relevant for GPUs).

Test subjects were:

master branch commit 7e0fce872123, as the baseline and the base for the other branches
cuda-native, the original CUDA-native implementation, rebased onto the same base as the other branches
kokkos-cpu, the Kokkos-based flow, run on the OpenMP backend
kokkos-gpu, the Kokkos-based flow, run on the CUDA backend

The metrics were taken from the final report and log, and were:

Total Negative Slack (tns)
Worst Negative Slack (wns)
Total power
Design area and utilization

Results:

Branch	TNS	WNS	Design area	Utilization	Total power
`master`	-2.42	-2.42	760397 u^2	45%	2.57e-01 W
`cuda-native`	-2.40	-2.40	753511 u^2	44%	2.49e-01 W
`kokkos-cpu`	-2.49	-2.49	753608 u^2	44%	2.50e-01 W
`kokkos-gpu`	-2.44	-2.44	753674 u^2	44%	2.50e-01 W

maliberty · 2025-02-12T17:43:48Z

Very nice! How is the cpu vs gpu runtime with your latest changes? Is this ready for review?

github-actions

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 45. Check the log or trigger a new build to see more.

github-actions · 2025-02-12T17:48:37Z

warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

void dct_2d_fft(const int M,

void dct_2d_fft(int M,

github-actions · 2025-02-12T17:48:38Z

warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

const int N,

int N,

github-actions · 2025-02-12T17:48:38Z

warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

void idct_2d_fft(const int M,

void idct_2d_fft(int M,

github-actions · 2025-02-12T17:48:38Z

warning: parameter 'N' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

const int N,

int N,

github-actions · 2025-02-12T17:48:38Z

warning: parameter 'M' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change

void idxst_idct(const int M,

void idxst_idct(int M,

github-actions · 2025-02-12T17:48:40Z

warning: member initializer for 'isFixed_' is redundant [modernize-use-default-member-init]

Suggested change

isFixed_(false)

github-actions · 2025-02-12T17:48:40Z

warning: result of integer division used in a floating point context; possible loss of precision [bugprone-integer-division]

int ux = lx + floor(bbox->getDX() / 2) * 2; ^

github-actions · 2025-02-12T17:48:40Z

warning: result of integer division used in a floating point context; possible loss of precision [bugprone-integer-division]

int uy = ly + floor(bbox->getDY() / 2) * 2; ^
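Here `getDX()`/`getDY()` are integral (as the warning implies), so `getDX() / 2` is already an integer division that truncates; `floor` only round-trips the value through `double`. For a non-negative extent the expression rounds the extent down to an even number, and `floor(dx / 2.0) * 2` would give the same value, so an integer-only form is equivalent. A sketch with a hypothetical `evenUpper` helper:

```cpp
#include <cassert>

// Equivalent of `lx + floor(dx / 2) * 2` for an integer extent dx:
// `dx / 2` already truncates, so no floating-point conversion is needed.
// For dx >= 0 this rounds dx down to an even number before adding it to lx.
int evenUpper(int lx, int dx) {
  assert(dx >= 0 && "bounding-box extent is expected to be non-negative");
  return lx + (dx / 2) * 2;  // same value as lx + dx - dx % 2
}
```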

github-actions · 2025-02-12T17:48:41Z

warning: the parameter 'ps' is copied for each invocation but only used as a const reference; consider making it a const reference [performance-unnecessary-value-param]

Suggested change

void Instance::dbSetPlacementStatus(odb::dbPlacementStatus ps)

void Instance::dbSetPlacementStatus(const odb::dbPlacementStatus& ps)

src/gpl2/src/placerObjects.h:105:

- void dbSetPlacementStatus(odb::dbPlacementStatus ps);
+ void dbSetPlacementStatus(const odb::dbPlacementStatus& ps);

github-actions · 2025-02-12T17:48:41Z

warning: member initializer for 'pin_' is redundant [modernize-use-default-member-init]

Suggested change

: pin_(nullptr),

: ,

jbylicki · 2025-02-13T19:44:33Z

Yes, it's ready for review. I've applied the suggested clang-tidy fixes and added the missing RockyLinux9 package.

The performance difference between CUDA and OpenMP backends on black_parrot is:

CUDA: 85.38 s (dg_global_place call time: 20.46 s)
OpenMP: 96.58 s (dg_global_place call time: 29.83 s)

The test setup is an Intel i7-8700 and an NVIDIA GTX 1080 Ti.

hzeller · 2025-05-17T06:52:04Z

It looks like we will soon also need to bring kokkos and fftw into the BCR (Bazel Central Registry) to get this compiled nicely in the new Bazel build.

github-actions

clang-tidy made some suggestions

github-actions · 2025-05-18T03:05:11Z

warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]

#include "Kokkos_Core.hpp" ^

github-actions · 2025-05-18T03:05:12Z

warning: 'Kokkos_Core.hpp' file not found [clang-diagnostic-error]

#include "Kokkos_Core.hpp" ^

mikesinouye · 2025-05-29T15:48:22Z

@jbylicki To confirm, is this PR ready for review?

jbylicki · 2025-05-30T09:32:29Z

@mikesinouye Yes, it is

maliberty · 2025-07-16T22:31:32Z

I tried to build this but I get:

/home/matt/OpenROAD/src/gpl2/src/dct.cpp: In function ‘void dct_2d_fft(int, int, const Kokkos::View<const Kokkos::complex<float>*>&, const Kokkos::View<const Kokkos::complex<float>*>&, const Kokkos::View<const float*>&, const Kokkos::View<float*>&, const Kokkos::View<Kokkos::complex<float>*>&, const Kokkos::View<float*>&)’:
/home/matt/OpenROAD/src/gpl2/src/dct.cpp:112:21: error: ‘Plan’ in namespace ‘KokkosFFT’ does not name a type
  112 |   static KokkosFFT::Plan fftplan(hostSpace,

Have you run into this?

sombraSoft · 2025-07-17T16:40:34Z

+    if [[ ${gpuDeps} == "nvidia" ]]; then
+        RELEASE_CODENAME=$(lsb_release -c | awk '{print $2}')
+
+        NEW_LINES="deb http://deb.debian.org/debian/ $RELEASE_CODENAME main contrib non-free
+        deb-src http://deb.debian.org/debian/ $RELEASE_CODENAME main contrib non-free"
+
+        if ! grep -q "$NEW_LINES" /etc/apt/sources.list; then
+            echo "$NEW_LINES" | tee -a /etc/apt/sources.list > /dev/null
+        fi
+        apt-get update
+        apt-get -y install --no-install-recommends libcu++-dev nvidia-cuda-toolkit
+    fi
+
+


Ok, there are some issues with this setup for installing nvidia-cuda-toolkit.

The command lsb_release -c on Ubuntu returns an Ubuntu codename such as jammy or focal. The script then uses this codename to access a URL on the Debian server, http://deb.debian.org/debian/jammy, which does not exist.

Even if the URL existed, installing packages from a Debian repository on an Ubuntu system is highly discouraged. Ubuntu is based on Debian but has its own versions of libraries and packages. Mixing them can cause critical dependency conflicts that can break the package manager and other essential system services.

I'm currently running Debian 12 (bookworm), and it does need non-free enabled in order to find nvidia-cuda-toolkit. On Ubuntu, though, the equivalent component is called multiverse rather than non-free.
Also, libcu++-dev should already be included with the nvidia-cuda-toolkit package.

Neither kokkos nor gpl2 itself requires it. I guess it was used for some debugging in the past and can be safely dropped. Besides, other OpenROAD modules already use spdlog's format (which is a proxy to either std::format or a bundled copy of fmtlib).

Build it with same flags as in cmake

sgizler · 2025-07-29T15:29:43Z

> I tried to build this but I get:
>
> /home/matt/OpenROAD/src/gpl2/src/dct.cpp: In function ‘void dct_2d_fft(int, int, const Kokkos::View<const Kokkos::complex<float>*>&, const Kokkos::View<const Kokkos::complex<float>*>&, const Kokkos::View<const float*>&, const Kokkos::View<float*>&, const Kokkos::View<Kokkos::complex<float>*>&, const Kokkos::View<float*>&)’:
> /home/matt/OpenROAD/src/gpl2/src/dct.cpp:112:21: error: ‘Plan’ in namespace ‘KokkosFFT’ does not name a type
>   112 |   static KokkosFFT::Plan fftplan(hostSpace,
>
> Have you run into this?

I suspect that you have an older version of KokkosFFT (from before KokkosFFT::Plan was introduced) installed on your system, and that it is being picked up. The quick workaround would probably be to remove /usr/local/include/kokkos and rerun DependencyInstaller.sh. We could add some kind of version check to the script, but maybe dropping Kokkos from DependencyInstaller and building it solely in-tree would be a better solution?
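A version check of the kind suggested here reduces to comparing dotted version strings numerically rather than as text (so "0.10.0" is newer than "0.9.0"). A minimal C++ sketch of that comparison; `parseVersion` and `versionAtLeast` are hypothetical, and the version numbers in the test are illustrative only (which KokkosFFT release introduced `KokkosFFT::Plan` is not stated in this thread). In the build itself the check would more naturally live in DependencyInstaller.sh or in CMake, whose `find_package` accepts a minimum version argument.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse "major.minor.patch" into integers; missing parts count as 0.
std::vector<int> parseVersion(const std::string& v) {
  assert(!v.empty());
  std::vector<int> parts;
  std::stringstream ss(v);
  std::string item;
  while (std::getline(ss, item, '.')) parts.push_back(std::stoi(item));
  while (parts.size() < 3) parts.push_back(0);
  return parts;
}

// True if `found` is the same as or newer than `required`.
bool versionAtLeast(const std::string& found, const std::string& required) {
  // std::vector's operator>= compares element by element (lexicographically).
  return parseVersion(found) >= parseVersion(required);
}
```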

The current setup is very complex:

DependencyInstaller.sh either:
- detects that KokkosFFT is present on host in which case it does nothing (without any version check)
- if not present, builds and installs KokkosFFT system-wide
CMakeLists.txt either:
- detects that KokkosFFT is present on host in which case it picks it up (not necessarily the one from DependencyInstaller.sh)
- if not present, builds and installs KokkosFFT in-tree

The version mismatch is not the only issue with using DependencyInstaller.sh for KokkosFFT installation:

- If you want to switch between using CPU and GPU, you have to remove the old installation of Kokkos, rerun DependencyInstaller with the proper args and rebuild OpenROAD.
- Kokkos seems to assume that you use the same compiler suite for building the library and for the code using it. This makes it complicated to build OpenROAD with clang.
- Kokkos has to be built with non-default compile-time options in order to give deterministic results. We would have to add some extra (potentially complicated) checks for that too.
- Even if we add version and flag checks in DependencyInstaller.sh, there is no guarantee that CMake will pick the correct instance of Kokkos if the user already has it installed globally.

Building Kokkos solely in-tree would solve these problems and decrease maintenance burden related to having two independent methods of installing Kokkos.

maliberty · 2026-04-15T18:19:45Z

I think we should finish this off

hzeller · 2026-04-15T18:45:18Z

If this continues, I can work on making kokkos build with Bazel (and, once it works, also put it on BCR).

maliberty · 2026-04-15T18:50:56Z

That would be great. I've been holding off until we can retire cmake to avoid having to deal with it.

ApeachM · 2026-05-06T22:11:32Z

Late-comer thought: I've spent time reviewing gpl and have an RTX 5090 on hand. After reading both gpl and gpl2, much of gpl2 (Nesterov body, WA wirelength, BiCGSTAB) looks like it implements the same algorithms as gpl. So instead of a separate gpl2 library, you could keep gpl and add Kokkos kernels behind an optional build flag, one function per PR. The CMake side can reuse -DGPU=ON in etc/Build.sh now; the Bazel side will work once @hzeller finishes putting Kokkos on BCR. The first PR could be getHpwl, to set up the option and the Kokkos dependency. WA gradient and a Poisson solver using Kokkos-FFT would follow. @sgizler's determinism fixes in this branch apply to either approach.

I'm raising this because the dual-codebase concern keeps coming up, and this approach might address it. Happy to try the first PR if it's useful, or to drop it and let #5352 continue as planned.
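For reference, HPWL (half-perimeter wirelength) sums, over all nets, the width plus height of the bounding box of each net's pins. Each net's term is independent, which is what makes it a natural first candidate for a parallel reduction. A serial sketch with hypothetical types (`Point`, `Net`, `netHpwl`, `totalHpwl`), not gpl's actual data structures or its getHpwl:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Point {
  std::int64_t x, y;
};
using Net = std::vector<Point>;  // pin locations of one net (hypothetical)

// Half-perimeter of the bounding box of one net's pins.
std::int64_t netHpwl(const Net& pins) {
  assert(!pins.empty());
  std::int64_t lx = pins[0].x, ux = pins[0].x;
  std::int64_t ly = pins[0].y, uy = pins[0].y;
  for (const Point& p : pins) {
    lx = std::min(lx, p.x);
    ux = std::max(ux, p.x);
    ly = std::min(ly, p.y);
    uy = std::max(uy, p.y);
  }
  return (ux - lx) + (uy - ly);
}

// Total HPWL: independent per-net terms summed; one iteration per net is
// the shape a parallel reduction would take.
std::int64_t totalHpwl(const std::vector<Net>& nets) {
  std::int64_t sum = 0;
  for (const Net& n : nets) sum += netHpwl(n);
  return sum;
}
```

Because the per-net terms are integers, the sum is exact and therefore identical in any reduction order, unlike the floating-point accumulations discussed earlier in the thread.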

maliberty · 2026-05-07T16:24:34Z

The goal would be to have a single gpl. This is a separate code base as that is how it was developed academically but not the preferred end state. I am glad to take PRs for incrementally adding these ideas directly to gpl/. Dealing with Kokkos itself will be a significant first step.

ApeachM · 2026-05-08T13:40:43Z

Thanks @maliberty — that's clarifying. I'll put together a first PR with the CMake option and a Kokkos-backed getHpwl, leaving the CPU path as default. If the build-system side needs to align with @hzeller's BCR work, happy to coordinate before posting.

github-actions Bot reviewed Jul 8, 2024

maliberty marked this pull request as draft October 14, 2024 04:41

jbylicki force-pushed the convert-gpl2-kokkos branch 2 times, most recently from 04d428f to 925dd93 Compare January 7, 2025 13:14

jbylicki force-pushed the convert-gpl2-kokkos branch from 072e3b1 to 2dcac77 Compare January 10, 2025 17:19

jbylicki force-pushed the convert-gpl2-kokkos branch from 2dcac77 to 960ec72 Compare February 12, 2025 16:49

github-actions Bot reviewed Feb 12, 2025

jbylicki force-pushed the convert-gpl2-kokkos branch from a1b101b to 1d136de Compare February 13, 2025 19:44

maliberty marked this pull request as ready for review February 14, 2025 05:34

jbylicki force-pushed the convert-gpl2-kokkos branch from 86ae55d to 32d2b9d Compare May 14, 2025 17:09

github-actions Bot reviewed May 18, 2025

jbylicki added 4 commits May 21, 2025 11:57

Main.cc: gpl2: Re-formatted the changes in accordance with new style guides

56719a1

Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>

etc: DependencyInstaller: Added verbose gpu support and gpl2 compilation

ad4657a

Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>

gpl2: MakeDgReplace: Switch GPL2 away from sta's TclEncode

ed03130

Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>

Main: Added Kokkos finalize hooks cleaning the OR context

be2d8bd

Signed-off-by: Jan Bylicki <jbylicki@antmicro.com>

jbylicki force-pushed the convert-gpl2-kokkos branch from 32d2b9d to be2d8bd Compare May 21, 2025 10:08

sombraSoft reviewed Jul 17, 2025

sgizler added 4 commits July 29, 2025 17:17

Pin Kokkos version to tag rather than origin/main

6b84658

Define missing kokkos flags in DependencyInstaller

cf3ff23

Build it with same flags as in cmake

Replace deprecated KokkosFFT_ENABLE_HOST_AND_DEVICE

1715565

maliberty added the Stale A stale PR or issue subject to automated closure. label Mar 24, 2026

github-actions Bot removed the Stale A stale PR or issue subject to automated closure. label Mar 25, 2026

maliberty added the Stale A stale PR or issue subject to automated closure. label Mar 25, 2026

github-actions Bot closed this Apr 15, 2026

maliberty reopened this Apr 15, 2026

github-actions Bot removed the Stale A stale PR or issue subject to automated closure. label Apr 15, 2026

ApeachM mentioned this pull request May 9, 2026

gpl: opt-in HPWL GPU acceleration via Kokkos #10370

Open

