-
Notifications
You must be signed in to change notification settings - Fork 174
Set gpu tpb #736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Set gpu tpb #736
Changes from all commits
c150e0b
9b8ddd1
680fdda
af1e5cb
aaf5681
99c3b3c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,6 +41,7 @@ | |
| #include "quest/src/gpu/cuda_to_hip.hpp" | ||
| #endif | ||
|
|
||
| int numThreadsPerBlock = 128; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should give this a Further, given it's not accessed anywhere outside |
||
|
|
||
|
|
||
| /* | ||
|
|
@@ -330,6 +331,24 @@ qindex gpu_getMaxNumConcurrentThreads() { | |
| * ENVIRONMENT MANAGEMENT | ||
| */ | ||
|
|
||
| int gpu_getNumThreadsPerBlock() { | ||
| #if COMPILE_CUDA | ||
| return numThreadsPerBlock; | ||
| #else | ||
| error_gpuQueriedButGpuNotCompiled(); | ||
| return -1; | ||
| #endif | ||
| } | ||
|
|
||
| void gpu_setNumThreadsPerBlock(const int newThreadsPerBlock) { | ||
| #if COMPILE_CUDA | ||
| numThreadsPerBlock = newThreadsPerBlock; | ||
| #else | ||
| error_gpuQueriedButGpuNotCompiled(); | ||
| #endif | ||
| return; | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we permit users to call the corresponding API functions when GPU acceleration is not enabled, then these guards can be removed entirely. I think that's fair/natural, because we certainly shouldn't introduce an API difference between compiling but not running with GPU acceleration. I would also comment this exception. So this could become: int gpu_getNumThreadsPerBlock() {
// permitted even when GPU backend not compiled
return globlal_numThreadsPerBlock;
}
void gpu_setNumThreadsPerBlock(const int newThreadsPerBlock) {
// permitted even when GPU backend not compiled
global_numThreadsPerBlock = newThreadsPerBlock;
} |
||
|
|
||
|
|
||
| std::array<char,17> getBoundGpuUuid() { | ||
| #if COMPILE_CUDA | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,23 +42,19 @@ | |
| * THREAD MANAGEMENT | ||
| */ | ||
|
|
||
|
|
||
| const int NUM_THREADS_PER_BLOCK = 128; | ||
|
|
||
|
|
||
| __forceinline__ __device__ qindex getThreadInd() { | ||
| return blockIdx.x*blockDim.x + threadIdx.x; | ||
| } | ||
|
|
||
|
|
||
| __host__ qindex getNumBlocks(qindex numThreads) { | ||
| __host__ qindex getNumBlocks(qindex numThreads, const int numThreadsPerBlock) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would remove the |
||
|
|
||
| /// @todo | ||
| /// improve this with cudaOccupancyMaxPotentialBlockSize(), | ||
| /// making it function specific | ||
|
|
||
| // CUDA ceil | ||
| return ceil(numThreads / static_cast<qreal>(NUM_THREADS_PER_BLOCK)); | ||
| return ceil(numThreads / static_cast<qreal>(numThreadsPerBlock)); | ||
| } | ||
|
|
||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: validate this is a factor of 32 (and is positive, etc etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc to user: HIP warpsize is 64!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better alternative: add
gpu_isHipCompiled()ingpu_config.cpp, right undergpu_isCuQuantumCompiled(), as:Then we can validate explicitly that when GPU-accelerated and we're on HIP, arg must be a multiple of 64, else of 32. This means 32 is required even when not GPU-accelerated; so we make that error message: