I'm trying to implement a critical section in CUDA using atomic instructions, but I ran into some trouble. I have created the test program to show the problem:
#include <cuda_runtime.h>
#include <cutil_inline.h>
#include <stdio.h>
__global__ void k_testLocking(unsigned int* locks, int n) {
int id = threadIdx.x % n;
while (atomicExch(&(locks[id]), 1u) != 0u) {} //lock
//critical section would go here
atomicExch(&(locks[id]),0u); //unlock
}
int main(int argc, char** argv) {
//initialize the locks array on the GPU to (0...0)
unsigned int* locks;
unsigned int zeros[10]; for (int i = 0; i < 10; i++) {zeros[i] = 0u;}
cutilSafeCall(cudaMalloc((void**)&locks, sizeof(unsigned int)*10));
cutilSafeCall(cudaMemcpy(locks, zeros, sizeof(unsigned int)*10, cudaMemcpyHostToDevice));
//Run the kernel:
k_testLocking<<<dim3(1), dim3(256)>>>(locks, 10);
//Check the error messages:
cudaError_t error = cudaGetLastError();
cutilSafeCall(cudaFree(locks));
if (cudaSuccess != error) {
printf("error 1: CUDA ERROR (%d) {%s}
", error, cudaGetErrorString(error));
exit(-1);
}
return 0;
}
This code, unfortunately, hard freezes my machine for several seconds and finally exits, printing out the message:
fcudaSafeCall() Runtime API error in file <XXX.cu>, line XXX : the launch timed out and was terminated.
which means that one of those while loops is not returning, but it seems like this should work.
As a reminder atomicExch(unsigned int* address, unsigned int val)
atomically sets the value of the memory location stored in address to val
and returns the old
value. So the idea behind my locking mechanism is that it is initially 0u
, so one thread should get past the while
loop and all other threads should wait on the while
loop since they will read locks[id]
as 1u
. Then when the thread is done with the critical section, it resets the lock to 0u
so another thread can enter.
What am I missing?
By the way, I am compiling with:
nvcc -arch sm_11 -Ipath/to/cuda/C/common/inc XXX.cu
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…