Under circumstances that I do not fully understand, the CUDA backend can freeze when executing a reduce-by-key operation. I narrowed the issue down to the values contained in the key array, and it seems linked to the number of consecutive equal keys.
Under OpenCL, the function does not exhibit this behavior.
Description
I use the release ArrayFire binaries v3.7.1 with the CUDA backend, tested on two computers, under Windows and Linux.
The issue happened every time at exactly the same spot in my program.
At first I saved the key and value arrays to files in order to test them in a separate program, and it behaves the same way there.
I then tried to replicate the issue by constructing the arrays myself, and got the same freeze when the key array contains two long sequences of equal keys.
It produces no error log and no exception; the only information I have is that it freezes in the sumByKey operation (all ___ByKey operations behave the same way).
Reproducible Code
Here's the test program that I came up with. Under OpenCL it runs flawlessly, but under CUDA it freezes every time at i=73.
#include <arrayfire.h>
#include <iostream>

int main(int argc, char *argv[]) {
    int N = 1280*1280;
    int count = 200;
    try {
        af::sync();
        af::array val = af::randu(N);
        // Keys start as 0, 1, 2, ..., N-1 (all distinct).
        af::array key = af::range(af::dim4(N), 0, af::dtype::s32);
        af::array res1, res2;
        for(int i = 0; i < count; ++i) {
            std::cout << i << " consecutive key : ";
            // Grow two runs of equal keys by one element per iteration:
            // key[0..i] = 0 and key[count..count+i] = 1.
            key(i) = 0;
            key(count+i) = 1;
            af::sumByKey(res1, res2, key, val);
            res2.eval();
            res1.eval();
            af::sync();
            std::cout << "Ok !" << std::endl;
        }
        std::cout << "Finished" << std::endl;
    } catch (af::exception& e) {
        std::cout << e.what() << std::endl;
        return -1;
    }
    return 0;
}
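To make clear which input the loop above feeds to sumByKey, here is a minimal CPU-only sketch that does not use ArrayFire. It rebuilds the key array for a given iteration and sums each run of consecutive equal keys, which is how I understand the ___ByKey functions to behave. The names `makeKeys` and `sumByKeyReference` are made up for this illustration and are not part of ArrayFire.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: key pattern at iteration `i` of the test loop.
// Keys start as 0, 1, ..., n-1, then key[0..i] = 0 and key[count..count+i] = 1.
std::vector<int> makeKeys(std::size_t n, std::size_t count, std::size_t i) {
    assert(i < count && count + i < n);
    std::vector<int> keys(n);
    for (std::size_t k = 0; k < n; ++k) keys[k] = static_cast<int>(k);
    for (std::size_t k = 0; k <= i; ++k) {
        keys[k] = 0;
        keys[count + k] = 1;
    }
    return keys;
}

// Hypothetical CPU reference: sum the values of each run of consecutive
// equal keys (assumed semantics of a reduce-by-key, not ArrayFire's code).
std::pair<std::vector<int>, std::vector<double>>
sumByKeyReference(const std::vector<int>& keys, const std::vector<double>& vals) {
    assert(keys.size() == vals.size());
    std::vector<int> outKeys;
    std::vector<double> outVals;
    for (std::size_t k = 0; k < keys.size(); ++k) {
        if (outKeys.empty() || outKeys.back() != keys[k]) {
            // A new run starts here.
            outKeys.push_back(keys[k]);
            outVals.push_back(vals[k]);
        } else {
            // Same key as the previous element: accumulate into the current run.
            outVals.back() += vals[k];
        }
    }
    return {outKeys, outVals};
}
```

Calling `makeKeys(1280*1280, 200, 73)` gives the key array of the iteration where the CUDA backend freezes: a run of 74 zeros at the start and a run of 74 ones starting at index 200, with distinct keys everywhere else.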
System Information
ArrayFire version 3.7.1
Intel Core i7-9750H, 16 GB RAM, GTX 1650 4 GB
CUDA info: https://pastebin.com/kLGvdUA0
Output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 446.14       Driver Version: 446.14       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8     6W /  N/A |    132MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+