The following code
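```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    int size1 = 400;
    int size2 = 2100000;
    array HL = constant(0.f, size1, size1, f32);
    for (int kk = 0; kk < 2; kk++) {
        // 400 * 2100000 f32 elements, roughly 3.36 GB per iteration
        array LH = constant(0.f, size1 * size2, 1, f32);
        LH = moddims(LH, size1, size2);
        HL = matmulNT(LH, LH);  // 400 x 400 result
    }
    return 0;
}
```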
throws the following error on the OpenCL backend:

```
CLBlast: OpenCL error: clEnqueueNDRangeKernel: -4
terminate called after throwing an instance of 'af::exception'
what(): ArrayFire Exception (Internal error:998):
In function opencl::Array<T> opencl::matmul(const opencl::Array<T>&, const opencl::Array<T>&, af_mat_prop, af_mat_prop) [with T = float]
In file src/backend/opencl/blas.cpp:120
CLBlast Error (-4): CL_MEM_OBJECT_ALLOCATION_FAILURE
```

If I remove the loop, everything works fine. The same code also runs without error (and uses the expected amount of memory) on the CUDA backend.
Device information:

```
ArrayFire v3.7.0 (OpenCL, 64-bit Linux, build d25ab30)
[0] NVIDIA: Tesla P100-PCIE-16GB, 16280 MB
-1- NVIDIA: Quadro K620, 2001 MB
-2- INTEL: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz, 257831 MB
```