Errors when CUDA-MPS is active #1459

@traktofon

Hi,
When the NVIDIA CUDA Multi-Process Service (MPS) is active, I encounter two problems:
a) the OpenCL backend doesn't work
b) the unified backend cannot invoke the CUDA backend

To reproduce, start the MPS daemon by running "nvidia-cuda-mps-control -d" as root, then run the programs built from "examples/helloworld":

  • helloworld_cpu works
  • helloworld_cuda works
  • helloworld_opencl fails with this error:
~/af/build/helloworld> AF_PRINT_ERRORS=1 ./helloworld_opencl 
In function opencl::DeviceManager::DeviceManager()
In file src/backend/opencl/platform.cpp:329
OpenCL Error (-30): Invalid Value when calling clCreateContext

ArrayFire Exception (Internal error:998):
In function opencl::DeviceManager::DeviceManager()
In file src/backend/opencl/platform.cpp:329
OpenCL Error (-30): Invalid Value when calling clCreateContext

In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Internal error:998):
In function opencl::DeviceManager::DeviceManager()
In file src/backend/opencl/platform.cpp:329
OpenCL Error (-30): Invalid Value when calling clCreateContext

In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
Aborted
  • helloworld_unified fails with this error:
~/af/build/helloworld> AF_PRINT_ERRORS=1 ./helloworld_unified 
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/platform.cpp:359
CUDA Error (2): out of memory


ArrayFire Exception (Device out of memory:101):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/platform.cpp:359
CUDA Error (2): out of memory


In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Device out of memory:101):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/platform.cpp:359
CUDA Error (2): out of memory


In function void af::setDevice(int)
In file src/api/cpp/device.cpp:91
Aborted

If the cuda-mps service is not running, then all four backends work properly.
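
The comparison above can be scripted so that it is easy to repeat with MPS on and off. This is only a rough sketch: the function name check_backends is made up for illustration, and it assumes the four helloworld_* binaries sit in one directory (as in ~/af/build/helloworld above) and that a non-zero exit status means the backend failed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ArrayFire): run every helloworld example
# found in the given directory and report whether it exited cleanly.
# Assumption: binaries are named helloworld_<backend>, as in the report above.
check_backends() {
    dir=$1
    for backend in cpu cuda opencl unified; do
        exe="$dir/helloworld_$backend"
        if [ ! -x "$exe" ]; then
            # Binary not built or not executable
            echo "$backend: missing"
        elif AF_PRINT_ERRORS=1 "$exe" >/dev/null 2>&1; then
            # Exit status 0: the backend initialised and the example ran
            echo "$backend: ok"
        else
            # Non-zero status, including an abort from an uncaught af::exception
            echo "$backend: FAILED"
        fi
    done
}

# Example call (path taken from the logs above):
#   check_backends ~/af/build/helloworld
```

Running it once before starting the MPS daemon and once afterwards should show which backends change from "ok" to "FAILED". The program output is discarded here, so rerun a failing example by hand to see the full error text.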

I also encountered problems with ArrayFire.jl, where even the cpu and cuda backends don't work if cuda-mps is running. Without cuda-mps, the backends work fine.

In theory, whether cuda-mps is running or not should be completely transparent to CUDA applications. On multi-user systems, and for MPI-parallelized programs, cuda-mps is beneficial, so it would be nice if ArrayFire could work properly with MPS.

Tested with ArrayFire-3.3.2, both binary distribution and compiled from source.
CUDA version is 7.5.
Nvidia driver is 352.63.

Regards,
Frank
