CUDA-Aware MPI with `device<>()`-pointer needs `af::sync()`

I tried to use CUDA-Aware MPI together with ArrayFire and encountered a problem: my code produced different results with CUDA-Aware MPI than when the data was copied to host memory and transferred using normal MPI communication.

To investigate, I wanted to compare the data sent over CUDA-Aware MPI with the data sent using normal MPI, so I copied the data and transferred it both ways to see what was actually communicated.
However, using `.host<>()` or `cudaMemcpy` to copy the data before the CUDA-Aware MPI call made the problem disappear, and the correct result was calculated.

From this I concluded that the cause might be the asynchronous execution model of ArrayFire, and of GPU programming in general: a computation might not have finished when CUDA-Aware MPI reads the data from the GPU's memory.
Adding an `af::sync()` before the MPI call solved the problem.

This issue is somewhat similar to #1316 but more specific, so you will have to decide whether to add an `af::sync()` call to the `device<>()` method. I don't know whether this problem can be reproduced with ArrayFire alone (perhaps it can), and you probably don't want `device<>()` to be blocking.

If you decide not to fix it, a future reader might at least find this issue and be aware of the problem.


CUDA-Aware MPI with `device<>()`-pointer needs `af::sync()` #1510
