
CUDA-Aware MPI with device<>()-pointer needs af::sync() #1510

@mricherzhagen

I tried to use CUDA-aware MPI together with ArrayFire and encountered a problem: my code produced different results with CUDA-aware MPI than when the data is first copied to host memory and transferred using regular MPI communication.

To investigate, I wanted to compare the data sent over CUDA-aware MPI with the data sent over regular MPI, so I copied the data and transferred it both ways to see what was actually communicated.
However, copying the data with .host<>() or cudaMemcpy before the CUDA-aware MPI call made the problem disappear, and the correct result was calculated.

From this I figured the cause might be the asynchronous execution model of ArrayFire (and of GPU programming in general): a computation may not have finished when CUDA-aware MPI reads the data from GPU memory.
Adding an af::sync() before the MPI call solved the problem.

This issue is similar to #1316 but more specific, so you will have to decide whether to add the af::sync() call to the device<>() method. I don't know whether the problem can be reproduced with ArrayFire alone (maybe it can?), and you probably don't want device<>() to be blocking.

So even if you don't fix it, a future reader might at least find this issue and be aware of the problem.
