
[rc] RDMA: Handle the return code from dma_resv_wait_timeout() properly

Message ID 0-v1-d8f4e1fa84c8+17-rdma_dmabuf_fix_jgg@nvidia.com (mailing list archive)
State New, archived
Series [rc] RDMA: Handle the return code from dma_resv_wait_timeout() properly

Commit Message

Jason Gunthorpe Aug. 16, 2022, 2:03 p.m. UTC
ib_umem_dmabuf_map_pages() returns 0 on success and -ERRNO on failure.

dma_resv_wait_timeout() uses a different scheme:

 * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
 * greater than zero on success.

This results in ib_umem_dmabuf_map_pages() being non-functional as a
positive return will be understood to be an error by drivers.

Fixes: f30bceab16d1 ("RDMA: use dma_resv_wait() instead of extracting the fence")
Cc: stable@kernel.org
Tested-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/core/umem_dmabuf.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Oded, I assume the Habana driver will hit this as well - does this mean you
are not testing upstream kernels?


base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868

Comments

Leon Romanovsky Aug. 16, 2022, 2:14 p.m. UTC | #1
On Tue, Aug 16, 2022 at 11:03:20AM -0300, Jason Gunthorpe wrote:
> ib_umem_dmabuf_map_pages() returns 0 on success and -ERRNO on failure.
> 
> dma_resv_wait_timeout() uses a different scheme:
> 
>  * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
>  * greater than zero on success.
> 
> This results in ib_umem_dmabuf_map_pages() being non-functional as a
> positive return will be understood to be an error by drivers.
> 
> Fixes: f30bceab16d1 ("RDMA: use dma_resv_wait() instead of extracting the fence")
> Cc: stable@kernel.org
> Tested-by: Maor Gottlieb <maorg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/core/umem_dmabuf.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 

Thanks, applied to -rc.
Oded Gabbay Aug. 17, 2022, 8:45 a.m. UTC | #2
On Tue, Aug 16, 2022 at 5:03 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> ib_umem_dmabuf_map_pages() returns 0 on success and -ERRNO on failure.
>
> dma_resv_wait_timeout() uses a different scheme:
>
>  * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or
>  * greater than zero on success.
>
> This results in ib_umem_dmabuf_map_pages() being non-functional as a
> positive return will be understood to be an error by drivers.
>
> Fixes: f30bceab16d1 ("RDMA: use dma_resv_wait() instead of extracting the fence")
> Cc: stable@kernel.org
> Tested-by: Maor Gottlieb <maorg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/infiniband/core/umem_dmabuf.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> Oded, I assume the Habana driver will hit this as well - does this mean you
> are not testing upstream kernels?
Thanks Jason for letting me know.

You are correct, we don't use upstream kernels.
We use a back-ported EFA driver for 5.15; in that version,
ib_umem_dmabuf_map_pages() calls dma_resv_excl_fence(),
so I guess that's why we didn't encounter this issue.

Thanks,
oded
>
> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> index fce80a4a5147cd..04c04e6d24c358 100644
> --- a/drivers/infiniband/core/umem_dmabuf.c
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -18,6 +18,7 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
>         struct scatterlist *sg;
>         unsigned long start, end, cur = 0;
>         unsigned int nmap = 0;
> +       long ret;
>         int i;
>
>         dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> @@ -67,9 +68,14 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
>          * may be not up-to-date. Wait for the exporter to finish
>          * the migration.
>          */
> -       return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv,
> +       ret = dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv,
>                                      DMA_RESV_USAGE_KERNEL,
>                                      false, MAX_SCHEDULE_TIMEOUT);
> +       if (ret < 0)
> +               return ret;
> +       if (ret == 0)
> +               return -ETIMEDOUT;
> +       return 0;
>  }
>  EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
>
>
> base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> --
> 2.37.2
>

Patch

diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index fce80a4a5147cd..04c04e6d24c358 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -18,6 +18,7 @@  int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
 	struct scatterlist *sg;
 	unsigned long start, end, cur = 0;
 	unsigned int nmap = 0;
+	long ret;
 	int i;
 
 	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
@@ -67,9 +68,14 @@  int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
 	 * may be not up-to-date. Wait for the exporter to finish
 	 * the migration.
 	 */
-	return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv,
+	ret = dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv,
 				     DMA_RESV_USAGE_KERNEL,
 				     false, MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+	if (ret == 0)
+		return -ETIMEDOUT;
+	return 0;
 }
 EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);