
[v2,00/17] RDMA: Improve use of umem in DMA drivers

Message ID: 0-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com

Message

Jason Gunthorpe Sept. 4, 2020, 10:41 p.m. UTC
Most RDMA drivers rely on a linear table of DMA addresses organized in
some device specific page size.

For a while now the core code has had the rdma_for_each_block() SG
iterator to help break a umem into DMA blocks for use in the device lists.

Improve on this by adding rdma_umem_for_each_dma_block(),
ib_umem_dma_offset() and ib_umem_num_dma_blocks().

Replace open-coded versions, or calls to fixed PAGE_SIZE APIs, in most of
the drivers with one of the above APIs.
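
As a point of reference, a converted driver's page-list fill ends up
looking roughly like the sketch below. This is illustrative only:
fill_hw_page_list() and the pbl destination array are made-up names for
the device specific parts, while the iterator and helpers are the ones
this series adds.

#include <rdma/ib_umem.h>

/* Sketch: walk the umem's SGL in pgsz-aligned DMA blocks and record
 * each block's DMA address in a hypothetical HW page list.
 */
static void fill_hw_page_list(u64 *pbl, struct ib_umem *umem,
			      unsigned long pgsz)
{
	struct ib_block_iter biter;
	unsigned int i = 0;

	rdma_umem_for_each_dma_block(umem, &biter, pgsz)
		pbl[i++] = rdma_block_iter_dma_address(&biter);
}

The pbl is sized using ib_umem_num_dma_blocks(umem, pgsz), and the byte
offset of the MR's start within the first block comes from
ib_umem_dma_offset(umem, pgsz).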

Get rid of the really weird and duplicative ib_umem_page_count().

Fix two problems with ib_umem_find_best_pgsz(), and several problems
related to computing the wrong DMA list length if IOVA != umem->address.
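
To make the IOVA problem concrete: the list length depends on how the
IOVA falls against the page size, because the HW walks the list by IOVA,
not by the process VA. The fixed helper is essentially the following
computation (a sketch with explicit parameters):

#include <linux/kernel.h> /* ALIGN(), ALIGN_DOWN() */

/* Roughly what the fixed ib_umem_num_dma_blocks() computes: the
 * mapped span is aligned around the IOVA the HW translates.
 */
static inline size_t num_dma_blocks(u64 iova, u64 length,
				    unsigned long pgsz)
{
	return (ALIGN(iova + length, pgsz) - ALIGN_DOWN(iova, pgsz)) / pgsz;
}

For example, with pgsz = 0x10000 a 0x3000 byte MR starting at IOVA
0xf000 needs two blocks, while the same MR at IOVA 0x10000 needs only
one; deriving the count from umem->address gives the wrong answer
whenever the two addresses have different sub-page offsets.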

At this point many of the drivers have a clear path to call
ib_umem_find_best_pgsz() and replace hardcoded PAGE_SIZE or PAGE_SHIFT
values when constructing their DMA lists.
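
The end state for such a driver is roughly the following (a sketch;
hypothetical_reg_mr() is a made-up name and the page size bitmap is
device specific, the values here are illustrative):

#include <linux/sizes.h>
#include <rdma/ib_umem.h>

static int hypothetical_reg_mr(struct ib_umem *umem, u64 iova)
{
	unsigned long pgsz;
	size_t num_blocks;

	/* Largest supported page size that can map this umem at this
	 * IOVA; 0 means nothing in the bitmap fits and the MR must be
	 * rejected.
	 */
	pgsz = ib_umem_find_best_pgsz(umem, SZ_4K | SZ_64K | SZ_2M, iova);
	if (!pgsz)
		return -EINVAL;

	num_blocks = ib_umem_num_dma_blocks(umem, pgsz);
	/* ... allocate a num_blocks entry list and fill it as above ... */
	return 0;
}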

This is the first series in an effort to modernize the umem usage in all
the DMA drivers.

v1: https://lore.kernel.org/r/0-v1-00f59ce24f1f+19f50-umem_1_jgg@nvidia.com
v2:
 - Fix ib_umem_find_best_pgsz() to use the IOVA, not umem->address
 - Fix ib_umem_num_dma_blocks() to use the IOVA, not umem->address
 - Two new patches to remove wrong open coded versions of
   ib_umem_num_dma_blocks() from EFA and i40iw
 - Redo the mlx4 ib_umem_num_dma_blocks() to do less and be safer
   until the whole thing can be moved to ib_umem_find_best_pgsz()
 - Two new patches to delete calls to ib_umem_offset() in qedr and
   ocrdma

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (17):
  RDMA/umem: Fix ib_umem_find_best_pgsz() for mappings that cross a page
    boundary
  RDMA/umem: Prevent small pages from being returned by
    ib_umem_find_best_pgsz()
  RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()
  RDMA/umem: Add rdma_umem_for_each_dma_block()
  RDMA/umem: Replace for_each_sg_dma_page with
    rdma_umem_for_each_dma_block
  RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()
  RDMA/efa: Use ib_umem_num_dma_blocks()
  RDMA/i40iw: Use ib_umem_num_dma_blocks()
  RDMA/qedr: Use rdma_umem_for_each_dma_block() instead of open-coding
  RDMA/qedr: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/bnxt: Do not use ib_umem_page_count() or ib_umem_num_pages()
  RDMA/hns: Use ib_umem_num_dma_blocks() instead of open coding
  RDMA/ocrdma: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/pvrdma: Use ib_umem_num_dma_blocks() instead of
    ib_umem_page_count()
  RDMA/mlx4: Use ib_umem_num_dma_blocks()
  RDMA/qedr: Remove fbo and zbva from the MR
  RDMA/ocrdma: Remove fbo from MR

 .clang-format                                 |  1 +
 drivers/infiniband/core/umem.c                | 45 +++++++-----
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 72 +++++++------------
 drivers/infiniband/hw/cxgb4/mem.c             |  8 +--
 drivers/infiniband/hw/efa/efa_verbs.c         |  9 ++-
 drivers/infiniband/hw/hns/hns_roce_alloc.c    |  3 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       | 49 +++++--------
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     | 13 +---
 drivers/infiniband/hw/mlx4/cq.c               |  1 -
 drivers/infiniband/hw/mlx4/mr.c               |  5 +-
 drivers/infiniband/hw/mlx4/qp.c               |  2 -
 drivers/infiniband/hw/mlx4/srq.c              |  5 +-
 drivers/infiniband/hw/mlx5/mem.c              |  4 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  8 +--
 drivers/infiniband/hw/ocrdma/ocrdma.h         |  1 -
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c      |  5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   | 25 +++----
 drivers/infiniband/hw/qedr/verbs.c            | 52 +++++---------
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_misc.c    |  9 ++-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  6 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_rdma.c    | 12 +---
 include/linux/qed/qed_rdma_if.h               |  2 -
 include/rdma/ib_umem.h                        | 37 ++++++++--
 include/rdma/ib_verbs.h                       | 24 -------
 27 files changed, 170 insertions(+), 234 deletions(-)

Comments

Jason Gunthorpe Sept. 9, 2020, 6:38 p.m. UTC | #1
On Fri, Sep 04, 2020 at 07:41:41PM -0300, Jason Gunthorpe wrote:
> Most RDMA drivers rely on a linear table of DMA addresses organized in
> some device specific page size.
>
> [...]

Applied to for-next with Leon's note. Thanks, everyone.

Jason