[v7,00/17] Provide a new two step DMA mapping API

Message ID cover.1738765879.git.leonro@nvidia.com (mailing list archive)

Message

Leon Romanovsky Feb. 5, 2025, 2:40 p.m. UTC
From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v7:
 * Rebased to v6.14-rc1
v6: https://lore.kernel.org/all/cover.1737106761.git.leon@kernel.org
 * Changed internal __size variable to u64 to properly set private flag
   in most significant bit.
 * Added comment about why we check DMA_IOVA_USE_SWIOTLB
 * Break unlink loop if phys is NULL, a condition which we shouldn't hit.
v5: https://lore.kernel.org/all/cover.1734436840.git.leon@kernel.org
 * Trimmed long lines in all patches.
 * Squashed "dma-mapping: Add check if IOVA can be used" into
   "dma: Provide an interface to allow allocate IOVA" patch.
 * Added tags from Christoph and Will.
 * Fixed spelling/grammar errors.
 * Changed title from "dma: Provide an ..." to "dma-mapping: Provide
   an ...".
 * Slightly changed hmm patch to set sticky flags in one place.
v4: https://lore.kernel.org/all/cover.1733398913.git.leon@kernel.org
 * Added extra patch to add kernel-doc for iommu_unmap and
   iommu_unmap_fast
 * Rebased to v6.13-rc1
 * Added Will's tags
v3: https://lore.kernel.org/all/cover.1731244445.git.leon@kernel.org
 * Added DMA_ATTR_SKIP_CPU_SYNC to p2p pages in HMM.
 * Fixed error unwind if dma_iova_sync fails in HMM.
 * Clear all PFN flags which were set in map to make the code
   cleaner; the callers cleaned them anyway.
 * Generalize sticky PFN flags logic in HMM.
 * Removed not-needed #ifdef-#endif section.
v2: https://lore.kernel.org/all/cover.1730892663.git.leon@kernel.org
 * Fixed docs file as Randy suggested
 * Fixed releases of memory in HMM path. It was allocated with kv..
   variants but released with kfree instead of kvfree.
 * Slightly changed commit message in VFIO patch.
v1: https://lore.kernel.org/all/cover.1730298502.git.leon@kernel.org
 * Squashed two VFIO patches into one
 * Added Acked-by/Reviewed-by tags
 * Fix docs spelling errors
 * Simplified dma_iova_sync() API
 * Added extra check of the mapped size in dma_iova_destroy() to make
   the code clearer
 * Fixed checkpatch warnings in p2p patch
 * Changed implementation of VFIO mlx5 mlx5vf_add_migration_pages() to
   be more general
 * Reduced the number of changes in VFIO patch
v0: https://lore.kernel.org/all/cover.1730037276.git.leon@kernel.org
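The v6 note about keeping a private flag in the most significant bit of a
u64 __size can be illustrated with a small userspace sketch. Everything
prefixed toy_/TOY_ is hypothetical and written only for this illustration;
the only details taken from the changelog are the __size name, its u64 type
and the idea of a flag in the top bit. One plausible reading is that a fixed
64-bit field keeps bit 63 available regardless of the platform's word size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the series' state object. */
struct toy_iova_state {
	uint64_t addr;
	uint64_t __size;	/* bits 0..62: length, bit 63: private flag */
};

/* Assumed flag position: the most significant bit of the 64-bit field. */
#define TOY_IOVA_USE_SWIOTLB	(UINT64_C(1) << 63)

/* Return the length with the private flag masked out. */
static inline uint64_t toy_iova_size(const struct toy_iova_state *state)
{
	return state->__size & ~TOY_IOVA_USE_SWIOTLB;
}

/* Report whether the private flag is set. */
static inline bool toy_iova_uses_swiotlb(const struct toy_iova_state *state)
{
	return (state->__size & TOY_IOVA_USE_SWIOTLB) != 0;
}

/* Set the private flag without disturbing the stored length. */
static inline void toy_iova_mark_swiotlb(struct toy_iova_state *state)
{
	state->__size |= TOY_IOVA_USE_SWIOTLB;
}

/* Store a new length while preserving the private flag. */
static inline void toy_iova_set_size(struct toy_iova_state *state,
				     uint64_t size)
{
	/* A length that collides with the flag bit cannot be represented. */
	assert(!(size & TOY_IOVA_USE_SWIOTLB));
	state->__size = size | (state->__size & TOY_IOVA_USE_SWIOTLB);
}
```

Readers of the length always go through the masking helper, so the flag
stays invisible to code that only cares about the size.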

----------------------------------------------------------------------------
 No changes in checks, documentation and naming as no suggestion was
 given. Everything like this can be improved in followup patches.
----------------------------------------------------------------------------
 LWN coverage:
Dancing the DMA two-step - https://lwn.net/Articles/997563/
----------------------------------------------------------------------------

Currently the only efficient way to map a complex memory description through
the DMA API is by using the scatterlist APIs. The SG APIs are unique in that
they efficiently combine the two fundamental operations of sizing and allocating
a large IOVA window from the IOMMU and processing all the per-address
swiotlb/flushing/p2p/map details.

This uniqueness has been a long standing pain point as the scatterlist API
is mandatory, but expensive to use. It prevents any kind of optimization or
feature improvement (such as avoiding struct page for P2P) due to the
impossibility of improving the scatterlist.

Several approaches have been explored to expand the DMA API with additional
scatterlist-like structures (BIO, rlist). Instead, this series splits up the
DMA API to allow callers to bring their own data structure.

The API is split up into two parts:
 - Allocate IOVA space:
    To do any pre-allocation required. This is done based on the caller
    supplying some details about how much IOMMU address space it would need
    in the worst case.
 - Map and unmap relevant structures to pre-allocated IOVA space:
    Perform the actual mapping into the pre-allocated IOVA. This is very
    similar to dma_map_page().
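
The two parts can be modelled in userspace as a rough sketch. This is not
the kernel API from the series: every toy_* name, the fixed 4 KiB page size
and the flat translation table are assumptions made purely for illustration.
It shows only the shape of the flow, where address space is reserved once
and pages are then linked at offsets inside that reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_MAX_PAGES	64u

/* Toy "IOMMU": one reserved IOVA window plus a flat translation table. */
struct toy_iova_window {
	uint64_t base;			/* start of the reserved range */
	size_t size;			/* bytes reserved in step one */
	uint64_t phys[TOY_MAX_PAGES];	/* per-page target, 0 = not linked */
};

/* Step one: reserve address space for the worst case, once. */
static int toy_iova_alloc(struct toy_iova_window *w, uint64_t base,
			  size_t size)
{
	if (size == 0 || size % TOY_PAGE_SIZE ||
	    size / TOY_PAGE_SIZE > TOY_MAX_PAGES)
		return -1;
	w->base = base;
	w->size = size;
	for (size_t i = 0; i < TOY_MAX_PAGES; i++)
		w->phys[i] = 0;
	return 0;
}

/* Step two: link one physical page at a byte offset inside the window. */
static int toy_iova_link(struct toy_iova_window *w, uint64_t phys,
			 size_t offset, uint64_t *dma_addr)
{
	if (phys == 0 || phys % TOY_PAGE_SIZE ||
	    offset % TOY_PAGE_SIZE || offset >= w->size)
		return -1;
	w->phys[offset / TOY_PAGE_SIZE] = phys;
	/* The device address is implied by base + offset. */
	*dma_addr = w->base + offset;
	return 0;
}

/* Undo a single link; the reservation itself stays in place. */
static void toy_iova_unlink(struct toy_iova_window *w, size_t offset)
{
	assert(offset % TOY_PAGE_SIZE == 0 && offset < w->size);
	w->phys[offset / TOY_PAGE_SIZE] = 0;
}

/* Translate a device address to a physical one; 0 means "not linked". */
static uint64_t toy_iova_translate(const struct toy_iova_window *w,
				   uint64_t dma_addr)
{
	uint64_t off;

	if (dma_addr < w->base || dma_addr - w->base >= w->size)
		return 0;
	off = dma_addr - w->base;
	if (w->phys[off / TOY_PAGE_SIZE] == 0)
		return 0;
	return w->phys[off / TOY_PAGE_SIZE] + off % TOY_PAGE_SIZE;
}
```

Because each device address in this model is just base plus offset, a caller
that remembers the offset does not need to store a separate address per page.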

In this and the next series [1], examples of three different users are converted
to the new API to show the benefits and its versatility. Each user has a unique
flow:
 1. RDMA ODP is an example of "SVA mirroring" using HMM that needs to
    dynamically map/unmap large numbers of single pages. This becomes
    significantly faster in the IOMMU case, as the IOVA allocation is
    pre-computed once and map/unmap is now just a page table walk.
    Significant amounts of memory are saved as there is no longer a need
    to store the dma_addr_t of each page.
 2. VFIO PCI live migration code is building a very large "page list"
    for the device. Instead of allocating a scatter list entry per allocated
    page it can just allocate an array of 'struct page *', saving a large
    amount of memory.
 3. NVMe PCI demonstrates how a BIO can be converted to a HW scatter
    list without having to allocate then populate an intermediate SG table.
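
The memory argument in item 2 can be made concrete with some back-of-the-
envelope arithmetic. The sketch below assumes a 64-bit build, 4 KiB pages
and a 32-byte per-page scatterlist entry; the toy_sg_entry layout is a
stand-in chosen for illustration (the real structure's size depends on
kernel configuration), and the chaining overhead of large SG tables is
ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-byte stand-in for one scatterlist entry on a 64-bit build. */
struct toy_sg_entry {
	uint64_t page_link;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
	uint32_t dma_flags;
};

/* Stand-in for one 'struct page *' slot on a 64-bit build. */
typedef uint64_t toy_page_ptr;

/* Number of pages needed to cover 'bytes', rounding up. */
static uint64_t toy_pages_for(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}

/* Bookkeeping cost with one scatterlist-like entry per page. */
static uint64_t toy_sg_table_bytes(uint64_t npages)
{
	return npages * sizeof(struct toy_sg_entry);
}

/* Bookkeeping cost with a plain array of page pointers. */
static uint64_t toy_page_array_bytes(uint64_t npages)
{
	return npages * sizeof(toy_page_ptr);
}
```

Under these assumptions, describing 1 GiB of 4 KiB pages takes 262144
entries: 8 MiB as scatterlist-like entries versus 2 MiB as page pointers.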

To make the use of the new API easier, the HMM and block subsystems are extended
to hide the optimization details from the caller. Among these optimizations:
 * Memory reduction, as in most real use cases there is no need to store mapped
   DMA addresses and unmap them.
 * Reduced function call overhead, by replacing calls through function
   pointers with direct calls.

This is the first step along a path to provide alternatives to scatterlist and
solve some of the abuses and design mistakes.

Thanks

Christoph Hellwig (6):
  PCI/P2PDMA: Refactor the p2pdma mapping helpers
  dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
  iommu: generalize the batched sync after map interface
  iommu/dma: Factor out a iommu_dma_map_swiotlb helper
  dma-mapping: add a dma_need_unmap helper
  docs: core-api: document the IOVA-based API

Leon Romanovsky (11):
  iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
  dma-mapping: Provide an interface to allow allocate IOVA
  dma-mapping: Implement link/unlink ranges API
  mm/hmm: let users to tag specific PFN with DMA mapped bit
  mm/hmm: provide generic DMA managing logic
  RDMA/umem: Store ODP access mask information in PFN
  RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
    linkage
  RDMA/umem: Separate implicit ODP initialization from explicit ODP
  vfio/mlx5: Explicitly use number of pages instead of allocated length
  vfio/mlx5: Rewrite create mkey flow to allow better code reuse
  vfio/mlx5: Enable the DMA link API

 Documentation/core-api/dma-api.rst   |  70 ++++
 drivers/infiniband/core/umem_odp.c   | 250 +++++---------
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
 drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
 drivers/infiniband/hw/mlx5/umr.c     |  12 +-
 drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
 drivers/iommu/iommu.c                |  84 ++---
 drivers/pci/p2pdma.c                 |  38 +--
 drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
 drivers/vfio/pci/mlx5/cmd.h          |  35 +-
 drivers/vfio/pci/mlx5/main.c         |  87 +++--
 include/linux/dma-map-ops.h          |  54 ----
 include/linux/dma-mapping.h          |  85 +++++
 include/linux/hmm-dma.h              |  33 ++
 include/linux/hmm.h                  |  21 ++
 include/linux/iommu.h                |   4 +
 include/linux/pci-p2pdma.h           |  84 +++++
 include/rdma/ib_umem_odp.h           |  25 +-
 kernel/dma/direct.c                  |  44 +--
 kernel/dma/mapping.c                 |  18 ++
 mm/hmm.c                             | 264 +++++++++++++--
 21 files changed, 1435 insertions(+), 693 deletions(-)
 create mode 100644 include/linux/hmm-dma.h

Comments

Leon Romanovsky Feb. 20, 2025, 12:48 p.m. UTC | #1
On Wed, Feb 05, 2025 at 04:40:20PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Changelog:
> v7:
>  * Rebased to v6.14-rc1

<...>

> Christoph Hellwig (6):
>   PCI/P2PDMA: Refactor the p2pdma mapping helpers
>   dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
>   iommu: generalize the batched sync after map interface
>   iommu/dma: Factor out a iommu_dma_map_swiotlb helper
>   dma-mapping: add a dma_need_unmap helper
>   docs: core-api: document the IOVA-based API
> 
> Leon Romanovsky (11):
>   iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
>   dma-mapping: Provide an interface to allow allocate IOVA
>   dma-mapping: Implement link/unlink ranges API
>   mm/hmm: let users to tag specific PFN with DMA mapped bit
>   mm/hmm: provide generic DMA managing logic
>   RDMA/umem: Store ODP access mask information in PFN
>   RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
>     linkage
>   RDMA/umem: Separate implicit ODP initialization from explicit ODP
>   vfio/mlx5: Explicitly use number of pages instead of allocated length
>   vfio/mlx5: Rewrite create mkey flow to allow better code reuse
>   vfio/mlx5: Enable the DMA link API
> 
>  Documentation/core-api/dma-api.rst   |  70 ++++
>  drivers/infiniband/core/umem_odp.c   | 250 +++++---------
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
>  drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
>  drivers/infiniband/hw/mlx5/umr.c     |  12 +-
>  drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
>  drivers/iommu/iommu.c                |  84 ++---
>  drivers/pci/p2pdma.c                 |  38 +--
>  drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
>  drivers/vfio/pci/mlx5/cmd.h          |  35 +-
>  drivers/vfio/pci/mlx5/main.c         |  87 +++--
>  include/linux/dma-map-ops.h          |  54 ----
>  include/linux/dma-mapping.h          |  85 +++++
>  include/linux/hmm-dma.h              |  33 ++
>  include/linux/hmm.h                  |  21 ++
>  include/linux/iommu.h                |   4 +
>  include/linux/pci-p2pdma.h           |  84 +++++
>  include/rdma/ib_umem_odp.h           |  25 +-
>  kernel/dma/direct.c                  |  44 +--
>  kernel/dma/mapping.c                 |  18 ++
>  mm/hmm.c                             | 264 +++++++++++++--
>  21 files changed, 1435 insertions(+), 693 deletions(-)
>  create mode 100644 include/linux/hmm-dma.h

Kind reminder.

Thanks

> 
> -- 
> 2.48.1
> 
>
Robin Murphy Feb. 28, 2025, 7:54 p.m. UTC | #2
On 20/02/2025 12:48 pm, Leon Romanovsky wrote:
> On Wed, Feb 05, 2025 at 04:40:20PM +0200, Leon Romanovsky wrote:
>> From: Leon Romanovsky <leonro@nvidia.com>
>>
>> Changelog:
>> v7:
>>   * Rebased to v6.14-rc1
> 
> <...>
> 
>> Christoph Hellwig (6):
>>    PCI/P2PDMA: Refactor the p2pdma mapping helpers
>>    dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
>>    iommu: generalize the batched sync after map interface
>>    iommu/dma: Factor out a iommu_dma_map_swiotlb helper
>>    dma-mapping: add a dma_need_unmap helper
>>    docs: core-api: document the IOVA-based API
>>
>> Leon Romanovsky (11):
>>    iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
>>    dma-mapping: Provide an interface to allow allocate IOVA
>>    dma-mapping: Implement link/unlink ranges API
>>    mm/hmm: let users to tag specific PFN with DMA mapped bit
>>    mm/hmm: provide generic DMA managing logic
>>    RDMA/umem: Store ODP access mask information in PFN
>>    RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
>>      linkage
>>    RDMA/umem: Separate implicit ODP initialization from explicit ODP
>>    vfio/mlx5: Explicitly use number of pages instead of allocated length
>>    vfio/mlx5: Rewrite create mkey flow to allow better code reuse
>>    vfio/mlx5: Enable the DMA link API
>>
>>   Documentation/core-api/dma-api.rst   |  70 ++++
>>   drivers/infiniband/core/umem_odp.c   | 250 +++++---------
>>   drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
>>   drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
>>   drivers/infiniband/hw/mlx5/umr.c     |  12 +-
>>   drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
>>   drivers/iommu/iommu.c                |  84 ++---
>>   drivers/pci/p2pdma.c                 |  38 +--
>>   drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
>>   drivers/vfio/pci/mlx5/cmd.h          |  35 +-
>>   drivers/vfio/pci/mlx5/main.c         |  87 +++--
>>   include/linux/dma-map-ops.h          |  54 ----
>>   include/linux/dma-mapping.h          |  85 +++++
>>   include/linux/hmm-dma.h              |  33 ++
>>   include/linux/hmm.h                  |  21 ++
>>   include/linux/iommu.h                |   4 +
>>   include/linux/pci-p2pdma.h           |  84 +++++
>>   include/rdma/ib_umem_odp.h           |  25 +-
>>   kernel/dma/direct.c                  |  44 +--
>>   kernel/dma/mapping.c                 |  18 ++
>>   mm/hmm.c                             | 264 +++++++++++++--
>>   21 files changed, 1435 insertions(+), 693 deletions(-)
>>   create mode 100644 include/linux/hmm-dma.h
> 
> Kind reminder.

...that you've simply reposted the same thing again? Without doing 
anything to address the bugs, inconsistencies, fundamental design flaws 
in claiming to be something it cannot possibly be, the egregious abuse 
of DMA_ATTR_SKIP_CPU_SYNC proudly highlighting how unfit-for-purpose the 
most basic part of the whole idea is, nor *still* the complete lack of 
any demonstrable justification of how callers who supposedly can't use 
the IOMMU API actually benefit from adding all the complexity of using 
the IOMMU API in a hat but also still the streaming DMA API as well?

Yeah, consider me reminded.



In case I need to make it any more explicit, NAK to this not-generic 
not-DMA-mapping API, until you can come up with either something which 
*can* actually work in any kind of vaguely generic manner as claimed, or 
instead settle on a reasonable special-case solution for justifiable 
special cases. Bikeshedding and rebasing through half a dozen versions, 
while ignoring fundamental issues I've been pointing out from the very 
beginning, has not somehow magically made this series mature and 
acceptable to merge.

Honestly, given certain other scenarios we may also end up having to 
deal with, if by the time everything broken is taken away, it were to 
end up stripped all the way back to something well-reasoned like:

"Some drivers want more control of their DMA buffer layout than the 
general-purpose IOVA allocator is able to provide through the DMA mapping 
APIs, but also would rather not have to deal with managing an entire 
IOMMU domain and address space, making MSIs work, etc. Expose 
iommu_dma_alloc_iova() and some trivial IOMMU API wrappers to allow 
drivers of coherent devices to claim regions of the default domain 
wherein they can manage their own mappings directly."

...I wouldn't necessarily disagree.

Thanks,
Robin.
Leon Romanovsky March 2, 2025, 8:57 a.m. UTC | #3
On Fri, Feb 28, 2025 at 07:54:11PM +0000, Robin Murphy wrote:
> On 20/02/2025 12:48 pm, Leon Romanovsky wrote:
> > On Wed, Feb 05, 2025 at 04:40:20PM +0200, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > Changelog:
> > > v7:
> > >   * Rebased to v6.14-rc1
> > 
> > <...>
> > 
> > > Christoph Hellwig (6):
> > >    PCI/P2PDMA: Refactor the p2pdma mapping helpers
> > >    dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
> > >    iommu: generalize the batched sync after map interface
> > >    iommu/dma: Factor out a iommu_dma_map_swiotlb helper
> > >    dma-mapping: add a dma_need_unmap helper
> > >    docs: core-api: document the IOVA-based API
> > > 
> > > Leon Romanovsky (11):
> > >    iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
> > >    dma-mapping: Provide an interface to allow allocate IOVA
> > >    dma-mapping: Implement link/unlink ranges API
> > >    mm/hmm: let users to tag specific PFN with DMA mapped bit
> > >    mm/hmm: provide generic DMA managing logic
> > >    RDMA/umem: Store ODP access mask information in PFN
> > >    RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
> > >      linkage
> > >    RDMA/umem: Separate implicit ODP initialization from explicit ODP
> > >    vfio/mlx5: Explicitly use number of pages instead of allocated length
> > >    vfio/mlx5: Rewrite create mkey flow to allow better code reuse
> > >    vfio/mlx5: Enable the DMA link API
> > > 
> > >   Documentation/core-api/dma-api.rst   |  70 ++++
> > >   drivers/infiniband/core/umem_odp.c   | 250 +++++---------
> > >   drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
> > >   drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
> > >   drivers/infiniband/hw/mlx5/umr.c     |  12 +-
> > >   drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
> > >   drivers/iommu/iommu.c                |  84 ++---
> > >   drivers/pci/p2pdma.c                 |  38 +--
> > >   drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
> > >   drivers/vfio/pci/mlx5/cmd.h          |  35 +-
> > >   drivers/vfio/pci/mlx5/main.c         |  87 +++--
> > >   include/linux/dma-map-ops.h          |  54 ----
> > >   include/linux/dma-mapping.h          |  85 +++++
> > >   include/linux/hmm-dma.h              |  33 ++
> > >   include/linux/hmm.h                  |  21 ++
> > >   include/linux/iommu.h                |   4 +
> > >   include/linux/pci-p2pdma.h           |  84 +++++
> > >   include/rdma/ib_umem_odp.h           |  25 +-
> > >   kernel/dma/direct.c                  |  44 +--
> > >   kernel/dma/mapping.c                 |  18 ++
> > >   mm/hmm.c                             | 264 +++++++++++++--
> > >   21 files changed, 1435 insertions(+), 693 deletions(-)
> > >   create mode 100644 include/linux/hmm-dma.h
> > 
> > Kind reminder.
> 
> ...that you've simply reposted the same thing again? Without doing anything
> to address the bugs, inconsistencies, fundamental design flaws in claiming
> to be something it cannot possibly be, the egregious abuse of
> DMA_ATTR_SKIP_CPU_SYNC proudly highlighting how unfit-for-purpose the most
> basic part of the whole idea is, nor *still* the complete lack of any
> demonstrable justification of how callers who supposedly can't use the IOMMU
> API actually benefit from adding all the complexity of using the IOMMU API
> in a hat but also still the streaming DMA API as well?

Can you please provide a concrete list of "the bugs, inconsistencies, fundamental
design flaws", so we can address/fix them?

We are at v7 now, and out of all the postings you replied only to v1 and v5, with
follow-ups from three of us (Christoph, Jason and me).

> 
> Yeah, consider me reminded.

Silence means agreement.

> 
> In case I need to make it any more explicit, NAK to this not-generic
> not-DMA-mapping API, until you can come up with either something which *can*
> actually work in any kind of vaguely generic manner as claimed, or instead
> settle on a reasonable special-case solution for justifiable special cases.
> Bikeshedding and rebasing through half a dozen versions, while ignoring
> fundamental issues I've been pointing out from the very beginning, has not
> somehow magically made this series mature and acceptable to merge.

You never responded to Christoph's answers, so please try your best to
be professional: write down the list of things you want to see handled
in the next version and it will be done. It is impossible to guess what you
want if you are not saying it clearly.

The main issue which we are trying to solve, "abuse of SG lists for
things without struct page", is not going to disappear by itself.

> 
> Honestly, given certain other scenarios we may also end up having to deal
> with, if by the time everything broken is taken away, it were to end up
> stripped all the way back to something well-reasoned like:
> 
> "Some drivers want more control of their DMA buffer layout than the
> general-purpose IOVA allocator is able to provide through the DMA mapping
> APIs, but also would rather not have to deal with managing an entire IOMMU
> domain and address space, making MSIs work, etc. Expose
> iommu_dma_alloc_iova() and some trivial IOMMU API wrappers to allow drivers
> of coherent devices to claim regions of the default domain wherein they can
> manage their own mappings directly."
> 
> ...I wouldn't necessarily disagree.

Something like that was done in the first RFC version, but the overall
feeling was that it is a layer violation with an unclear path to supporting
swiotlb for NVMe.

Thanks

> 
> Thanks,
> Robin.
Marek Szyprowski March 12, 2025, 9:28 a.m. UTC | #4
Hi Robin

On 28.02.2025 20:54, Robin Murphy wrote:
> On 20/02/2025 12:48 pm, Leon Romanovsky wrote:
>> On Wed, Feb 05, 2025 at 04:40:20PM +0200, Leon Romanovsky wrote:
>>> From: Leon Romanovsky <leonro@nvidia.com>
>>>
>>> Changelog:
>>> v7:
>>>   * Rebased to v6.14-rc1
>>
>> <...>
>>
>>> Christoph Hellwig (6):
>>>    PCI/P2PDMA: Refactor the p2pdma mapping helpers
>>>    dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h
>>>    iommu: generalize the batched sync after map interface
>>>    iommu/dma: Factor out a iommu_dma_map_swiotlb helper
>>>    dma-mapping: add a dma_need_unmap helper
>>>    docs: core-api: document the IOVA-based API
>>>
>>> Leon Romanovsky (11):
>>>    iommu: add kernel-doc for iommu_unmap and iommu_unmap_fast
>>>    dma-mapping: Provide an interface to allow allocate IOVA
>>>    dma-mapping: Implement link/unlink ranges API
>>>    mm/hmm: let users to tag specific PFN with DMA mapped bit
>>>    mm/hmm: provide generic DMA managing logic
>>>    RDMA/umem: Store ODP access mask information in PFN
>>>    RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page
>>>      linkage
>>>    RDMA/umem: Separate implicit ODP initialization from explicit ODP
>>>    vfio/mlx5: Explicitly use number of pages instead of allocated length
>>>    vfio/mlx5: Rewrite create mkey flow to allow better code reuse
>>>    vfio/mlx5: Enable the DMA link API
>>>
>>>   Documentation/core-api/dma-api.rst   |  70 ++++
>>>   drivers/infiniband/core/umem_odp.c   | 250 +++++---------
>>>   drivers/infiniband/hw/mlx5/mlx5_ib.h |  12 +-
>>>   drivers/infiniband/hw/mlx5/odp.c     |  65 ++--
>>>   drivers/infiniband/hw/mlx5/umr.c     |  12 +-
>>>   drivers/iommu/dma-iommu.c            | 468 +++++++++++++++++++++++----
>>>   drivers/iommu/iommu.c                |  84 ++---
>>>   drivers/pci/p2pdma.c                 |  38 +--
>>>   drivers/vfio/pci/mlx5/cmd.c          | 375 +++++++++++----------
>>>   drivers/vfio/pci/mlx5/cmd.h          |  35 +-
>>>   drivers/vfio/pci/mlx5/main.c         |  87 +++--
>>>   include/linux/dma-map-ops.h          |  54 ----
>>>   include/linux/dma-mapping.h          |  85 +++++
>>>   include/linux/hmm-dma.h              |  33 ++
>>>   include/linux/hmm.h                  |  21 ++
>>>   include/linux/iommu.h                |   4 +
>>>   include/linux/pci-p2pdma.h           |  84 +++++
>>>   include/rdma/ib_umem_odp.h           |  25 +-
>>>   kernel/dma/direct.c                  |  44 +--
>>>   kernel/dma/mapping.c                 |  18 ++
>>>   mm/hmm.c                             | 264 +++++++++++++--
>>>   21 files changed, 1435 insertions(+), 693 deletions(-)
>>>   create mode 100644 include/linux/hmm-dma.h
>>
>> Kind reminder.
>
> ...that you've simply reposted the same thing again? Without doing 
> anything to address the bugs, inconsistencies, fundamental design 
> flaws in claiming to be something it cannot possibly be, the egregious 
> abuse of DMA_ATTR_SKIP_CPU_SYNC proudly highlighting how 
> unfit-for-purpose the most basic part of the whole idea is, nor 
> *still* the complete lack of any demonstrable justification of how 
> callers who supposedly can't use the IOMMU API actually benefit from 
> adding all the complexity of using the IOMMU API in a hat but also 
> still the streaming DMA API as well?
>
> Yeah, consider me reminded.
>
>
>
> In case I need to make it any more explicit, NAK to this not-generic 
> not-DMA-mapping API, until you can come up with either something which 
> *can* actually work in any kind of vaguely generic manner as claimed, 
> or instead settle on a reasonable special-case solution for 
> justifiable special cases. Bikeshedding and rebasing through half a 
> dozen versions, while ignoring fundamental issues I've been pointing 
> out from the very beginning, has not somehow magically made this 
> series mature and acceptable to merge.
>
> Honestly, given certain other scenarios we may also end up having to 
> deal with, if by the time everything broken is taken away, it were to 
> end up stripped all the way back to something well-reasoned like:
>
> "Some drivers want more control of their DMA buffer layout than the 
> general-purpose IOVA allocator is able to provide through the DMA 
> mapping APIs, but also would rather not have to deal with managing an 
> entire IOMMU domain and address space, making MSIs work, etc. Expose 
> iommu_dma_alloc_iova() and some trivial IOMMU API wrappers to allow 
> drivers of coherent devices to claim regions of the default domain 
> wherein they can manage their own mappings directly."
>
> ...I wouldn't necessarily disagree.


Well, this is definitely not the review I expected. I admit that I
wasn't involved in this proposal or the discussion about it, and I
wasn't able to devote enough time to keeping myself up to date. Now
I've tried to read all the required backlog and I must admit that this
was quite demanding.

If you didn't like this design from the beginning, then please state
that early instead of pointing out random minor issues in the code. There
has been plenty of time to discuss the overall approach if you think it
was wrong. What to do now?

Removing the need for scatterlists was advertised as the main goal of
this new API, but it looks like similar effects can be achieved by
just iterating over the pages and calling the page-based DMA API directly.
Maybe I missed something. I still see some advantages in this DMA API
extension, but I would also like to see clear benefits from
introducing it, such as perf logs or another benchmark summary.


Best regards