Message ID | 20240904153038.23054-1-michaelgur@nvidia.com (mailing list archive) |
---|---|
Series | Introduce mlx5 Memory Scheme ODP |
On 2024/9/4 23:30, Michael Guralnik wrote:
> This series introduces a new ODP scheme in mlx5 where the FW takes the
> responsibility of parsing and providing page fault data to the driver
> to handle the fault.
> As opposed to the current ODP transport scheme where the driver is
> responsible for reading and parsing work queues and querying mkeys to
> acquire needed info to handle the page fault.
>
> The new scheme allows driver to support ODP over Devx QPs where driver
> is not able to access the QP buffers, owned by the user application,
> to read the work queue requests.
> Furthermore, the new scheme allows support for ODP with new indirect
> MKEY types as the driver doesn't need to query or parse indirect mkeys
> in this scheme.
>
> The driver will enable the new scheme on devices that have the relevant
> capabilities. Otherwise, transport scheme ODP will be the default.
>
> The move to memory scheme ODP is transparent to existing ODP
> applications and no change is needed.
> New applications that want to take advantage of the new functionality
> should query which scheme is active and its capabilities using Devx.

On-Demand Paging (ODP) is a technique to alleviate many of the
shortcomings of memory registration. Applications no longer need to pin
down the underlying physical pages of the address space and track the
validity of the mappings. Rather, the HCA requests the latest
translations from the OS when pages are not present, and the OS
invalidates translations which are no longer valid due to either
non-present pages or mapping changes.

As such, it seems that ODP can save memory, since applications do not
pin down the underlying physical pages of the address space or track the
validity of the mappings.

What is the performance difference with and without ODP enabled? Are
there any test results on memory usage?

Can ODP be used on mlx5 IB devices, or only on mlx5 RoCEv2 devices?

Thanks,
Zhu Yanjun

>
> Michael Guralnik (8):
>   net/mlx5: Expand mkey page size to support 6 bits
>   net/mlx5: Expose HW bits for Memory scheme ODP
>   RDMA/mlx5: Add new ODP memory scheme eqe format
>   RDMA/mlx5: Enforce umem boundaries for explicit ODP page faults
>   RDMA/mlx5: Split ODP mkey search logic
>   RDMA/mlx5: Add handling for memory scheme page fault events
>   RDMA/mlx5: Add implicit MR handling to ODP memory scheme
>   net/mlx5: Handle memory scheme ODP capabilities
>
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |  17 +-
>  drivers/infiniband/hw/mlx5/mr.c               |  10 +-
>  drivers/infiniband/hw/mlx5/odp.c              | 400 ++++++++++++++----
>  .../net/ethernet/mellanox/mlx5/core/main.c    |  54 ++-
>  include/linux/mlx5/device.h                   |  30 +-
>  include/linux/mlx5/mlx5_ifc.h                 |  64 ++-
>  6 files changed, 449 insertions(+), 126 deletions(-)
>
On 06/09/2024 08:35, Zhu Yanjun wrote:
>
> As such, it seems that ODP can save memory, since applications do not
> pin down the underlying physical pages of the address space or track the
> validity of the mappings.
>
> What is the performance difference with and without ODP enabled? Are
> there any test results on memory usage?
>
> Can ODP be used on mlx5 IB devices, or only on mlx5 RoCEv2 devices?
>

The performance while using ODP is highly dependent on many factors that
dictate how many page faults the kernel will have to deal with.
Each page fault will introduce a latency hit.

Both the examples in rdma-core (e.g. ibv_rc_pingpong) and perftest
(e.g. ib_write_bw) support running with ODP to test this.

ODP can be used in both IB and RoCE.

Michael

> Thanks,
> Zhu Yanjun
>
On 2024/9/8 14:18, Michael Guralnik wrote:
>
> On 06/09/2024 08:35, Zhu Yanjun wrote:
>>
>> As such, it seems that ODP can save memory, since applications do not
>> pin down the underlying physical pages of the address space or track the
>> validity of the mappings.
>>
>> What is the performance difference with and without ODP enabled? Are
>> there any test results on memory usage?
>>
>> Can ODP be used on mlx5 IB devices, or only on mlx5 RoCEv2 devices?
>>
> The performance while using ODP is highly dependent on many factors that
> dictate how many page faults the kernel will have to deal with.
> Each page fault will introduce a latency hit.
>
> Both the examples in rdma-core (e.g. ibv_rc_pingpong) and perftest
> (e.g. ib_write_bw) support running with ODP to test this.

Thanks a lot. I have developed ODP for other RDMA devices. From my
tests, it seems that less system memory is needed with ODP than without
it.

From your description, it seems that RDMA latency will increase with
ODP, if I understand you correctly.

If other factors (for example, bandwidth) remain unchanged, the tradeoff
should be between memory and latency.

Best Regards,
Zhu Yanjun

>
> ODP can be used in both IB and RoCE.
>
>
> Michael
>
>
>> Thanks,
>> Zhu Yanjun
>>
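
The memory versus latency tradeoff discussed in this thread can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not taken from the patch series or from any measurement: the function names, the 4 KiB page size, and every number in the example are hypothetical, and the model simply assumes that each page fault adds a fixed cost on top of a fixed base latency.

```python
def mean_latency_us(base_us, fault_cost_us, faults, operations):
    """Average per-operation latency when `faults` out of `operations`
    RDMA operations each trigger one page fault (hypothetical model)."""
    if operations <= 0:
        raise ValueError("operations must be positive")
    if not 0 <= faults <= operations:
        raise ValueError("faults must be between 0 and operations")
    return base_us + fault_cost_us * faults / operations


def resident_bytes(touched_bytes, page_size=4096):
    """Memory that must be resident when only the touched part of a
    registered region is backed by pages, rounded up to whole pages.
    The 4 KiB default page size is an assumption."""
    if touched_bytes < 0 or page_size <= 0:
        raise ValueError("touched_bytes must be >= 0 and page_size > 0")
    return -(-touched_bytes // page_size) * page_size


if __name__ == "__main__":
    registered = 1 << 30   # hypothetical 1 GiB region, fully pinned without ODP
    touched = 64 << 20     # hypothetical 64 MiB actually accessed with ODP
    print("pinned without ODP:", registered, "bytes")
    print("resident with ODP: ", resident_bytes(touched), "bytes")
    # Hypothetical costs: 2 us base latency, 100 us per page fault.
    print("mean latency, no faults:", mean_latency_us(2.0, 100.0, 0, 1000), "us")
    print("mean latency, 1% faults:", mean_latency_us(2.0, 100.0, 10, 1000), "us")
```

Real fault costs and fault rates depend on the workload and the device, so actual numbers should come from running tools such as ibv_rc_pingpong or ib_write_bw with ODP enabled, as suggested above.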