Message ID | 20241009015903.801987-1-matsuda-daisuke@fujitsu.com (mailing list archive) |
---|---|
Series | On-Demand Paging on SoftRoCE |
On Wed, Oct 09, 2024 at 10:58:57AM +0900, Daisuke Matsuda wrote:
> This patch series implements the On-Demand Paging feature on SoftRoCE(rxe)
> driver, which has been available only in mlx5 driver[1] so far.
>
> This series has been blocked because of the hang issue of srp 002 test[2],
> which was believed to be caused after applying the commit 9b4b7c1f9f54
> ("RDMA/rxe: Add workqueue support for rxe tasks"). My patches are dependent
> on the commit because the ODP feature requires sleeping in kernel space,
> and it is impossible with the former tasklet implementation.
>
> According to the original reporter[3], the hang issue is already gone in
> v6.10. Additionally, tasklet is marked deprecated[4]. I think the rxe
> driver is ready to accept this series since there is no longer any reason
> to consider reverting back to the old tasklet.

Okay, and it seems we are just ignoring the rxe bugs these days, so
why not? Let's look at it

Jason

On 2024/10/9 3:58, Daisuke Matsuda wrote:
> This patch series implements the On-Demand Paging feature on SoftRoCE(rxe)
> driver, which has been available only in mlx5 driver[1] so far.
>
> This series has been blocked because of the hang issue of srp 002 test[2],
> which was believed to be caused after applying the commit 9b4b7c1f9f54
> ("RDMA/rxe: Add workqueue support for rxe tasks"). My patches are dependent
> on the commit because the ODP feature requires sleeping in kernel space,
> and it is impossible with the former tasklet implementation.
>
> According to the original reporter[3], the hang issue is already gone in
> v6.10. Additionally, tasklet is marked deprecated[4]. I think the rxe
> driver is ready to accept this series since there is no longer any reason
> to consider reverting back to the old tasklet.
>
> I omitted some content, such as the motivation behind this series, from
> the cover letter. Please see the cover letter of v3 for more details[5].
>
> [Overview]
> When applications register a memory region (MR), RDMA drivers normally pin
> pages in the MR so that physical addresses are never changed during RDMA
> communication. This requires the MR to fit in physical memory and
> inevitably leads to memory pressure. On the other hand, On-Demand Paging
> (ODP) allows applications to register MRs without pinning pages. Pages are
> paged in when the driver requires them and paged out when the OS reclaims
> them. As a result, it is possible to register a large MR that does not fit
> in physical memory without taking up so much physical memory.
>
> [How does ODP work?]
> "struct ib_umem_odp" is used to manage pages. It is created for each
> ODP-enabled MR on its registration. This struct holds a pair of arrays
> (dma_list/pfn_list) that serve as a driver page table. DMA addresses and
> PFNs are stored in the driver page table. They are updated on page-in and
> page-out, both of which use the common interfaces in the ib_uverbs layer.
>
> Page-in can occur when the requester, responder or completer accesses an
> MR in order to process RDMA operations. If they find that the pages being
> accessed are not present in physical memory or the requisite permissions
> are not set on the pages, they trigger a page fault to make the pages
> present with proper permissions and at the same time update the driver
> page table. After confirming the presence of the pages, they execute
> memory accesses such as read, write or atomic operations.
>
> Page-out is triggered by page reclaim or filesystem events (e.g. metadata
> update of a file that is being used as an MR). When creating an ODP-enabled
> MR, the driver registers an MMU notifier callback. When the kernel issues a
> page invalidation notification, the callback is invoked to unmap DMA
> addresses and update the driver page table. After that, the kernel releases
> the pages.
>
> [Supported operations]
> All traditional operations are supported on RC connection. The new Atomic
> write[6] and RDMA Flush[7] operations are not included in this patchset. I
> will post them later after this patchset is merged. On UD connection, Send,
> Recv, and SRQ-Recv are supported.
>
> [How to test ODP?]
> There are only a few resources available for testing. pyverbs testcases in
> rdma-core and perftest[8] are the recommended ones. Besides these, the
> ibv_rc_pingpong command can also be used for testing. Note that you may
> have to build perftest from upstream because old versions do not handle ODP
> capabilities correctly.

Thanks a lot. I have tested these patches with perftest. Because ODP (On
Demand Paging) is a feature, can you also add some testcases into
rdma-core? Then we can use rdma-core to test this feature of rxe.

That is, add some testcases in run_tests.py, so that run_tests.py can be
used to verify this (ODP) feature on rxe.

Thanks,
Zhu Yanjun

>
> The latest ODP tree is available from github:
> https://github.com/ddmatsu/linux/tree/odp_v8
>
> [Future work]
> My next work is to enable the new Atomic write[6] and RDMA Flush[7]
> operations with ODP. After that, I am going to implement the prefetch
> feature. It allows applications to trigger page fault using
> ibv_advise_mr(3) to optimize performance. Some existing software like
> librpma[9] uses this feature. Additionally, I think we can also add the
> implicit ODP feature in the future.
>
> [1] Understanding On Demand Paging (ODP)
> https://enterprise-support.nvidia.com/s/article/understanding-on-demand-paging--odp-x
>
> [2] [bug report] blktests srp/002 hang
> https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>
> [3] blktests failures with v6.10-rc1 kernel
> https://lore.kernel.org/linux-block/wnucs5oboi4flje5yvtea7puvn6zzztcnlrfz3lpzlwgblrxgw@7wvqdzioejgl/
>
> [4] [00/15] ethernet: Convert from tasklet to BH workqueue
> https://patchwork.kernel.org/project/linux-rdma/cover/20240621050525.3720069-1-allen.lkml@gmail.com/
>
> [5] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
> https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@fujitsu.com/
>
> [6] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
> https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/
>
> [7] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
> https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/
>
> [8] linux-rdma/perftest: Infiniband Verbs Performance Tests
> https://github.com/linux-rdma/perftest
>
> [9] librpma: Remote Persistent Memory Access Library
> https://github.com/pmem/rpma
>
> v7->v8:
> 1) Dropped the first patch because the same change was made by Bob Pearson.
> cf. https://github.com/torvalds/linux/commit/23bc06af547f2ca3b7d345e09fd8d04575406274
> 2) Rebased to 6.12.1-rc2
>
> v6->v7:
> 1) Rebased to 6.6.0
> 2) Disabled using hugepages with ODP
> 3) Addressed comments on v6 from Jason and Zhu
> cf. https://lore.kernel.org/lkml/cover.1694153251.git.matsuda-daisuke@fujitsu.com/
>
> v5->v6:
> Fixed the implementation according to Jason's suggestions
> cf. https://lore.kernel.org/all/ZIdFXfDu4IMKE+BQ@nvidia.com/
> cf. https://lore.kernel.org/all/ZIdGU709e1h5h4JJ@nvidia.com/
>
> v4->v5:
> 1) Rebased to 6.4.0-rc2+
> 2) Changed to schedule all works on responder and completer to workqueue
>
> v3->v4:
> 1) Re-designed functions that access MRs to use the MR xarray.
> 2) Rebased onto the latest jgg-for-next tree.
>
> v2->v3:
> 1) Removed a patch that changes the common ib_uverbs layer.
> 2) Re-implemented patches for conversion to workqueue.
> 3) Fixed compile errors (happened when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
> 4) Fixed some functions that returned incorrect errors.
> 5) Temporarily disabled ODP for RDMA Flush and Atomic Write.
>
> v1->v2:
> 1) Fixed a crash issue reported by Haris Iqbal.
> 2) Tried to make lock patterns clearer as pointed out by Romanovsky.
> 3) Minor cleanups and fixes.
>
> Daisuke Matsuda (6):
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP
>
>  drivers/infiniband/sw/rxe/Makefile    |   2 +
>  drivers/infiniband/sw/rxe/rxe.c       |  18 ++
>  drivers/infiniband/sw/rxe/rxe.h       |  37 ----
>  drivers/infiniband/sw/rxe/rxe_loc.h   |  39 ++++
>  drivers/infiniband/sw/rxe/rxe_mr.c    |  34 +++-
>  drivers/infiniband/sw/rxe/rxe_odp.c   | 282 ++++++++++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  18 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  37 ++++
>  9 files changed, 419 insertions(+), 53 deletions(-)
>  create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c
>

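The page-in and page-out flow described under [How does ODP work?] in the cover letter can be pictured with a small user-space model. The following is only a sketch under stated assumptions, not the rxe or ib_umem_odp code: the class `ToyOdpMr`, its methods, the fake PFN counter and the 4096-byte page size are all hypothetical names and values chosen for illustration. A list stands in for the driver page table, a missing entry stands in for a non-present page, and `invalidate()` plays the role of the MMU notifier callback.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyOdpMr:
    """Hypothetical stand-in for an ODP-enabled MR (not the rxe code)."""

    def __init__(self, length):
        # Stands for the application's virtual address range.
        self.memory = bytearray(length)
        npages = (length + PAGE_SIZE - 1) // PAGE_SIZE
        # Toy driver page table: None means "page not present".
        self.page_table = [None] * npages
        self.faults = 0
        self._next_pfn = 0  # fake PFN allocator, illustration only

    def _pages(self, offset, length):
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        return range(first, last + 1)

    def _ensure_present(self, offset, length):
        # Page-in: fault in every missing page and update the page table.
        if offset < 0 or offset + length > len(self.memory):
            raise ValueError("access outside the MR")
        for idx in self._pages(offset, length):
            if self.page_table[idx] is None:
                self.faults += 1
                self.page_table[idx] = self._next_pfn
                self._next_pfn += 1

    def write(self, offset, data):
        if not data:
            return
        self._ensure_present(offset, len(data))
        self.memory[offset:offset + len(data)] = data

    def read(self, offset, length):
        if length <= 0:
            return b""
        self._ensure_present(offset, length)
        return bytes(self.memory[offset:offset + length])

    def invalidate(self, start, end):
        # Page-out: drop the entries covering [start, end), as the
        # invalidation callback would before the kernel releases pages.
        if end <= start:
            return
        for idx in self._pages(start, end - start):
            self.page_table[idx] = None

    def resident_pages(self):
        return sum(entry is not None for entry in self.page_table)
```

In this model, registering the MR makes no page resident; pages become resident only when `read()` or `write()` touches them, and an access after `invalidate()` faults the page in again while the data itself is unchanged.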
On Fri, Oct 18, 2024 4:07 PM Zhu Yanjun wrote:
> On 2024/10/9 3:58, Daisuke Matsuda wrote:
> > [How to test ODP?]
> > There are only a few resources available for testing. pyverbs testcases in
> > rdma-core and perftest[8] are the recommended ones. Besides these, the
> > ibv_rc_pingpong command can also be used for testing. Note that you may
> > have to build perftest from upstream because old versions do not handle ODP
> > capabilities correctly.
>
> Thanks a lot. I have tested these patches with perftest. Because ODP (On
> Demand Paging) is a feature, can you also add some testcases into
> rdma-core? Then we can use rdma-core to test this feature of rxe.

I added Read/Write/Atomics tests two years ago.
Cf. https://github.com/linux-rdma/rdma-core/pull/1229

Each of the ODP testcases causes a page invalidation so that the RDMA
traffic access triggers the ODP page-in flow.

Currently, the 7 testcases below pass on the rxe ODP v8 implementation:
  test_odp_rc_atomic_cmp_and_swp
  test_odp_rc_atomic_fetch_and_add
  test_odp_rc_mixed_mr
  test_odp_rc_rdma_read
  test_odp_rc_rdma_write
  test_odp_rc_traffic
  test_odp_ud_traffic
The remaining 11 tests are skipped because of a lack of capabilities.

Please let me know if you have any suggestions for improvement.

Thanks,
Daisuke Matsuda

>
> That is, add some testcases in run_tests.py, so that run_tests.py can be
> used to verify this (ODP) feature on rxe.
>
> Thanks,
> Zhu Yanjun

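The result summary in the message above can be tallied with a few lines of Python. This is only an illustrative sketch: the test names and the skipped count are copied from the message, while the helper names `transport_of` and `tally` are hypothetical, and the total is simply the sum of the two reported numbers.

```python
# Test names copied from the message above.
PASSED = [
    "test_odp_rc_atomic_cmp_and_swp",
    "test_odp_rc_atomic_fetch_and_add",
    "test_odp_rc_mixed_mr",
    "test_odp_rc_rdma_read",
    "test_odp_rc_rdma_write",
    "test_odp_rc_traffic",
    "test_odp_ud_traffic",
]
SKIPPED_COUNT = 11  # reported as skipped for lack of capabilities


def transport_of(test_name):
    # Names follow the pattern test_odp_<transport>_<operation>.
    return test_name.split("_")[2]


def tally(passed, skipped_count):
    # Count passing tests per transport and derive the overall total.
    by_transport = {}
    for name in passed:
        key = transport_of(name)
        by_transport[key] = by_transport.get(key, 0) + 1
    return {
        "passed": len(passed),
        "skipped": skipped_count,
        "total": len(passed) + skipped_count,
        "by_transport": by_transport,
    }
```

Calling `tally(PASSED, SKIPPED_COUNT)` groups the passing tests by the transport in their names (RC or UD) and adds the skipped count to give the number of ODP testcases that were run or skipped.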
On 2024/10/28 8:59, Daisuke Matsuda (Fujitsu) wrote:
> On Fri, Oct 18, 2024 4:07 PM Zhu Yanjun wrote:
>> Thanks a lot. I have tested these patches with perftest. Because ODP (On
>> Demand Paging) is a feature, can you also add some testcases into
>> rdma-core? Then we can use rdma-core to test this feature of rxe.
>
> I added Read/Write/Atomics tests two years ago.
> Cf. https://github.com/linux-rdma/rdma-core/pull/1229
>
> Each of the ODP testcases causes a page invalidation so that the RDMA
> traffic access triggers the ODP page-in flow.
>
> Currently, the 7 testcases below pass on the rxe ODP v8 implementation:
>   test_odp_rc_atomic_cmp_and_swp
>   test_odp_rc_atomic_fetch_and_add
>   test_odp_rc_mixed_mr
>   test_odp_rc_rdma_read
>   test_odp_rc_rdma_write
>   test_odp_rc_traffic
>   test_odp_ud_traffic
> The remaining 11 tests are skipped because of a lack of capabilities.

Thanks. When I run the rdma-core tests, the above tests also pass in my
test environment. I am fine with this.

Zhu Yanjun

>
> Please let me know if you have any suggestions for improvement.
>
> Thanks,
> Daisuke Matsuda

On Fri, Oct 18, 2024 4:28 AM Jason Gunthorpe wrote:
> On Wed, Oct 09, 2024 at 10:58:57AM +0900, Daisuke Matsuda wrote:
> > This patch series implements the On-Demand Paging feature on SoftRoCE(rxe)
> > driver, which has been available only in mlx5 driver[1] so far.
> >
> > This series has been blocked because of the hang issue of srp 002 test[2],
> > which was believed to be caused after applying the commit 9b4b7c1f9f54
> > ("RDMA/rxe: Add workqueue support for rxe tasks"). My patches are dependent
> > on the commit because the ODP feature requires sleeping in kernel space,
> > and it is impossible with the former tasklet implementation.
> >
> > According to the original reporter[3], the hang issue is already gone in
> > v6.10. Additionally, tasklet is marked deprecated[4]. I think the rxe
> > driver is ready to accept this series since there is no longer any reason
> > to consider reverting back to the old tasklet.
>
> Okay, and it seems we are just ignoring the rxe bugs these days, so
> why not? Let's look at it

Hi Jason,

What we have seen so far suggests that the hang stems from a potential
timing issue in the srp drivers. I believe it cannot be a reason to delay
this feature indefinitely.

However, I understand that your stance as a maintainer is not wrong. It is
natural that you want to improve the overall quality of the infiniband
subsystem, including the ULP drivers. I am committed to maintaining and
improving the rxe and underlying drivers, but I am sorry that I cannot take
enough time to delve into the other components right now.

I must leave it to you whether to continue to block my patchset or not.
You are the maintainer and have the final word on it.

Thanks,
Daisuke Matsuda

>
> Jason