[for-next,v7,0/7] On-Demand Paging on SoftRoCE

Message ID cover.1699503619.git.matsuda-daisuke@fujitsu.com (mailing list archive)

Daisuke Matsuda (Fujitsu) Nov. 9, 2023, 5:44 a.m. UTC
This patch series implements the On-Demand Paging feature in the SoftRoCE (rxe)
driver. So far the feature has been available only in the mlx5 driver[1].

This series depends on commit 9b4b7c1f9f54 ("RDMA/rxe: Add
workqueue support for rxe tasks"), which replaced the three tasklets with
a workqueue. That commit was suspected of introducing the hang seen in the
blktests srp/002 test[3]. According to the investigation by Bob and Bart, the
hang is likely a timing issue that can occur with both rxe and siw[4], so I
am resuming my ODP patches anyway.

I omitted some content, such as the motivation behind this series, from this
cover letter. Please see the cover letter of v3 for more details[2].

[Overview]
When an application registers a memory region (MR), RDMA drivers normally pin
the pages in the MR so that their physical addresses never change during RDMA
communication. This requires the MR to fit in physical memory and
inevitably leads to memory pressure. On-Demand Paging (ODP), on the other
hand, allows applications to register MRs without pinning pages. Pages are
paged in when the driver needs them and paged out when the OS reclaims them.
As a result, it is possible to register a large MR that does not fit in
physical memory, without consuming much physical memory.

[How does ODP work?]
"struct ib_umem_odp" is used to manage pages. It is created for each
ODP-enabled MR when the MR is registered. This struct holds a pair of arrays
(dma_list/pfn_list) that serve as a driver page table, storing DMA addresses
and PFNs. The arrays are updated on page-in and page-out, both of which use
the common interfaces in the ib_uverbs layer.
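
As a rough userspace illustration of the driver page table described above,
the pair of arrays can be modelled as below. This is only a sketch: the class
ToyUmemOdp, its methods, the 4 KiB page size and the rule "DMA address =
PFN << PAGE_SHIFT" are assumptions made for the example, not the kernel's
struct ib_umem_odp or its interfaces.

```python
from typing import List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class ToyUmemOdp:
    """Hypothetical userspace model of the per-MR driver page table."""

    def __init__(self, start: int, npages: int) -> None:
        self.start = start
        self.npages = npages
        # Nothing is pinned at registration, so every slot starts empty.
        self.pfn_list: List[Optional[int]] = [None] * npages
        self.dma_list: List[Optional[int]] = [None] * npages

    def index(self, va: int) -> int:
        """Translate a virtual address inside the MR to a table index."""
        idx = (va - self.start) >> PAGE_SHIFT
        if not 0 <= idx < self.npages:
            raise IndexError("address is outside the MR")
        return idx

    def page_in(self, va: int, pfn: int) -> None:
        """Record a faulted-in page (toy rule: DMA address = PFN << PAGE_SHIFT)."""
        idx = self.index(va)
        self.pfn_list[idx] = pfn
        self.dma_list[idx] = pfn << PAGE_SHIFT

    def page_out(self, va: int) -> None:
        """Forget a page that is about to be reclaimed."""
        idx = self.index(va)
        self.pfn_list[idx] = None
        self.dma_list[idx] = None

    def present_pages(self) -> int:
        """Count how many pages of the MR are currently resident."""
        return sum(pfn is not None for pfn in self.pfn_list)
```

Both arrays are sized at registration time but stay empty until a page is
actually touched, which is why an ODP MR can be larger than physical memory.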

Page-in can occur when the requester, responder or completer accesses an MR
to process RDMA operations. If it finds that the pages being accessed are
not present in physical memory, or that the requisite permissions are not set
on the pages, it triggers a page fault to make the pages present with
proper permissions and, at the same time, updates the driver page table.
After confirming that the pages are present, it performs the memory access,
such as a read, write or atomic operation.
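
The page-in path described above can be sketched in userspace as follows.
The class ToyFaultingMr, the per-page states ("r"/"rw") and the fault counter
are hypothetical names introduced for illustration; the real driver relies on
the kernel's fault handling rather than anything like this.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyFaultingMr:
    """Hypothetical model: check presence/permissions, fault, then access."""

    def __init__(self, length: int) -> None:
        self.buf = bytearray(length)  # stands in for the application's memory
        npages = (length + PAGE_SIZE - 1) // PAGE_SIZE
        # Toy driver page table: None = not present, "r" = read-only, "rw" = writable.
        self.table = [None] * npages
        self.faults = 0

    def _fault(self, idx: int, need_write: bool) -> None:
        # Make the page present with the needed permission and update the table.
        self.faults += 1
        self.table[idx] = "rw" if need_write else "r"

    def _ensure(self, offset: int, length: int, need_write: bool) -> None:
        if length <= 0 or offset < 0 or offset + length > len(self.buf):
            raise ValueError("access is outside the MR")
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for idx in range(first, last + 1):
            state = self.table[idx]
            if state is None or (need_write and state != "rw"):
                self._fault(idx, need_write)

    def read(self, offset: int, length: int) -> bytes:
        self._ensure(offset, length, need_write=False)
        return bytes(self.buf[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        self._ensure(offset, len(data), need_write=True)
        self.buf[offset:offset + len(data)] = data
```

In this model a write to a page that is mapped read-only counts as a fault
too, mirroring the "requisite permissions are not set" case.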

Page-out is triggered by page reclaim or filesystem events (e.g. a metadata
update of a file that is being used as an MR). When creating an ODP-enabled
MR, the driver registers an MMU notifier callback. When the kernel issues a
page invalidation notification, the callback is invoked to unmap the DMA
addresses and update the driver page table. After that, the kernel releases
the pages.
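
The ordering in the page-out path (notify the driver first, release the
pages afterwards) can be sketched like this. ToyMm and ToyNotifiedMr are
hypothetical stand-ins for the kernel's memory management and for an
ODP-enabled MR; they do not correspond to the real MMU notifier API.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyMm:
    """Hypothetical stand-in for the kernel side that owns the pages."""

    def __init__(self) -> None:
        self.notifiers = []
        self.released = []

    def register_notifier(self, callback) -> None:
        self.notifiers.append(callback)

    def reclaim(self, start: int, end: int) -> None:
        # 1. Tell every registered driver that [start, end) is going away.
        for callback in self.notifiers:
            callback(start, end)
        # 2. Only then release the pages.
        self.released.append((start, end))


class ToyNotifiedMr:
    """Hypothetical ODP-enabled MR that registers an invalidation callback."""

    def __init__(self, mm: ToyMm, start: int, npages: int) -> None:
        self.start = start
        self.npages = npages
        self.dma_list = [None] * npages
        mm.register_notifier(self.invalidate)

    def map_page(self, idx: int, dma_addr: int) -> None:
        self.dma_list[idx] = dma_addr

    def invalidate(self, start: int, end: int) -> int:
        """Drop every mapping overlapping [start, end); return how many were dropped."""
        lo = max(start, self.start)
        hi = min(end, self.start + self.npages * PAGE_SIZE)
        if lo >= hi:
            return 0  # the range does not touch this MR
        first = (lo - self.start) // PAGE_SIZE
        last = (hi - 1 - self.start) // PAGE_SIZE
        dropped = 0
        for idx in range(first, last + 1):
            if self.dma_list[idx] is not None:
                self.dma_list[idx] = None
                dropped += 1
        return dropped
```

After an invalidation, the next access to the dropped pages would have to go
through the page-in path again.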

[Supported operations]
All traditional operations are supported on RC connections. The new Atomic
write[5] and RDMA Flush[6] operations are not included in this patchset. I
will post them after this patchset is merged. On UD connections, Send,
Recv, and SRQ-Recv are supported.

[How to test ODP?]
Only a few resources are available for testing. The pyverbs testcases in
rdma-core and perftest[7] are the recommended ones. Besides these, the
ibv_rc_pingpong command can also be used for testing. Note that you may
have to build perftest from upstream because older versions do not handle
ODP capabilities correctly.

The ODP tree is available from github:
https://github.com/ddmatsu/linux/tree/odp_v7

[Future work]
My next step is to enable the new Atomic write[5] and RDMA Flush[6]
operations with ODP. After that, I am going to implement the prefetch
feature, which allows applications to trigger page faults in advance using
ibv_advise_mr(3) to optimize performance. Some existing software, such as
librpma[8], uses this feature. Additionally, I think we can add the
implicit ODP feature in the future.

[1] [RFC 00/20] On demand paging
https://www.spinics.net/lists/linux-rdma/msg18906.html

[2] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@fujitsu.com/

[3] [bug report] blktests srp/002 hang
https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

[4] srp/002 hang in blktests
https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/

[5] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/

[6] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/

[7] linux-rdma/perftest: Infiniband Verbs Performance Tests
https://github.com/linux-rdma/perftest

[8] librpma: Remote Persistent Memory Access Library
https://github.com/pmem/rpma

v6->v7:
 1) Rebased to 6.6.0
 2) Disabled using hugepages with ODP
 3) Addressed comments on v6 from Jason and Zhu
   cf. https://lore.kernel.org/lkml/cover.1694153251.git.matsuda-daisuke@fujitsu.com/

v5->v6:
 Fixed the implementation according to Jason's suggestions
   cf. https://lore.kernel.org/all/ZIdFXfDu4IMKE+BQ@nvidia.com/
   cf. https://lore.kernel.org/all/ZIdGU709e1h5h4JJ@nvidia.com/

v4->v5:
 1) Rebased to 6.4.0-rc2+
 2) Changed to schedule all work on responder and completer to the workqueue

v3->v4:
 1) Re-designed functions that access MRs to use the MR xarray.
 2) Rebased onto the latest jgg-for-next tree.

v2->v3:
 1) Removed a patch that changes the common ib_uverbs layer.
 2) Re-implemented patches for conversion to workqueue.
 3) Fixed compile errors (which occurred when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
 4) Fixed some functions that returned incorrect errors.
 5) Temporarily disabled ODP for RDMA Flush and Atomic Write.

v1->v2:
 1) Fixed a crash issue reported by Haris Iqbal.
 2) Tried to make lock patterns clearer as pointed out by Romanovsky.
 3) Minor clean ups and fixes.

Daisuke Matsuda (7):
  RDMA/rxe: Always defer tasks on responder and completer to workqueue
  RDMA/rxe: Make MR functions accessible from other rxe source code
  RDMA/rxe: Move resp_states definition to rxe_verbs.h
  RDMA/rxe: Add page invalidation support
  RDMA/rxe: Allow registering MRs for On-Demand Paging
  RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
  RDMA/rxe: Add support for the traditional Atomic operations with ODP

 drivers/infiniband/sw/rxe/Makefile          |   2 +
 drivers/infiniband/sw/rxe/rxe.c             |  18 ++
 drivers/infiniband/sw/rxe/rxe.h             |  37 ---
 drivers/infiniband/sw/rxe/rxe_comp.c        |  12 +-
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |   1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |   1 -
 drivers/infiniband/sw/rxe/rxe_loc.h         |  39 +++
 drivers/infiniband/sw/rxe/rxe_mr.c          |  34 ++-
 drivers/infiniband/sw/rxe/rxe_odp.c         | 289 ++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c        |  31 +--
 drivers/infiniband/sw/rxe/rxe_verbs.c       |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h       |  37 +++
 12 files changed, 428 insertions(+), 78 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

Comments

Jason Gunthorpe Dec. 5, 2023, 12:11 a.m. UTC | #1
On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> 
> Daisuke Matsuda (7):
>   RDMA/rxe: Always defer tasks on responder and completer to workqueue
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP

What is the current situation with rxe? I don't recall seeing the bugs
that were reported get fixed?

I'm reluctant to dig a deeper hole until it is done.

Thanks,
Jason
Zhu Yanjun Dec. 5, 2023, 1:50 a.m. UTC | #2
On 2023/12/5 8:11, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>
>> Daisuke Matsuda (7):
>>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>    RDMA/rxe: Make MR functions accessible from other rxe source code
>>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>    RDMA/rxe: Add page invalidation support
>>    RDMA/rxe: Allow registering MRs for On-Demand Paging
>>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> 
> What is the current situation with rxe? I don't recall seeing the bugs
> that were reported get fixed?

Exactly. A problem is reported at the following link:
https://www.spinics.net/lists/linux-rdma/msg120947.html

It seems that a variable 'entry' is set but not used
[-Wunused-but-set-variable].

Also, ODP is an important feature. Should we suggest adding a test case
for ODP to rdma-core to verify this feature?

Zhu Yanjun

> 
> I'm reluctant to dig a deeper hold until it is done?
> 
> Thanks,
> Jason
Daisuke Matsuda (Fujitsu) Dec. 7, 2023, 6:37 a.m. UTC | #3
On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> 
> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>
> >> Daisuke Matsuda (7):
> >>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >>    RDMA/rxe: Make MR functions accessible from other rxe source code
> >>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >>    RDMA/rxe: Add page invalidation support
> >>    RDMA/rxe: Allow registering MRs for On-Demand Paging
> >>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >
> > What is the current situation with rxe? I don't recall seeing the bugs
> > that were reported get fixed?

Well, I suppose Jason is referring to the "blktests srp/002 hang".
cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
so the hang does not appear to be specific to rxe.
cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
I think we need to decide whether to continue blocking patches to rxe, since nobody has successfully fixed the issue.


There is another issue that causes a kernel panic.
[bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
Zhijian has submitted patches to fix this and has received some comments.
It looks like he is heavily involved in the CXL driver these days.
I guess he is still working on it.

> 
> Exactly. A problem is reported in the link
> https://www.spinics.net/lists/linux-rdma/msg120947.html
> 
> It seems that a variable 'entry' set but not used
> [-Wunused-but-set-variable]

Yeah, I can revise the patch anytime.

> 
> And ODP is an important feature. Should we suggest to add a test case
> about this ODP in rdma-core to verify this ODP feature?

Rxe can share the same tests with mlx5.
I added test cases for Write, Read and Atomic operations with ODP,
and we can add more tests if there are any suggestions.
Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks,
Daisuke Matsuda

> 
> Zhu Yanjun
> 
> >
> > I'm reluctant to dig a deeper hold until it is done?
> >
> > Thanks,
> > Jason
Zhu Yanjun Dec. 12, 2023, 6:07 p.m. UTC | #4
On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>
>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>
>>>> Daisuke Matsuda (7):
>>>>     RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>>     RDMA/rxe: Make MR functions accessible from other rxe source code
>>>>     RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>>     RDMA/rxe: Add page invalidation support
>>>>     RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>>     RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>>     RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>
>>> What is the current situation with rxe? I don't recall seeing the bugs
>>> that were reported get fixed?
> 
> Well, I suppose Jason is mentioning "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> 
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang looks not specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
> 
> 
> There is another issue that causes kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> 
> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> Zhijian has submitted patches to fix this, and he got some comments.
> It looks he is involved in CXL driver intensively these days.
> I guess he is still working on it.
> 
>>
>> Exactly. A problem is reported in the link
>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>
>> It seems that a variable 'entry' set but not used
>> [-Wunused-but-set-variable]
> 
> Yeah, I can revise the patch anytime.
> 
>>
>> And ODP is an important feature. Should we suggest to add a test case
>> about this ODP in rdma-core to verify this ODP feature?
> 
> Rxe can share the same tests with mlx5.
> I added test cases for Write, Read and Atomic operations with ODP,
> and we can add more tests if there are any suggestions.
> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks a lot.
Have you run blktests with your patches applied on top of the
latest kernel?

Zhu Yanjun

> 
> Thanks,
> Daisuke Matsuda
> 
>>
>> Zhu Yanjun
>>
>>>
>>> I'm reluctant to dig a deeper hold until it is done?
>>>
>>> Thanks,
>>> Jason
>
Daisuke Matsuda (Fujitsu) Dec. 14, 2023, 5:55 a.m. UTC | #5
On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> > On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >>
> >> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> >>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>>>
> >>>> Daisuke Matsuda (7):
> >>>>     RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >>>>     RDMA/rxe: Make MR functions accessible from other rxe source code
> >>>>     RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >>>>     RDMA/rxe: Add page invalidation support
> >>>>     RDMA/rxe: Allow registering MRs for On-Demand Paging
> >>>>     RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >>>>     RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >>>
> >>> What is the current situation with rxe? I don't recall seeing the bugs
> >>> that were reported get fixed?
> >
> > Well, I suppose Jason is mentioning "blktests srp/002 hang".
> > cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> >
> > It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> > so the hang looks not specific to rxe.
> > cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> > I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
> >
> >
> > There is another issue that causes kernel panic.
> > [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> > cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> >
> > https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> > Zhijian has submitted patches to fix this, and he got some comments.
> > It looks he is involved in CXL driver intensively these days.
> > I guess he is still working on it.
> >
> >>
> >> Exactly. A problem is reported in the link
> >> https://www.spinics.net/lists/linux-rdma/msg120947.html
> >>
> >> It seems that a variable 'entry' set but not used
> >> [-Wunused-but-set-variable]
> >
> > Yeah, I can revise the patch anytime.
> >
> >>
> >> And ODP is an important feature. Should we suggest to add a test case
> >> about this ODP in rdma-core to verify this ODP feature?
> >
> > Rxe can share the same tests with mlx5.
> > I added test cases for Write, Read and Atomic operations with ODP,
> > and we can add more tests if there are any suggestions.
> > Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
> 
> Thanks a lot.
> Do you make tests with blktests after your patches are applied with the
> latest kernel?

I have not done that yet, but I agree I should do it.
I will try to make time for the test before submitting v8.

Thanks,
Daisuke Matsuda


> 
> Zhu Yanjun
> 
> >
> > Thanks,
> > Daisuke Matsuda
> >
> >>
> >> Zhu Yanjun
> >>
> >>>
> >>> I'm reluctant to dig a deeper hold until it is done?
> >>>
> >>> Thanks,
> >>> Jason
> >
>
Zhu Yanjun Dec. 15, 2023, 2:46 a.m. UTC | #6
On 2023/12/14 13:55, Daisuke Matsuda (Fujitsu) wrote:
> On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
>> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
>>> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>>> Daisuke Matsuda (7):
>>>>>>      RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>>>>      RDMA/rxe: Make MR functions accessible from other rxe source code
>>>>>>      RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>>>>      RDMA/rxe: Add page invalidation support
>>>>>>      RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>>>>      RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>>>>      RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>>> What is the current situation with rxe? I don't recall seeing the bugs
>>>>> that were reported get fixed?
>>> Well, I suppose Jason is mentioning "blktests srp/002 hang".
>>> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>>>
>>> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
>>> so the hang looks not specific to rxe.
>>> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
>>> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.
>>>
>>>
>>> There is another issue that causes kernel panic.
>>> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
>>> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>>
>>> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
>>> Zhijian has submitted patches to fix this, and he got some comments.
>>> It looks he is involved in CXL driver intensively these days.
>>> I guess he is still working on it.
>>>
>>>> Exactly. A problem is reported in the link
>>>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>>>
>>>> It seems that a variable 'entry' set but not used
>>>> [-Wunused-but-set-variable]
>>> Yeah, I can revise the patch anytime.
>>>
>>>> And ODP is an important feature. Should we suggest to add a test case
>>>> about this ODP in rdma-core to verify this ODP feature?
>>> Rxe can share the same tests with mlx5.
>>> I added test cases for Write, Read and Atomic operations with ODP,
>>> and we can add more tests if there are any suggestions.
>>> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
>> Thanks a lot.
>> Do you make tests with blktests after your patches are applied with the
>> latest kernel?
> I have not done that yet, but I agree I should do it.
> I will try to take time for the test before submitting v8

Thanks. I hope blktests works well with your commits.

Zhu Yanjun

>
> Thanks,
> Daisuke Matsuda
>
>
>> Zhu Yanjun
>>
>>> Thanks,
>>> Daisuke Matsuda
>>>
>>>> Zhu Yanjun
>>>>
>>>>> I'm reluctant to dig a deeper hold until it is done?
>>>>>
>>>>> Thanks,
>>>>> Jason
Jason Gunthorpe Jan. 4, 2024, 2:56 p.m. UTC | #7
On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> > 
> > On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> > >>
> > >> Daisuke Matsuda (7):
> > >>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
> > >>    RDMA/rxe: Make MR functions accessible from other rxe source code
> > >>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
> > >>    RDMA/rxe: Add page invalidation support
> > >>    RDMA/rxe: Allow registering MRs for On-Demand Paging
> > >>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> > >>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> > >
> > > What is the current situation with rxe? I don't recall seeing the bugs
> > > that were reported get fixed?
> 
> Well, I suppose Jason is mentioning "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> 
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang looks not specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.

Bob? Is that what we think?

> There is another issue that causes kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

This is more understandable, and the fix of matching the MTT size to
the PAGE_SIZE seems reasonable to me.

Jason