[RFC,0/4] NFS: Fix another 'check_flush_dependency' splat

Message ID 20240429152537.212958-6-cel@kernel.org (mailing list archive)

Message

Chuck Lever April 29, 2024, 3:25 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

Avoid getting work queue splats in the system journal by moving
client-side RPC/RDMA transport tear-down into a background process.

I've done some testing of this series, now looking for review
comments.

Chuck Lever (4):
  xprtrdma: Remove temp allocation of rpcrdma_rep objects
  xprtrdma: Clean up synopsis of frwr_mr_unmap()
  xprtrdma: Delay releasing connection hardware resources
  xprtrdma: Move MRs to struct rpcrdma_ep

 net/sunrpc/xprtrdma/frwr_ops.c  |  13 ++-
 net/sunrpc/xprtrdma/rpc_rdma.c  |   3 +-
 net/sunrpc/xprtrdma/transport.c |  20 +++-
 net/sunrpc/xprtrdma/verbs.c     | 173 ++++++++++++++++----------------
 net/sunrpc/xprtrdma/xprt_rdma.h |  21 ++--
 5 files changed, 125 insertions(+), 105 deletions(-)


base-commit: e67572cd2204894179d89bd7b984072f19313b03

Comments

Zhu Yanjun April 30, 2024, 7:26 a.m. UTC | #1
On 29.04.24 17:25, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Avoid getting work queue splats in the system journal by moving
> client-side RPC/RDMA transport tear-down into a background process.
> 
> I've done some testing of this series, now looking for review
> comments.

How do you test NFS with RDMA? Can you provide some steps or tools?
I am interested in NFS over RDMA.

Thanks,
Zhu Yanjun

Chuck Lever April 30, 2024, 1:42 p.m. UTC | #2
> On Apr 30, 2024, at 3:26 AM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> 
> On 29.04.24 17:25, cel@kernel.org wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> Avoid getting work queue splats in the system journal by moving
>> client-side RPC/RDMA transport tear-down into a background process.
>> I've done some testing of this series, now looking for review
>> comments.
> 
> How do you test NFS with RDMA? Can you provide some steps or tools?

We are building NFS tests into kdevops:

   https://github.com/linux-kdevops/kdevops.git

and there is a config option to use soft iWARP instead of TCP.

kdevops includes workflows for fstests, Mora's nfstest, the
git regression suite, and ltp, all of which we use regularly
to test the Linux NFS client and server implementations.


--
Chuck Lever
Zhu Yanjun April 30, 2024, 1:58 p.m. UTC | #3
On 30.04.24 15:42, Chuck Lever III wrote:
>
>> On Apr 30, 2024, at 3:26 AM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>>
>> On 29.04.24 17:25, cel@kernel.org wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> Avoid getting work queue splats in the system journal by moving
>>> client-side RPC/RDMA transport tear-down into a background process.
>>> I've done some testing of this series, now looking for review
>>> comments.
>> How do you test NFS with RDMA? Can you provide some steps or tools?
> We are building NFS tests into kdevops:
>
>     https://github.com/linux-kdevops/kdevops.git
>
> and there is a config option to use soft iWARP instead of TCP.

Thanks a lot. This is interesting. Have you tested with RXE instead of
iWARP?

If so, does NFS work well with RXE? I am curious about NFS with RXE.

Normally NFS works over TCP. Here NFS uses RDMA instead of TCP.

A popular RDMA implementation is RoCEv2, which is based on UDP.

So I am curious whether NFS works well with RXE (the RoCEv2 emulation
driver).

If a user wants to use NFS on production hosts, it is possible
that NFS will run over RoCEv2 (UDP).

Best Regards,

Zhu Yanjun

Chuck Lever April 30, 2024, 2:13 p.m. UTC | #4
> On Apr 30, 2024, at 9:58 AM, Zhu Yanjun <yanjun.zhu@linux.dev> wrote:
> 
> 
> On 30.04.24 15:42, Chuck Lever III wrote:
>> We are building NFS tests into kdevops:
>> 
>>    https://github.com/linux-kdevops/kdevops.git
>> 
>> and there is a config option to use soft iWARP instead of TCP.
> 
> Thanks a lot. This is interesting. Have you tested with RXE instead of iWARP?
> 
> If so, does NFS work well with RXE? I am curious about NFS with RXE.
> 
> Normally NFS works over TCP. Here NFS uses RDMA instead of TCP.
> 
> A popular RDMA implementation is RoCEv2, which is based on UDP.
> 
> So I am curious whether NFS works well with RXE (the RoCEv2 emulation driver).
> 
> If a user wants to use NFS on production hosts, it is possible that NFS will run over RoCEv2 (UDP).

Yes, NFS/RDMA works with rxe and even with rxe mixed with
hardware RoCE. Someone else will have to step in and say
whether it works "well" since I don't use rxe, only CX-5
and newer on 100GbE.

Generally we use siw because our testing environment varies
between all systems on a single local network or hypervisor,
all the way up to NFS/RDMA on VPN and WAN. The rxe driver
doesn't support operation over tunnels, currently.

It is possible to add rxe as a second option in kdevops,
but siw has worked for our purposes so far, and the NFS
test matrix is already enormous.


--
Chuck Lever
Zhu Yanjun April 30, 2024, 2:45 p.m. UTC | #5
On 30.04.24 16:13, Chuck Lever III wrote:
> 
> Yes, NFS/RDMA works with rxe and even with rxe mixed with
> hardware RoCE. Someone else will have to step in and say
> whether it works "well" since I don't use rxe, only CX-5
> and newer on 100GbE.
> 
> Generally we use siw because our testing environment varies
> between all systems on a single local network or hypervisor,
> all the way up to NFS/RDMA on VPN and WAN. The rxe driver
> doesn't support operation over tunnels, currently.

Thanks a lot. "The rxe driver doesn't support operation over tunnels,
currently." Do you mean that rxe cannot work with tun/tap devices?

> 
> It is possible to add rxe as a second option in kdevops,
> but siw has worked for our purposes so far, and the NFS
> test matrix is already enormous.

Thanks. If rxe can be added as a second option in kdevops, I will use
kdevops to check whether rxe works well in future kernel versions.

Best Regards,
Zhu Yanjun

Chuck Lever April 30, 2024, 2:52 p.m. UTC | #6
> On Apr 30, 2024, at 10:45 AM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> 
> On 30.04.24 16:13, Chuck Lever III wrote:
>> Yes, NFS/RDMA works with rxe and even with rxe mixed with
>> hardware RoCE. Someone else will have to step in and say
>> whether it works "well" since I don't use rxe, only CX-5
>> and newer on 100GbE.
>> Generally we use siw because our testing environment varies
>> between all systems on a single local network or hypervisor,
>> all the way up to NFS/RDMA on VPN and WAN. The rxe driver
>> doesn't support operation over tunnels, currently.
> 
> Thanks a lot. "The rxe driver doesn't support operation over tunnels, currently." Do you mean that rxe cannot work with tun/tap devices?

No, rxe cannot be configured to use tunnel devices, AFAIK.


>> It is possible to add rxe as a second option in kdevops,
>> but siw has worked for our purposes so far, and the NFS
>> test matrix is already enormous.
> 
> Thanks. If rxe can be added as a second option in kdevops, I will use kdevops to check whether rxe works well in future kernel versions.

No new tests are necessary. The only thing missing right
now is the ability to set up rxe devices on all the test
systems.
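
As an illustration of what setting up an rxe device on one test system involves, here is a minimal sketch. It is not the kdevops Ansible code: the function names, the DRY_RUN switch, and the default device name rxe0 are hypothetical, and it assumes the rdma tool from iproute2 and the rdma_rxe kernel module are available on the system.

```shell
#!/bin/sh
# Sketch only: create a soft-RoCE (rxe) device on an Ethernet interface.
# setup_rxe, run, and DRY_RUN are illustrative names, not kdevops interfaces.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# setup_rxe NETDEV [RXE_NAME]
setup_rxe() {
    netdev=${1:-}
    rxe_name=${2:-rxe0}
    if [ -z "$netdev" ]; then
        echo "usage: setup_rxe NETDEV [RXE_NAME]" >&2
        return 2
    fi

    # Load the soft-RoCE driver.
    run modprobe rdma_rxe || return 1
    # Bind a new rxe device to the given Ethernet interface.
    run rdma link add "$rxe_name" type rxe netdev "$netdev" || return 1
    # List RDMA links so the new device can be checked by eye.
    run rdma link show
}
```

Running `DRY_RUN=1 setup_rxe eth0` prints the three commands without executing them; without DRY_RUN the function needs root privileges. The interface name eth0 is a placeholder.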


--
Chuck Lever
Zhu Yanjun April 30, 2024, 2:57 p.m. UTC | #7
On 30.04.24 16:52, Chuck Lever III wrote:
> No, rxe cannot be configured to use tunnel devices, AFAIK.
>
> No new tests are necessary. The only thing missing right
> now is the ability to set up rxe devices on all the test
> systems.

Got it. Thanks.

Zhu Yanjun

Chuck Lever June 2, 2024, 3:40 p.m. UTC | #8
> On Apr 30, 2024, at 10:45 AM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> 
> On 30.04.24 16:13, Chuck Lever III wrote:
>> It is possible to add rxe as a second option in kdevops,
>> but siw has worked for our purposes so far, and the NFS
>> test matrix is already enormous.
> 
> Thanks. If rxe can be added as a second option in kdevops, I will use kdevops to check whether rxe works well in future kernel versions.

As per our recent discussion, I have added rxe as a second
software RDMA option in kdevops. Proof of concept:

  https://github.com/chucklever/kdevops/tree/add-rxe-support

But basic rping testing is not working (with 6.10-rc1 kernels)
in this set-up. It's missing something...

--
Chuck Lever
Zhu Yanjun June 2, 2024, 6:14 p.m. UTC | #9
On Sun, Jun 2, 2024 at 5:40 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> As per our recent discussion, I have added rxe as a second
> software RDMA option in kdevops. Proof of concept:

Thanks a lot. I am very glad to know that rxe is now a second
software RDMA option in kdevops.
I also looked at the commit that adds this feature. It is very
complicated and huge. I hope rxe works well in kdevops,
so that I can also use kdevops to verify the rxe and RDMA subsystems. Thanks a
lot for your efforts.

>
>   https://github.com/chucklever/kdevops/tree/add-rxe-support
>
> But basic rping testing is not working (with 6.10-rc1 kernels)
> in this set-up. It's missing something...

I just tested with the latest rdma-core (which includes rping) and a
6.10-rc1 kernel. rping works well.

rping is the basic tool for verifying that rxe works. When rping
fails, I normally do the following:
1. rping -s -a 127.0.0.1
    rping -c -a 127.0.0.1 -C 3 -d -v
    This verifies whether rxe is configured correctly.
2. ping -c 3 server_ip on the client host.
    This verifies whether the client host can reach the server host.
3. rping -s -a server_ip
    rping -c -a server_ip -C 3 -d -v
    1) shut down the firewall
    2) run tcpdump -ni xxxx to capture UDP packets
These steps normally expose the errors in the rxe client/server.
I hope this helps to track down the problem.
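
The client-side half of the steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tool shipped with rdma-core: the function names and the DRY_RUN switch are hypothetical, and a matching "rping -s -a ADDR" server is assumed to be running already for each address tested. The tcpdump filter on UDP port 4791 (the port RoCEv2 uses) is an addition that is not in the list above.

```shell
#!/bin/sh
# Sketch of the rxe troubleshooting steps listed above.
# Helper names and DRY_RUN are hypothetical, not part of rdma-core.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# rxe_client_checks SERVER_IP [LOCAL_IP]
# Stops at the first failing step and returns non-zero.
rxe_client_checks() {
    server_ip=${1:-}
    local_ip=${2:-127.0.0.1}
    if [ -z "$server_ip" ]; then
        echo "usage: rxe_client_checks SERVER_IP [LOCAL_IP]" >&2
        return 2
    fi

    # Step 1: is rxe configured correctly on this host?
    run rping -c -a "$local_ip" -C 3 -d -v || return 1
    # Step 2: can the client reach the server over plain IP?
    run ping -c 3 "$server_ip" || return 1
    # Step 3: does RDMA work between client and server?
    run rping -c -a "$server_ip" -C 3 -d -v || return 1
}

# rxe_capture IFACE
# Capture RoCEv2 packets on IFACE while step 3 is running.
rxe_capture() {
    iface=${1:-}
    if [ -z "$iface" ]; then
        echo "usage: rxe_capture IFACE" >&2
        return 2
    fi
    run tcpdump -ni "$iface" udp port 4791
}
```

For example, `DRY_RUN=1 rxe_client_checks 192.0.2.10` prints the three commands without running them. The optional second argument replaces 127.0.0.1 for hosts where rxe is not set up on loopback.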

Zhu Yanjun

Chuck Lever June 3, 2024, 3:59 p.m. UTC | #10
> On Jun 2, 2024, at 2:14 PM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> 
> Thanks a lot. I am very glad to know that rxe is now a second
> software RDMA option in kdevops.
> I also looked at the commit that adds this feature. It is very
> complicated and huge.

I split this into four smaller patches, HTH.


> I hope rxe works well in kdevops,
> so that I can also use kdevops to verify the rxe and RDMA subsystems. Thanks a
> lot for your efforts.
> 
>> 
>>  https://github.com/chucklever/kdevops/tree/add-rxe-support
>> 
>> But basic rping testing is not working (with 6.10-rc1 kernels)
>> in this set-up. It's missing something...
> 
> I just tested with the latest rdma-core (which includes rping) and a
> 6.10-rc1 kernel. rping works well.
> 
> rping is the basic tool for verifying that rxe works. When rping
> fails, I normally do the following:
> 1. rping -s -a 127.0.0.1
>    rping -c -a 127.0.0.1 -C 3 -d -v
>    This verifies whether rxe is configured correctly.

I don't have rxe set up on loopback, so I substituted the host's
configured Ethernet IP.

The tests work on the NFS server, but the rping client hangs
on the NFS client (both running v6.10-rc1).

I rebooted into the Fedora 39 stock kernel, and the rping tests
pass.

However, when I try to run fstests with NFS/RDMA using rxe, the
client kernel reports a soft CPU lock-up, and top shows this:

    115 root      20   0       0      0      0 R  99.3   0.0   1:03.50 kworker/u8:5+rxe_wq

So I think this is enough to show that the Ansible parts of this
change are working as expected. I can push this to kdevops now
if there are no objections, and someone (maybe you, maybe me) can
sort out the rxe specific issues later.


--
Chuck Lever
Zhu Yanjun June 3, 2024, 4:54 p.m. UTC | #11
On Mon, Jun 3, 2024 at 5:59 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> I don't have rxe set up on loopback, so I substituted the host's
> configured Ethernet IP.
>
> The tests work on the NFS server, but the rping client hangs
> on the NFS client (both running v6.10-rc1).
>
> I rebooted into the Fedora 39 stock kernel, and the rping tests
> pass.
>
> However, when I try to run fstests with NFS/RDMA using rxe, the
> client kernel reports a soft CPU lock-up, and top shows this:
>
>     115 root      20   0       0      0      0 R  99.3   0.0   1:03.50 kworker/u8:5+rxe_wq

rxe_wq was introduced in commit 9b4b7c1f9f54 ("RDMA/rxe: Add
workqueue support for rxe tasks").
That commit is at v6.4-rc2-1-g9b4b7c1f9f54.

The Fedora 39 stock kernel is 6.5, so some commit between 6.5 and
6.10 may have introduced this problem.

>
> So I think this is enough to show that the Ansible parts of this
> change are working as expected. I can push this to kdevops now
> if there are no objections, and someone (maybe you, maybe me) can
> sort out the rxe specific issues later.

Thanks. Once I can reproduce this problem on my local host, I will be
glad to dig into it. It may take me a long time,
since I do not have a good host on which to deploy kdevops.

To be honest, perhaps "git bisect" can find the commit that introduced
this problem. If you can find the commit, we can fix this problem very
quickly ^_^
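
For reference, starting such a bisection could look like the sketch below. It assumes a known-good revision can be found, which this thread does not establish. The function name, the DRY_RUN switch, and the example revisions v6.5 and v6.10-rc1 are illustrative only. Limiting the bisection to drivers/infiniband/sw/rxe is optional and would miss a culprit outside the rxe driver.

```shell
#!/bin/sh
# Sketch only: start a git bisect between a good and a bad kernel revision.
# start_rxe_bisect, run, and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# start_rxe_bisect GOOD_REV BAD_REV [PATH...]
# Must be run inside a Linux kernel git tree.
start_rxe_bisect() {
    good=${1:-}
    bad=${2:-}
    if [ -z "$good" ] || [ -z "$bad" ]; then
        echo "usage: start_rxe_bisect GOOD_REV BAD_REV [PATH...]" >&2
        return 2
    fi
    shift 2

    if [ "$#" -gt 0 ]; then
        # Only consider commits that touch the given paths.
        run git bisect start "$bad" "$good" -- "$@"
    else
        run git bisect start "$bad" "$good"
    fi
}
```

After `start_rxe_bisect v6.5 v6.10-rc1 drivers/infiniband/sw/rxe`, each candidate kernel is built, booted, and tested, then marked with `git bisect good` or `git bisect bad` until git names the first bad commit. `git bisect reset` ends the session.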

Thanks,
Zhu Yanjun

Chuck Lever June 3, 2024, 5:06 p.m. UTC | #12
> On Jun 3, 2024, at 12:54 PM, Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> 
> On Mon, Jun 3, 2024 at 5:59 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>> I don't have rxe set up on loopback, so I substituted the host's
>> configured Ethernet IP.
>> 
>> The tests work on the NFS server, but the rping client hangs
>> on the NFS client (both running v6.10-rc1).
>> 
>> I rebooted into the Fedora 39 stock kernel, and the rping tests
>> pass.
>> 
>> However, when I try to run fstests with NFS/RDMA using rxe, the
>> client kernel reports a soft CPU lock-up, and top shows this:
>> 
>>    115 root      20   0       0      0      0 R  99.3   0.0   1:03.50 kworker/u8:5+rxe_wq
> 
> rxe_wq was introduced in commit 9b4b7c1f9f54 ("RDMA/rxe: Add
> workqueue support for rxe tasks").
> That commit is at v6.4-rc2-1-g9b4b7c1f9f54.
> 
> The Fedora 39 stock kernel is 6.5, so some commit between 6.5 and
> 6.10 may have introduced this problem.

I couldn't get 6.10-rc1 working at all. This failure occurred
with the stock Fedora 39 kernel and fstests with NFS v4.2 on
RDMA.


>> So I think this is enough to show that the Ansible parts of this
>> change are working as expected. I can push this to kdevops now
>> if there are no objections, and someone (maybe you, maybe me) can
>> sort out the rxe specific issues later.
> 
> Thanks. Once I can reproduce this problem on my local host, I will be
> glad to dig into it. It may take me a long time,
> since I do not have a good host on which to deploy kdevops.

kdevops works on laptops too. The limiting factor seems to be
memory for libvirt guests. Only two guests are needed for this
test.


> To be honest, perhaps "git bisect" can find the commit that introduced
> this problem. If you can find the commit, we can fix this problem very
> quickly ^_^

Since this is the first time I've ever used rxe, I don't have a
"good" commit to start from.


--
Chuck Lever