[00/20] NFSD support for multiple RPC/RDMA chunks

Message ID 160373843299.1886.12604782813896379719.stgit@klimt.1015granger.net

Message

Chuck Lever Oct. 26, 2020, 6:53 p.m. UTC
This series implements support for multiple RPC/RDMA chunks per RPC
transaction. This is one of the few remaining generalities that the
Linux NFS/RDMA server implementation lacks.

There is currently one known NFS/RDMA client implementation that can
send multiple chunks per RPC, and that is Solaris. Multiple chunks
are rare enough that the Linux NFS/RDMA implementation has been
successful without this support for many years.

Along with multiple chunk support, this series adds the following
benefits:

- More robust input sanitization of RPC/RDMA headers
- An internal representation of chunks that is agnostic to their
  wire format

The cost is a little additional complexity and some extra memory
allocations when handling non-empty chunk lists. Most of these
allocations can be optimized away if we find they are a problem.
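
As an illustration only, the two benefits and the allocation cost
described above could look like the following user-space C sketch. It
decodes an RPC/RDMA version 1 Write list (XDR layout per RFC 8166) into
a linked list of chunks that no longer depends on the wire encoding,
bounds-checks every field, and allocates one object per chunk. All names
here (struct seg, struct chunk, struct chunk_list, parse_write_list) are
hypothetical and are not the definitions in svc_rdma_pcl.h.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one RDMA segment. */
struct seg {
	uint32_t handle;
	uint32_t length;
	uint64_t offset;
};

/* Hypothetical parsed chunk: one allocation per chunk. */
struct chunk {
	struct chunk *next;
	uint32_t nsegs;
	struct seg segs[];
};

struct chunk_list {
	struct chunk *head;
	unsigned int count;
};

static void chunk_list_free(struct chunk_list *cl)
{
	while (cl->head) {
		struct chunk *c = cl->head;

		cl->head = c->next;
		free(c);
	}
	cl->count = 0;
}

/*
 * Parse a Write list held in @nwords big-endian 32-bit XDR words.
 * Returns the number of words consumed, or -1 if the input is
 * malformed or truncated (in which case @out is left empty).
 */
static long parse_write_list(const uint32_t *p, size_t nwords,
			     struct chunk_list *out)
{
	const uint32_t *start = p, *end = p + nwords;
	struct chunk **tail;

	assert(p != NULL && out != NULL);
	out->head = NULL;
	out->count = 0;
	tail = &out->head;

	while (p < end) {
		uint32_t present = ntohl(*p++);
		uint32_t nsegs, i;
		struct chunk *c;

		if (present == 0)
			return (long)(p - start);	/* list terminator */
		if (present != 1 || p == end)
			break;				/* bad discriminator */
		nsegs = ntohl(*p++);
		if (nsegs > (size_t)(end - p) / 4)
			break;				/* runs past the buffer */
		c = malloc(sizeof(*c) + nsegs * sizeof(c->segs[0]));
		if (!c)
			break;
		c->next = NULL;
		c->nsegs = nsegs;
		for (i = 0; i < nsegs; i++, p += 4) {
			c->segs[i].handle = ntohl(p[0]);
			c->segs[i].length = ntohl(p[1]);
			c->segs[i].offset =
				((uint64_t)ntohl(p[2]) << 32) | ntohl(p[3]);
		}
		*tail = c;
		tail = &c->next;
		out->count++;
	}
	chunk_list_free(out);
	return -1;
}
```

In this sketch, code that consumes a struct chunk_list never touches
XDR words, so a different header encoding would only need a different
parser.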

---

Chuck Lever (20):
      SUNRPC: Adjust synopsis of xdr_buf_subsegment()
      svcrdma: Const-ify the xdr_buf arguments
      svcrdma: Refactor the RDMA Write path
      SUNRPC: Rename svc_encode_read_payload()
      NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders
      svcrdma: Post RDMA Writes while XDR encoding replies
      svcrdma: Clean up svc_rdma_encode_reply_chunk()
      svcrdma: Add a "parsed chunk list" data structure
      svcrdma: Use parsed chunk lists to derive the inv_rkey
      svcrdma: Use parsed chunk lists to detect reverse direction replies
      svcrdma: Use parsed chunk lists to construct RDMA Writes
      svcrdma: Use parsed chunk lists to encode Reply transport headers
      svcrdma: Support multiple write chunks when pulling up
      svcrdma: Support multiple Write chunks in svc_rdma_map_reply_msg()
      svcrdma: Support multiple Write chunks in svc_rdma_send_reply_chunk
      svcrdma: Remove chunk list pointers
      svcrdma: Clean up chunk tracepoints
      svcrdma: Rename info::ri_chunklen
      svcrdma: Use the new parsed chunk list when pulling Read chunks
      svcrdma: support multiple Read chunks per RPC


 fs/nfsd/nfs3xdr.c                          |   4 +
 fs/nfsd/nfs4xdr.c                          |   5 +-
 fs/nfsd/nfsxdr.c                           |   4 +
 include/linux/sunrpc/svc.h                 |   6 +-
 include/linux/sunrpc/svc_rdma.h            |  36 +-
 include/linux/sunrpc/svc_rdma_pcl.h        | 128 +++++
 include/linux/sunrpc/svc_xprt.h            |   4 +-
 include/trace/events/rpcrdma.h             | 143 +++--
 net/sunrpc/svc.c                           |  11 +-
 net/sunrpc/svcsock.c                       |   8 +-
 net/sunrpc/xprtrdma/Makefile               |   2 +-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  14 +-
 net/sunrpc/xprtrdma/svc_rdma_pcl.c         | 306 +++++++++++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    | 314 ++++++-----
 net/sunrpc/xprtrdma/svc_rdma_rw.c          | 598 +++++++++++++++------
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      | 561 ++++++++++---------
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   2 +-
 17 files changed, 1488 insertions(+), 658 deletions(-)
 create mode 100644 include/linux/sunrpc/svc_rdma_pcl.h
 create mode 100644 net/sunrpc/xprtrdma/svc_rdma_pcl.c

--
Chuck Lever

Comments

Leon Romanovsky Oct. 27, 2020, 6:08 a.m. UTC | #1
On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
> This series implements support for multiple RPC/RDMA chunks per RPC
> transaction. This is one of the few remaining generalities that the
> Linux NFS/RDMA server implementation lacks.
>
> There is currently one known NFS/RDMA client implementation that can
> send multiple chunks per RPC, and that is Solaris. Multiple chunks
> are rare enough that the Linux NFS/RDMA implementation has been
> successful without this support for many years.

So why do we need it? Solaris is dead, and like you wrote Linux systems
work without this feature just fine, what are the benefits? Who will use it?

Thanks
Chuck Lever Oct. 27, 2020, 1:24 p.m. UTC | #2
Hi Leon-

> On Oct 27, 2020, at 2:08 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
>> This series implements support for multiple RPC/RDMA chunks per RPC
>> transaction. This is one of the few remaining generalities that the
>> Linux NFS/RDMA server implementation lacks.
>> 
>> There is currently one known NFS/RDMA client implementation that can
>> send multiple chunks per RPC, and that is Solaris. Multiple chunks
>> are rare enough that the Linux NFS/RDMA implementation has been
>> successful without this support for many years.
> 
> So why do we need it? Solaris is dead, and like you wrote Linux systems
> work without this feature just fine, what are the benefits? Who will use it?

The Linux NFS implementation is living. We can add the ability
to provision multiple chunks per RPC to the Linux NFS client at
any time.

Likewise any actively developed NFS/RDMA implementation can add
this feature. The RPC/RDMA version 1 protocol does not have the
ability to communicate the maximum number of chunks the server
will accept per RPC.

Other server implementations do support multiple chunks per RPC.
The Linux NFS/RDMA server implementation has always been incomplete
in this regard.

And the Linux NFS server implementation (the non-transport specific
part) already supports multiple data payloads per NFSv4 COMPOUND.


Restoring a little more of the cover letter:

>> Along with multiple chunk support, this series adds the following
>> benefits:
>> 
>> - More robust input sanitization of RPC/RDMA headers
>> - An internal representation of chunks that is agnostic to their
>>  wire format

The Linux NFS/RDMA server implementation does need to have better
input sanitization.

And there is a version 2 of RPC/RDMA under active development:

https://datatracker.ietf.org/doc/draft-ietf-nfsv4-rpcrdma-version-two/

Having some protocol version agnosticism in our transport might
be necessary eventually.

--
Chuck Lever
J. Bruce Fields Oct. 27, 2020, 5:25 p.m. UTC | #3
On Tue, Oct 27, 2020 at 09:24:54AM -0400, Chuck Lever wrote:
> Hi Leon-
> 
> > On Oct 27, 2020, at 2:08 AM, Leon Romanovsky <leon@kernel.org> wrote:
> > 
> > On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
> >> This series implements support for multiple RPC/RDMA chunks per RPC
> >> transaction. This is one of the few remaining generalities that the
> >> Linux NFS/RDMA server implementation lacks.
> >> 
> >> There is currently one known NFS/RDMA client implementation that can
> >> send multiple chunks per RPC, and that is Solaris. Multiple chunks
> >> are rare enough that the Linux NFS/RDMA implementation has been
> >> successful without this support for many years.
> > 
> > So why do we need it? Solaris is dead, and like you wrote Linux systems
> > work without this feature just fine, what are the benefits? Who will use it?
> 
> The Linux NFS implementation is living. We can add the ability
> to provision multiple chunks per RPC to the Linux NFS client at
> any time.
> 
> Likewise any actively developed NFS/RDMA implementation can add
> this feature. The RPC/RDMA version 1 protocol does not have the
> ability to communicate the maximum number of chunks the server
> will accept per RPC.
> 
> Other server implementations do support multiple chunks per RPC.
> The Linux NFS/RDMA server implementation has always been incomplete
> in this regard.

Can the client detect the server's lack of support and fall back, or
does the server's incompleteness violate the RFC in some way that can
actually cause a failure to interoperate?

--b.

Chuck Lever Oct. 27, 2020, 5:29 p.m. UTC | #4
> On Oct 27, 2020, at 1:25 PM, bfields@fieldses.org wrote:
> 
> On Tue, Oct 27, 2020 at 09:24:54AM -0400, Chuck Lever wrote:
>> Hi Leon-
>> 
>>> On Oct 27, 2020, at 2:08 AM, Leon Romanovsky <leon@kernel.org> wrote:
>>> 
>>> On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
>>>> This series implements support for multiple RPC/RDMA chunks per RPC
>>>> transaction. This is one of the few remaining generalities that the
>>>> Linux NFS/RDMA server implementation lacks.
>>>> 
>>>> There is currently one known NFS/RDMA client implementation that can
>>>> send multiple chunks per RPC, and that is Solaris. Multiple chunks
>>>> are rare enough that the Linux NFS/RDMA implementation has been
>>>> successful without this support for many years.
>>> 
>>> So why do we need it? Solaris is dead, and like you wrote Linux systems
>>> work without this feature just fine, what are the benefits? Who will use it?
>> 
>> The Linux NFS implementation is living. We can add the ability
>> to provision multiple chunks per RPC to the Linux NFS client at
>> any time.
>> 
>> Likewise any actively developed NFS/RDMA implementation can add
>> this feature. The RPC/RDMA version 1 protocol does not have the
>> ability to communicate the maximum number of chunks the server
>> will accept per RPC.
>> 
>> Other server implementations do support multiple chunks per RPC.
>> The Linux NFS/RDMA server implementation has always been incomplete
>> in this regard.
> 
> Can the client detect the server's lack of support and fall back, or
> does the server's incompleteness violate the RFC in some way that can
> actually cause a failure to interoperate?

The latter. Currently the client has no way to detect that our server
does not comply with RFC 8166, which places no arbitrary limits on
the number of chunks per RPC.

If a client attempts to send more than one chunk the RPC fails (or
worse). RPC/RDMA version 1 does not provide a way to indicate that
the failure was because the client sent too many chunks, so the
client has to terminate the RPC transaction with an error.
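
To make "more than one chunk" concrete: in RFC 8166, Read segments that
share a Position value form a single Read chunk. The C sketch below
walks a version 1 Read list and counts chunks that way. It assumes
segments belonging to the same chunk are adjacent in the list, and
count_read_chunks is a hypothetical helper, not a function in the
svcrdma code.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the Read chunks in a Read list held in @nwords big-endian
 * 32-bit XDR words. Each list entry is a discriminator word followed
 * by position, handle, length, and a 64-bit offset (five words).
 * Returns the chunk count, or -1 if the list is malformed or truncated.
 */
static int count_read_chunks(const uint32_t *p, size_t nwords)
{
	const uint32_t *end = p + nwords;
	uint32_t last_pos = 0;
	int chunks = 0;

	assert(p != NULL);
	while (p < end) {
		uint32_t present = ntohl(*p++);
		uint32_t pos;

		if (present == 0)
			return chunks;		/* list terminator */
		if (present != 1 || (size_t)(end - p) < 5)
			return -1;
		pos = ntohl(p[0]);
		/* A new Position value starts a new Read chunk. */
		if (chunks == 0 || pos != last_pos) {
			chunks++;
			last_pos = pos;
		}
		p += 5;
	}
	return -1;				/* missing terminator */
}
```

A receiver that supports only one Read chunk would have to reject any
request for which this count exceeds one.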


--
Chuck Lever
Leon Romanovsky Oct. 28, 2020, 7:16 a.m. UTC | #5
On Tue, Oct 27, 2020 at 09:24:54AM -0400, Chuck Lever wrote:
> Hi Leon-
>
> > On Oct 27, 2020, at 2:08 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
> >> This series implements support for multiple RPC/RDMA chunks per RPC
> >> transaction. This is one of the few remaining generalities that the
> >> Linux NFS/RDMA server implementation lacks.
> >>
> >> There is currently one known NFS/RDMA client implementation that can
> >> send multiple chunks per RPC, and that is Solaris. Multiple chunks
> >> are rare enough that the Linux NFS/RDMA implementation has been
> >> successful without this support for many years.
> >
> > So why do we need it? Solaris is dead, and like you wrote Linux systems
> > work without this feature just fine, what are the benefits? Who will use it?
>
> The Linux NFS implementation is living. We can add the ability
> to provision multiple chunks per RPC to the Linux NFS client at
> any time.
>
> Likewise any actively developed NFS/RDMA implementation can add
> this feature. The RPC/RDMA version 1 protocol does not have the
> ability to communicate the maximum number of chunks the server
> will accept per RPC.
>
> Other server implementations do support multiple chunks per RPC.
> The Linux NFS/RDMA server implementation has always been incomplete
> in this regard.
>
> And the Linux NFS server implementation (the non-transport specific
> part) already supports multiple data payloads per NFSv4 COMPOUND.

Thanks, I just got a different feeling when I read the cover letter.
You presented it as if no one needs this feature.

Thanks
Chuck Lever Oct. 28, 2020, 1:10 p.m. UTC | #6
> On Oct 28, 2020, at 3:16 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Tue, Oct 27, 2020 at 09:24:54AM -0400, Chuck Lever wrote:
>> Hi Leon-
>> 
>>> On Oct 27, 2020, at 2:08 AM, Leon Romanovsky <leon@kernel.org> wrote:
>>> 
>>> On Mon, Oct 26, 2020 at 02:53:53PM -0400, Chuck Lever wrote:
>>>> This series implements support for multiple RPC/RDMA chunks per RPC
>>>> transaction. This is one of the few remaining generalities that the
>>>> Linux NFS/RDMA server implementation lacks.
>>>> 
>>>> There is currently one known NFS/RDMA client implementation that can
>>>> send multiple chunks per RPC, and that is Solaris. Multiple chunks
>>>> are rare enough that the Linux NFS/RDMA implementation has been
>>>> successful without this support for many years.
>>> 
>>> So why do we need it? Solaris is dead, and like you wrote Linux systems
>>> work without this feature just fine, what are the benefits? Who will use it?
>> 
>> The Linux NFS implementation is living. We can add the ability
>> to provision multiple chunks per RPC to the Linux NFS client at
>> any time.
>> 
>> Likewise any actively developed NFS/RDMA implementation can add
>> this feature. The RPC/RDMA version 1 protocol does not have the
>> ability to communicate the maximum number of chunks the server
>> will accept per RPC.
>> 
>> Other server implementations do support multiple chunks per RPC.
>> The Linux NFS/RDMA server implementation has always been incomplete
>> in this regard.
>> 
>> And the Linux NFS server implementation (the non-transport specific
>> part) already supports multiple data payloads per NFSv4 COMPOUND.
> 
> Thanks, I just got a different feeling when I read the cover letter.
> You presented it as if no one needs this feature.

Understood. I'll incorporate a summary of the content of this thread
in the cover letter for the next version of the series.

--
Chuck Lever