mbox series

[for-6.11,00/29] nfs/nfsd: add support for localio bypass

Message ID 20240607142646.20924-1-snitzer@kernel.org (mailing list archive)
Headers show
Series nfs/nfsd: add support for localio bypass | expand

Message

Mike Snitzer June 7, 2024, 2:26 p.m. UTC
Hi,

This patch series rebases "localio" changes that Hammerspace (and
Primary Data before it) has been carrying since 2014. The reason they
weren't proposed for upstream inclusion until now was the handshake
for whether or not a client and server are local was brittle. Please
see the commit header of "nfs/localio: discontinue network address
based localio setup" (patch 20) for more context.

Aside from rebasing the original changes (patches 1 - 18) from a
5.15.-130-stable kernel, my contribution to this series was to make
the localio handshake more robust. To do so a new LOCALIO protocol
extension has been added to both NFS v3 and v4. It follows the
well-worn pattern established by the ACL protocol extension.

These changes have proven stable against various test scenarios:
1) client and server both on localhost (for both v3 and v4.2)
2) various permutations of client and server support enablement for
   both local and remote client and server.
3) client on host, server within a container (for both v3 and v4.2)

I've preserved all established author and Signed-off-by attribution
despite Andy, Peng and Jeff no longer working for Primary Data (or
Hammerspace). I've confirmed with Trond that its best to keep it all
despite those email addresses no longer being active. My Signed-off-by
and that of reviewers and maintainer(s) to follow will build on the
established development provenance.

I also made sure to preserve the original work done by others (rather
than fold changes that I add to this work, to avoid tainting the long
established development and sequence of changes).

My container testing was done in terms of podman managed containers.
I'd appreciate additional review relative to network namespaces.
fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is simply
using the client's network namespace with rpc_net_ns(rpc_clnt). I have
an extra patch that updates nfsd_open_local_fh()'s first argument to
be the server's 'struct net' -- but I stopped short of formally
including that change in this series because it hasn't proven needed
(but more exotic hypothetical scenarios could easily expose the need
for it). I can append it to the series as an "RFC PATCH 30/29" as
needed.

All review and comments are welcome!

Thanks,
Mike

Mike Snitzer (11):
  nfs/write: fix nfs_initiate_commit to return error from nfs_local_commit
  nfs/localio: discontinue network address based localio setup
  nfs_common: add NFS v3 LOCALIO protocol extension enablement
  nfs: implement v3 client support for NFS_LOCALIO_PROGRAM
  nfsd: implement v3 server support for NFS_LOCALIO_PROGRAM
  nfs_common: add NFS v4 LOCALIO protocol extension enablement
  nfs: implement v4 client support for NFS_LOCALIO_PROGRAM
  nfsd: implement v4 server support for NFS_LOCALIO_PROGRAM
  nfs/nfsd: switch GETUUID to using {encode,decode}_opaque_fixed
  nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
  nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common

Peng Tao (3):
  sunrpc: add and export rpc_ntop6_addr_noscopeid
  nfs: move nfs_stat_to_errno to nfs.h
  nfs/flexfiles: check local DS when making DS connections

Trond Myklebust (8):
  NFS: Manage boot verifier correctly in the case of localio
  NFS: Enable localio for non-pNFS I/O
  pnfs/flexfiles: Enable localio for flexfiles I/O
  NFS: Add tracepoints for nfs_local_enable and nfs_local_disable
  NFS: Don't call filesystem write() routines directly
  NFS: Don't call filesystem read() routines directly
  NFS: Use completion rather than flush_work() in nfs_local_commit()
  NFS: localio writes need to use a normal workqueue

Weston Andros Adamson (7):
  nfs: pass nfs_client to nfs_initiate_pgio
  nfs: pass nfs_client to nfs_initiate_commit
  nfs: pass descriptor thru nfs_initiate_pgio path
  sunrpc: handle NULL req->defer in cache_defer_req
  sunrpc: export svc_defer
  sunrpc: add rpcauth_map_to_svc_cred
  nfs/nfsd: add "local io" support

 fs/Kconfig                                |   3 +
 fs/nfs/Kconfig                            |  25 +
 fs/nfs/Makefile                           |   2 +
 fs/nfs/blocklayout/blocklayout.c          |   6 +-
 fs/nfs/client.c                           |  15 +-
 fs/nfs/filelayout/filelayout.c            |  19 +-
 fs/nfs/flexfilelayout/flexfilelayout.c    | 129 +++-
 fs/nfs/flexfilelayout/flexfilelayout.h    |   2 +
 fs/nfs/flexfilelayout/flexfilelayoutdev.c |   6 +
 fs/nfs/inode.c                            |  28 +-
 fs/nfs/internal.h                         | 101 ++-
 fs/nfs/localio.c                          | 814 ++++++++++++++++++++++
 fs/nfs/nfs2xdr.c                          |  69 --
 fs/nfs/nfs3_fs.h                          |   1 +
 fs/nfs/nfs3client.c                       |  25 +
 fs/nfs/nfs3proc.c                         |   3 +
 fs/nfs/nfs3xdr.c                          |  58 ++
 fs/nfs/nfs4_fs.h                          |   2 +
 fs/nfs/nfs4client.c                       |  23 +
 fs/nfs/nfs4proc.c                         |   3 +
 fs/nfs/nfs4xdr.c                          |  65 +-
 fs/nfs/nfstrace.h                         |  61 ++
 fs/nfs/pagelist.c                         |  35 +-
 fs/nfs/pnfs.c                             |  24 +-
 fs/nfs/pnfs.h                             |   6 +-
 fs/nfs/pnfs_nfs.c                         |   5 +-
 fs/nfs/write.c                            |  28 +-
 fs/nfs_common/Makefile                    |   3 +
 fs/nfs_common/nfslocalio.c                |  68 ++
 fs/nfsd/Kconfig                           |  25 +
 fs/nfsd/Makefile                          |   2 +
 fs/nfsd/filecache.c                       |   2 +-
 fs/nfsd/localio.c                         | 324 +++++++++
 fs/nfsd/netns.h                           |   4 +
 fs/nfsd/nfsd.h                            |  11 +
 fs/nfsd/nfssvc.c                          |  91 ++-
 fs/nfsd/trace.h                           |   3 +-
 fs/nfsd/vfs.h                             |   8 +
 fs/nfsd/xdr.h                             |   6 +
 include/linux/nfs.h                       |  65 ++
 include/linux/nfs_fs.h                    |   2 +
 include/linux/nfs_fs_sb.h                 |   8 +
 include/linux/nfs_xdr.h                   |  31 +-
 include/linux/nfslocalio.h                |  37 +
 include/linux/sunrpc/auth.h               |   4 +
 include/linux/sunrpc/svc_xprt.h           |   1 +
 include/uapi/linux/nfs.h                  |   4 +
 net/sunrpc/auth.c                         |  16 +
 net/sunrpc/cache.c                        |   2 +
 net/sunrpc/svc_xprt.c                     |   4 +-
 50 files changed, 2120 insertions(+), 159 deletions(-)
 create mode 100644 fs/nfs/localio.c
 create mode 100644 fs/nfs_common/nfslocalio.c
 create mode 100644 fs/nfsd/localio.c
 create mode 100644 include/linux/nfslocalio.h

Comments

Mike Snitzer June 7, 2024, 6:09 p.m. UTC | #1
On Fri, Jun 07, 2024 at 10:26:17AM -0400, Mike Snitzer wrote:
> 
> My container testing was done in terms of podman managed containers.
> I'd appreciate additional review relative to network namespaces.
> fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is simply
> using the client's network namespace with rpc_net_ns(rpc_clnt). I have
> an extra patch that updates nfsd_open_local_fh()'s first argument to
> be the server's 'struct net' -- but I stopped short of formally
> including that change in this series because it hasn't proven needed
> (but more exotic hypothetical scenarios could easily expose the need
> for it). I can append it to the series as an "RFC PATCH 30/29" as
> needed.

I did just post that 30/29 patch to this thread.

And here is my git tree for these changes:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-6.11
Jeff Layton June 10, 2024, 12:47 p.m. UTC | #2
On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> Hi,
> 
> This patch series rebases "localio" changes that Hammerspace (and
> Primary Data before it) has been carrying since 2014. The reason they
> weren't proposed for upstream inclusion until now was the handshake
> for whether or not a client and server are local was brittle. Please
> see the commit header of "nfs/localio: discontinue network address
> based localio setup" (patch 20) for more context.
> 
> Aside from rebasing the original changes (patches 1 - 18) from a
> 5.15.-130-stable kernel, my contribution to this series was to make
> the localio handshake more robust. To do so a new LOCALIO protocol
> extension has been added to both NFS v3 and v4. It follows the
> well-worn pattern established by the ACL protocol extension.
> 
> These changes have proven stable against various test scenarios:
> 1) client and server both on localhost (for both v3 and v4.2)
> 2) various permutations of client and server support enablement for
>    both local and remote client and server.
> 3) client on host, server within a container (for both v3 and v4.2)
> 
> I've preserved all established author and Signed-off-by attribution
> despite Andy, Peng and Jeff no longer working for Primary Data (or
> Hammerspace). I've confirmed with Trond that its best to keep it all
> despite those email addresses no longer being active. My Signed-off-
> by
> and that of reviewers and maintainer(s) to follow will build on the
> established development provenance.
>
> I also made sure to preserve the original work done by others (rather
> than fold changes that I add to this work, to avoid tainting the long
> established development and sequence of changes).
> 

Honestly, I don't give a fig about the historical changes here. I'd
_much_ rather see a more logical folded patchset that avoids a lot of
the "churn". Given the long timescale of this series, the history is
just not terribly useful.

For instance, you're adding in the old network address tracking in the
earlier patches and then remove that in patch #20, which just means I
have to review a bunch of stuff that is ultimately going away. I'll
still review the set you've posted, but I think folding down the
changes would be best.

> My container testing was done in terms of podman managed containers.
> I'd appreciate additional review relative to network namespaces.
> fs/nfsd/localio.c:nfsd_local_fakerqst_create() in particular is
> simply
> using the client's network namespace with rpc_net_ns(rpc_clnt). I
> have
> an extra patch that updates nfsd_open_local_fh()'s first argument to
> be the server's 'struct net' -- but I stopped short of formally
> including that change in this series because it hasn't proven needed
> (but more exotic hypothetical scenarios could easily expose the need
> for it). I can append it to the series as an "RFC PATCH 30/29" as
> needed.
> 
> All review and comments are welcome!
> 
> Thanks,
> Mike
> 
> Mike Snitzer (11):
>   nfs/write: fix nfs_initiate_commit to return error from
> nfs_local_commit
>   nfs/localio: discontinue network address based localio setup
>   nfs_common: add NFS v3 LOCALIO protocol extension enablement
>   nfs: implement v3 client support for NFS_LOCALIO_PROGRAM
>   nfsd: implement v3 server support for NFS_LOCALIO_PROGRAM
>   nfs_common: add NFS v4 LOCALIO protocol extension enablement
>   nfs: implement v4 client support for NFS_LOCALIO_PROGRAM
>   nfsd: implement v4 server support for NFS_LOCALIO_PROGRAM
>   nfs/nfsd: switch GETUUID to using {encode,decode}_opaque_fixed
>   nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
>   nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common
> 
> Peng Tao (3):
>   sunrpc: add and export rpc_ntop6_addr_noscopeid
>   nfs: move nfs_stat_to_errno to nfs.h
>   nfs/flexfiles: check local DS when making DS connections
> 
> Trond Myklebust (8):
>   NFS: Manage boot verifier correctly in the case of localio
>   NFS: Enable localio for non-pNFS I/O
>   pnfs/flexfiles: Enable localio for flexfiles I/O
>   NFS: Add tracepoints for nfs_local_enable and nfs_local_disable
>   NFS: Don't call filesystem write() routines directly
>   NFS: Don't call filesystem read() routines directly
>   NFS: Use completion rather than flush_work() in nfs_local_commit()
>   NFS: localio writes need to use a normal workqueue
> 
> Weston Andros Adamson (7):
>   nfs: pass nfs_client to nfs_initiate_pgio
>   nfs: pass nfs_client to nfs_initiate_commit
>   nfs: pass descriptor thru nfs_initiate_pgio path
>   sunrpc: handle NULL req->defer in cache_defer_req
>   sunrpc: export svc_defer
>   sunrpc: add rpcauth_map_to_svc_cred
>   nfs/nfsd: add "local io" support
> 
>  fs/Kconfig                                |   3 +
>  fs/nfs/Kconfig                            |  25 +
>  fs/nfs/Makefile                           |   2 +
>  fs/nfs/blocklayout/blocklayout.c          |   6 +-
>  fs/nfs/client.c                           |  15 +-
>  fs/nfs/filelayout/filelayout.c            |  19 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c    | 129 +++-
>  fs/nfs/flexfilelayout/flexfilelayout.h    |   2 +
>  fs/nfs/flexfilelayout/flexfilelayoutdev.c |   6 +
>  fs/nfs/inode.c                            |  28 +-
>  fs/nfs/internal.h                         | 101 ++-
>  fs/nfs/localio.c                          | 814
> ++++++++++++++++++++++
>  fs/nfs/nfs2xdr.c                          |  69 --
>  fs/nfs/nfs3_fs.h                          |   1 +
>  fs/nfs/nfs3client.c                       |  25 +
>  fs/nfs/nfs3proc.c                         |   3 +
>  fs/nfs/nfs3xdr.c                          |  58 ++
>  fs/nfs/nfs4_fs.h                          |   2 +
>  fs/nfs/nfs4client.c                       |  23 +
>  fs/nfs/nfs4proc.c                         |   3 +
>  fs/nfs/nfs4xdr.c                          |  65 +-
>  fs/nfs/nfstrace.h                         |  61 ++
>  fs/nfs/pagelist.c                         |  35 +-
>  fs/nfs/pnfs.c                             |  24 +-
>  fs/nfs/pnfs.h                             |   6 +-
>  fs/nfs/pnfs_nfs.c                         |   5 +-
>  fs/nfs/write.c                            |  28 +-
>  fs/nfs_common/Makefile                    |   3 +
>  fs/nfs_common/nfslocalio.c                |  68 ++
>  fs/nfsd/Kconfig                           |  25 +
>  fs/nfsd/Makefile                          |   2 +
>  fs/nfsd/filecache.c                       |   2 +-
>  fs/nfsd/localio.c                         | 324 +++++++++
>  fs/nfsd/netns.h                           |   4 +
>  fs/nfsd/nfsd.h                            |  11 +
>  fs/nfsd/nfssvc.c                          |  91 ++-
>  fs/nfsd/trace.h                           |   3 +-
>  fs/nfsd/vfs.h                             |   8 +
>  fs/nfsd/xdr.h                             |   6 +
>  include/linux/nfs.h                       |  65 ++
>  include/linux/nfs_fs.h                    |   2 +
>  include/linux/nfs_fs_sb.h                 |   8 +
>  include/linux/nfs_xdr.h                   |  31 +-
>  include/linux/nfslocalio.h                |  37 +
>  include/linux/sunrpc/auth.h               |   4 +
>  include/linux/sunrpc/svc_xprt.h           |   1 +
>  include/uapi/linux/nfs.h                  |   4 +
>  net/sunrpc/auth.c                         |  16 +
>  net/sunrpc/cache.c                        |   2 +
>  net/sunrpc/svc_xprt.c                     |   4 +-
>  50 files changed, 2120 insertions(+), 159 deletions(-)
>  create mode 100644 fs/nfs/localio.c
>  create mode 100644 fs/nfs_common/nfslocalio.c
>  create mode 100644 fs/nfsd/localio.c
>  create mode 100644 include/linux/nfslocalio.h
>
Mike Snitzer June 10, 2024, 4:47 p.m. UTC | #3
On Mon, Jun 10, 2024 at 08:47:47AM -0400, Jeff Layton wrote:
> On Fri, 2024-06-07 at 10:26 -0400, Mike Snitzer wrote:
> > Hi,
> > 
> > This patch series rebases "localio" changes that Hammerspace (and
> > Primary Data before it) has been carrying since 2014. The reason they
> > weren't proposed for upstream inclusion until now was the handshake
> > for whether or not a client and server are local was brittle. Please
> > see the commit header of "nfs/localio: discontinue network address
> > based localio setup" (patch 20) for more context.
> > 
> > Aside from rebasing the original changes (patches 1 - 18) from a
> > 5.15.-130-stable kernel, my contribution to this series was to make
> > the localio handshake more robust. To do so a new LOCALIO protocol
> > extension has been added to both NFS v3 and v4. It follows the
> > well-worn pattern established by the ACL protocol extension.
> > 
> > These changes have proven stable against various test scenarios:
> > 1) client and server both on localhost (for both v3 and v4.2)
> > 2) various permutations of client and server support enablement for
> >    both local and remote client and server.
> > 3) client on host, server within a container (for both v3 and v4.2)
> > 
> > I've preserved all established author and Signed-off-by attribution
> > despite Andy, Peng and Jeff no longer working for Primary Data (or
> > Hammerspace). I've confirmed with Trond that its best to keep it all
> > despite those email addresses no longer being active. My Signed-off-
> > by
> > and that of reviewers and maintainer(s) to follow will build on the
> > established development provenance.
> >
> > I also made sure to preserve the original work done by others (rather
> > than fold changes that I add to this work, to avoid tainting the long
> > established development and sequence of changes).
> > 
> 
> Honestly, I don't give a fig about the historical changes here. I'd
> _much_ rather see a more logical folded patchset that avoids a lot of
> the "churn". Given the long timescale of this series, the history is
> just not terribly useful.

Fair, will do (and this answers the question I just asked in response
to a different patch).
 
> For instance, you're adding in the old network address tracking in the
> earlier patches and then remove that in patch #20, which just means I
> have to review a bunch of stuff that is ultimately going away. I'll
> still review the set you've posted, but I think folding down the
> changes would be best.

Yeah, I just wanted to not be excessive with folding patches -- purely
to preserve the evolution of these changes (given the different
authors, etc).  But I agree with you, and will sort it out for v2.

Mike