From patchwork Fri Aug 23 18:14:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Snitzer X-Patchwork-Id: 13775677 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 35D6D1922C6; Fri, 23 Aug 2024 18:14:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724436879; cv=none; b=JieZUxzapD3Jji4LsNSi7bGQuwJFkyZJKoVYZZU56HHObmi7nJ0zLK+UVvXRbYCqNqy1pnmXLJP5VrPTSTsDeNMhNeY2QrVRg+8oHI12xiAHz5EIo5U9ydbW4gAhUUZ7+dtI1wWeVKNqB20t0LMoeiHiULLQy0dV7adzurmCx6o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724436879; c=relaxed/simple; bh=Geqqzr9Enbm0U8bO93diaWFXmL5jAByZUp9I/CGe1zg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=e4m3T0nck158SJPMmi9RKhPxQttCScUsks0HF5F7llY25+EwSUb1AsUaymeGhIgdSeihFuzPp8jYvyOIE83kT/UlXKbWRzPx7dDXAOWBPvAWkp5cKOyXR39jwj2Dkr5mSWJ5zYCiVfnskAlrSNYoUHit45O1kR+fZyM1NQ8kuSc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=uK8zAEBW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="uK8zAEBW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 82658C4AF09; Fri, 23 Aug 2024 18:14:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1724436878; bh=Geqqzr9Enbm0U8bO93diaWFXmL5jAByZUp9I/CGe1zg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=uK8zAEBW+Vqj9+9mZCXo87Qc9JO6CI6Fab666qh6d3zApYr9xQ4G73qZKZJWNUu67 AoxTOUCtFo01g/tMMMbbp46YT1KG1CGjHuBKXWOzLtRtfbt2TbJTX/MZ0lqAJX1mju TjGlPKRhT1hQMG8p8ew5ORjtWv0JQah/aUYNiRdVfOHwrG6wHD7jks7MbIFiWCiJD9 gvyQlA0Z25PuyVXg981vu9l8aSyB7/Jkllgo4U2Hw2nuop9TJCTtwlHH4HTf+djUKd UcRLjcrRlgI3pdeWRPuQNWm+CVM08PKVGy2nUvZaX4oJonhHEWBnMSoDs52AzL0f5Q QFN7ekaBEvLuA== From: Mike Snitzer To: linux-nfs@vger.kernel.org Cc: Jeff Layton , Chuck Lever , Anna Schumaker , Trond Myklebust , NeilBrown , linux-fsdevel@vger.kernel.org Subject: [PATCH v13 10/19] nfsd: add localio support Date: Fri, 23 Aug 2024 14:14:08 -0400 Message-ID: <20240823181423.20458-11-snitzer@kernel.org> X-Mailer: git-send-email 2.44.0 In-Reply-To: <20240823181423.20458-1-snitzer@kernel.org> References: <20240823181423.20458-1-snitzer@kernel.org> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Weston Andros Adamson Add server support for bypassing NFS for localhost reads, writes, and commits. This is only useful when both the client and server are running on the same host. If nfsd_open_local_fh() fails then the NFS client will both retry and fallback to normal network-based read, write and commit operations if localio is no longer supported. Care is taken to ensure the same NFS security mechanisms are used (authentication, etc) regardless of whether localio or regular NFS access is used. The auth_domain established as part of the traditional NFS client access to the NFS server is also used for localio. Store auth_domain for localio in nfsd_uuid_t and transfer it to the client if it is local to the server. Relative to containers, localio gives the client access to the network namespace the server has. This is required to allow the client to access the server's per-namespace nfsd_net struct. CONFIG_NFSD_LOCALIO controls the server enablement for localio. A later commit will add CONFIG_NFS_LOCALIO to allow the client enablement. This commit also introduces the use of a percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh, and warrants a more detailed explanation: Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any client initiated localio calls to nfsd (that are _not_ in the context of nfsd) are complete. nfsd_open_local_fh is updated to nfsd_serv_try_get before opening its file handle and then drop the reference using nfsd_serv_put at the end of nfsd_open_local_fh. This "interlock" working relies heavily on nfsd_open_local_fh()'s maybe_get_net() safely dealing with the possibility that the struct net (and nfsd_net by association) may have been destroyed by nfsd_destroy_serv() via nfsd_shutdown_net(). Verified to fix an easy to hit crash that would occur if an nfsd instance running in a container, with a localio client mounted, is shutdown. Upon restart of the container and associated nfsd the client would go on to crash due to NULL pointer dereference that occuured due to the nfs client's localio attempting to nfsd_open_local_fh(), using nn->nfsd_serv, without having a proper reference on nn->nfsd_serv. Signed-off-by: Weston Andros Adamson Signed-off-by: Trond Myklebust Co-developed-by: Mike Snitzer Signed-off-by: Mike Snitzer --- fs/Kconfig | 3 ++ fs/nfsd/Kconfig | 14 ++++++ fs/nfsd/Makefile | 1 + fs/nfsd/filecache.c | 2 +- fs/nfsd/localio.c | 108 ++++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/netns.h | 8 +++- fs/nfsd/nfssvc.c | 39 ++++++++++++++++ fs/nfsd/trace.h | 3 +- fs/nfsd/vfs.h | 10 ++++ 9 files changed, 185 insertions(+), 3 deletions(-) create mode 100644 fs/nfsd/localio.c diff --git a/fs/Kconfig b/fs/Kconfig index a46b0cbc4d8f..1b8a5edbddff 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT tristate select FS_POSIX_ACL +config NFS_COMMON_LOCALIO_SUPPORT + bool + config NFS_COMMON bool depends on NFSD || NFS_FS || LOCKD diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index c0bd1509ccd4..1fca57c79c60 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -90,6 +90,20 @@ config NFSD_V4 If unsure, say N. +config NFSD_LOCALIO + bool "NFS server support for the LOCALIO auxiliary protocol" + depends on NFSD + select NFS_COMMON_LOCALIO_SUPPORT + help + Some NFS servers support an auxiliary NFS LOCALIO protocol + that is not an official part of the NFS protocol. + + This option enables support for the LOCALIO protocol in the + kernel's NFS server. Enable this to bypass using the NFS + protocol when issuing reads, writes and commits to the server. + + If unsure, say N. + config NFSD_PNFS bool diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index b8736a82e57c..78b421778a79 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 2a79c45ca27a..d7c6122231f4 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -53,7 +53,7 @@ #define NFSD_FILE_CACHE_UP (0) /* We only care about NFSD_MAY_READ/WRITE for this cache */ -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO) static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c new file mode 100644 index 000000000000..8e0cce835f81 --- /dev/null +++ b/fs/nfsd/localio.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * NFS server support for local clients to bypass network stack + * + * Copyright (C) 2014 Weston Andros Adamson + * Copyright (C) 2019 Trond Myklebust + * Copyright (C) 2024 Mike Snitzer + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "nfsd.h" +#include "vfs.h" +#include "netns.h" +#include "filecache.h" + +/** + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file + * + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred + * @cred: cred that the client established + * @nfs_fh: filehandle to lookup + * @fmode: fmode_t to use for open + * @pnf: returned nfsd_file pointer that maps to @nfs_fh + * + * This function maps a local fh to a path on a local filesystem. + * This is useful when the nfs client has the local server mounted - it can + * avoid all the NFS overhead with reads, writes and commits. + * + * On successful return, caller is responsible for calling nfsd_file_put. Also + * note that this is called from nfs.ko via find_symbol() to avoid an explicit + * dependency on knfsd. So, there is no forward declaration in a header file + * for it that is shared with the client. + */ +int nfsd_open_local_fh(struct net *cl_nfssvc_net, + struct auth_domain *cl_nfssvc_dom, + struct rpc_clnt *rpc_clnt, + const struct cred *cred, + const struct nfs_fh *nfs_fh, + const fmode_t fmode, + struct nfsd_file **pnf) +{ + int mayflags = NFSD_MAY_LOCALIO; + int status = 0; + struct nfsd_net *nn; + const struct cred *save_cred; + struct svc_cred rq_cred; + struct svc_fh fh; + __be32 beres; + + if (nfs_fh->size > NFS4_FHSIZE) + return -EINVAL; + + /* Not running in nfsd context, must safely get reference on nfsd_serv */ + cl_nfssvc_net = maybe_get_net(cl_nfssvc_net); + if (!cl_nfssvc_net) + return -ENXIO; + nn = net_generic(cl_nfssvc_net, nfsd_net_id); + + /* The server may already be shutting down, disallow new localio */ + if (unlikely(!nfsd_serv_try_get(nn))) { + status = -ENXIO; + goto out_net; + } + + /* Save client creds before calling nfsd_file_acquire_local which calls nfsd_setuser */ + save_cred = get_current_cred(); + + /* nfs_fh -> svc_fh */ + fh_init(&fh, NFS4_FHSIZE); + fh.fh_handle.fh_size = nfs_fh->size; + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size); + + if (fmode & FMODE_READ) + mayflags |= NFSD_MAY_READ; + if (fmode & FMODE_WRITE) + mayflags |= NFSD_MAY_WRITE; + + rpcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred); + + beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, rpc_clnt->cl_vers, + cl_nfssvc_dom, &fh, mayflags, pnf); + if (beres) { + status = nfs_stat_to_errno(be32_to_cpu(beres)); + goto out_fh_put; + } +out_fh_put: + fh_put(&fh); + if (rq_cred.cr_group_info) + put_group_info(rq_cred.cr_group_info); + revert_creds(save_cred); + nfsd_serv_put(nn); +out_net: + put_net(cl_nfssvc_net); + return status; +} +EXPORT_SYMBOL_GPL(nfsd_open_local_fh); + +/* Compile time type checking, not used by anything */ +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh; diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 238fc4e56e53..e2d953f21dde 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -139,7 +140,9 @@ struct nfsd_net { struct svc_info nfsd_info; #define nfsd_serv nfsd_info.serv - + struct percpu_ref nfsd_serv_ref; + struct completion nfsd_serv_confirm_done; + struct completion nfsd_serv_free_done; /* * clientid and stateid data for construction of net unique COPY @@ -221,6 +224,9 @@ struct nfsd_net { extern bool nfsd_support_version(int vers); extern unsigned int nfsd_net_id; +bool nfsd_serv_try_get(struct nfsd_net *nn); +void nfsd_serv_put(struct nfsd_net *nn); + void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); void nfsd_reset_write_verifier(struct nfsd_net *nn); #endif /* __NFSD_NETNS_H__ */ diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index b02eaaea7d62..c639fbe4d8c2 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -192,6 +192,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change return 0; } +bool nfsd_serv_try_get(struct nfsd_net *nn) +{ + return percpu_ref_tryget_live(&nn->nfsd_serv_ref); +} + +void nfsd_serv_put(struct nfsd_net *nn) +{ + percpu_ref_put(&nn->nfsd_serv_ref); +} + +static void nfsd_serv_done(struct percpu_ref *ref) +{ + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); + + complete(&nn->nfsd_serv_confirm_done); +} + +static void nfsd_serv_free(struct percpu_ref *ref) +{ + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); + + complete(&nn->nfsd_serv_free_done); +} + /* * Maximum number of nfsd processes */ @@ -391,6 +415,7 @@ static void nfsd_shutdown_net(struct net *net) lockd_down(net); nn->lockd_up = false; } + percpu_ref_exit(&nn->nfsd_serv_ref); nn->nfsd_net_up = false; nfsd_shutdown_generic(); } @@ -470,6 +495,13 @@ void nfsd_destroy_serv(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv = nn->nfsd_serv; + lockdep_assert_held(&nfsd_mutex); + + percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done); + wait_for_completion(&nn->nfsd_serv_confirm_done); + wait_for_completion(&nn->nfsd_serv_free_done); + /* percpu_ref_exit is called in nfsd_shutdown_net */ + spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; spin_unlock(&nfsd_notifier_lock); @@ -594,6 +626,13 @@ int nfsd_create_serv(struct net *net) if (nn->nfsd_serv) return 0; + error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free, + 0, GFP_KERNEL); + if (error) + return error; + init_completion(&nn->nfsd_serv_free_done); + init_completion(&nn->nfsd_serv_confirm_done); + if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 77bbd23aa150..9c0610fdd11c 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode); { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \ + { NFSD_MAY_LOCALIO, "LOCALIO" }) TRACE_EVENT(nfsd_compound, TP_PROTO( diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 01947561d375..ec8a8aae540b 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -33,6 +33,8 @@ #define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */ +#define NFSD_MAY_LOCALIO 0x2000 /* for tracing, reflects when localio used */ + #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE) #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC) @@ -158,6 +160,14 @@ __be32 nfsd_permission(struct svc_cred *cred, struct svc_export *exp, void nfsd_filp_close(struct file *fp); +int nfsd_open_local_fh(struct net *net, + struct auth_domain *dom, + struct rpc_clnt *rpc_clnt, + const struct cred *cred, + const struct nfs_fh *nfs_fh, + const fmode_t fmode, + struct nfsd_file **pnf); + static inline int fh_want_write(struct svc_fh *fh) { int ret;