mbox series

[v8,0/3] add rpc_status netlink support for NFSD

Message ID cover.1694436263.git.lorenzo@kernel.org (mailing list archive)
Headers show
Series add rpc_status netlink support for NFSD | expand

Message

Lorenzo Bianconi Sept. 11, 2023, 12:49 p.m. UTC
Introduce rpc_status netlink support for NFSD in order to dump pending
RPC requests debugging information from userspace.
The new code can be tested with the user-space tool reported below:
https://github.com/LorenzoBianconi/nfsd-rpc-netlink-monitor/tree/main

Changes since v7:
- introduce nfsd_server.yaml for netlink messages

Changes since v6:
- report info to user-space through netlink and get rid of the related
  entry in the procfs

Changes since v5:
- add missing delimiters for nfs4 compound ops
- add a header line for rpc_status handler
- do not dump rq_prog
- do not dump rq_flags in hex

Changes since v4:
- rely on acquire/release APIs and get rid of atomic operation
- fix kdoc for nfsd_rpc_status_open
- get rid of ',' as field delimiter in nfsd_rpc_status hanlder
- move nfsd_rpc_status before nfsd_v4 enum entries
- fix compilantion error if nfsdv4 is not enabled

Changes since v3:
- introduce rq_status_counter in order to detect if the RPC request is
  pending and RPC info are stable
- rely on __svc_print_addr to dump IP info

Changes since v2:
- minor changes in nfsd_rpc_status_show output

Changes since v1:
- rework nfsd_rpc_status_show output

Changes since RFCv1:
- riduce time holding nfsd_mutex bumping svc_serv refcoung in
  nfsd_rpc_status_open()
- dump rqstp->rq_stime
- add missing kdoc for nfsd_rpc_status_open()

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=3D3D3D366

Lorenzo Bianconi (3):
  Documentation: netlink: add a YAML spec for nfsd_server
  NFSD: introduce netlink rpc_status stubs
  NFSD: add rpc_status netlink support

 Documentation/netlink/specs/nfsd_server.yaml |  97 +++++++++
 fs/nfsd/Makefile                             |   3 +-
 fs/nfsd/nfs_netlink_gen.c                    |  32 +++
 fs/nfsd/nfs_netlink_gen.h                    |  22 ++
 fs/nfsd/nfsctl.c                             | 204 +++++++++++++++++++
 fs/nfsd/nfsd.h                               |  16 ++
 fs/nfsd/nfssvc.c                             |  15 ++
 fs/nfsd/state.h                              |   2 -
 include/linux/sunrpc/svc.h                   |   1 +
 include/uapi/linux/nfsd_server.h             |  49 +++++
 10 files changed, 438 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/netlink/specs/nfsd_server.yaml
 create mode 100644 fs/nfsd/nfs_netlink_gen.c
 create mode 100644 fs/nfsd/nfs_netlink_gen.h
 create mode 100644 include/uapi/linux/nfsd_server.h

Comments

Chuck Lever Sept. 11, 2023, 6:10 p.m. UTC | #1
On Mon, Sep 11, 2023 at 02:49:44PM +0200, Lorenzo Bianconi wrote:
> Introduce nfsd_server.yaml specs to generate uAPI and netlink
> code for nfsd server.
> Add rpc-status specs to define message reported by the nfsd server
> dumping the pending RPC requests.
> 
> Tested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  Documentation/netlink/specs/nfsd_server.yaml | 97 ++++++++++++++++++++
>  1 file changed, 97 insertions(+)
>  create mode 100644 Documentation/netlink/specs/nfsd_server.yaml

I've had a look... the series is simple and short. Thanks!

My only quibbles right now are cosmetic and naming-related, all
of which can be addressed when I apply these. So I'm going to
wait for other review comments to see if we need another version
or whether I can apply v8 with by-hand clean-ups.

Comments below are what I might change when applying this one.
This is not (yet) a request for a new version.


> diff --git a/Documentation/netlink/specs/nfsd_server.yaml b/Documentation/netlink/specs/nfsd_server.yaml
> new file mode 100644
> index 000000000000..e681b493847b
> --- /dev/null
> +++ b/Documentation/netlink/specs/nfsd_server.yaml
> @@ -0,0 +1,97 @@
> +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
> +
> +name: nfsd_server

IMHO "nfsd_server" is redundant. "nfsd" should work.


> +
> +doc:
> +  nfsd server configuration over generic netlink.
> +
> +attribute-sets:
> +  -
> +    name: rpc-status-comp-op-attr
> +    enum-name: nfsd-rpc-status-comp-attr
> +    name-prefix: nfsd-attr-rpc-status-comp-
> +    attributes:
> +      -
> +        name: unspec
> +        type: unused
> +        value: 0

I don't recall whether a zero-value definition is explicitly
necessary. Maybe "value-start: 1" would work rather than
these three lines? Why is zero a special attribute value?


> +      -
> +        name: op
> +        type: u32
> +  -
> +    name: rpc-status-attr
> +    enum-name: nfsd-rpc-status-attr
> +    name-prefix: nfsd-attr-rpc-status-

Specifying all three of these name settings seems a bit
cluttered.


> +    attributes:
> +      -
> +        name: unspec
> +        type: unused
> +        value: 0
> +      -
> +        name: xid
> +        type: u32
> +        byte-order: big-endian
> +      -
> +        name: flags
> +        type: u32
> +      -
> +        name: prog
> +        type: u32
> +      -
> +        name: version
> +        type: u8
> +      -
> +        name: proc
> +        type: u32
> +      -
> +        name: service_time
> +        type: s64
> +      -
> +        name: pad
> +        type: pad
> +      -
> +        name: saddr4
> +        type: u32
> +        byte-order: big-endian
> +        display-hint: ipv4
> +      -
> +        name: daddr4
> +        type: u32
> +        byte-order: big-endian
> +        display-hint: ipv4
> +      -
> +        name: saddr6
> +        type: binary
> +        display-hint: ipv6
> +      -
> +        name: daddr6
> +        type: binary
> +        display-hint: ipv6
> +      -
> +        name: sport
> +        type: u16
> +        byte-order: big-endian
> +      -
> +        name: dport
> +        type: u16
> +        byte-order: big-endian
> +      -
> +        name: compond-op

s/compond-op/compound-op

> +        type: array-nest
> +        nested-attributes: rpc-status-comp-op-attr

So, this is supposed to be a counted array of op numbers? Is there
an existing type that could be used for this instead?


> +
> +operations:
> +  enum-name: nfsd-commands
> +  name-prefix: nfsd-cmd-
> +  list:
> +    -
> +      name: unspec
> +      doc: unused
> +      value: 0
> +    -
> +      name: rpc-status-get
> +      doc: dump pending nfsd rpc
> +      attribute-set: rpc-status-attr
> +      dump:
> +        pre: nfsd-server-nl-rpc-status-get-start
> +        post: nfsd-server-nl-rpc-status-get-done
Jeff Layton Sept. 11, 2023, 7:35 p.m. UTC | #2
On Mon, 2023-09-11 at 14:49 +0200, Lorenzo Bianconi wrote:
> Generate empty netlink stubs and uAPI through nfsd_server.yaml specs:
> 
> $./tools/net/ynl/ynl-gen-c.py --mode uapi \
>  --spec Documentation/netlink/specs/nfsd_server.yaml \
>  --header -o include/uapi/linux/nfsd_server.h
> $./tools/net/ynl/ynl-gen-c.py --mode kernel \
>  --spec Documentation/netlink/specs/nfsd_server.yaml \
>  --header -o fs/nfsd/nfs_netlink_gen.h
> $./tools/net/ynl/ynl-gen-c.py --mode kernel \
>  --spec Documentation/netlink/specs/nfsd_server.yaml \
>  --source -o fs/nfsd/nfs_netlink_gen.c
> 
> Tested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  fs/nfsd/Makefile                 |  3 +-
>  fs/nfsd/nfs_netlink_gen.c        | 32 +++++++++++++++++++++
>  fs/nfsd/nfs_netlink_gen.h        | 22 ++++++++++++++
>  fs/nfsd/nfsctl.c                 | 16 +++++++++++
>  include/uapi/linux/nfsd_server.h | 49 ++++++++++++++++++++++++++++++++
>  5 files changed, 121 insertions(+), 1 deletion(-)
>  create mode 100644 fs/nfsd/nfs_netlink_gen.c
>  create mode 100644 fs/nfsd/nfs_netlink_gen.h
>  create mode 100644 include/uapi/linux/nfsd_server.h
> 
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index 6fffc8f03f74..6ae1d5450bf6 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -12,7 +12,8 @@ nfsd-y			+= trace.o
>  
>  nfsd-y 			+= nfssvc.o nfsctl.o nfsfh.o vfs.o \
>  			   export.o auth.o lockd.o nfscache.o \
> -			   stats.o filecache.o nfs3proc.o nfs3xdr.o
> +			   stats.o filecache.o nfs3proc.o nfs3xdr.o \
> +			   nfs_netlink_gen.o
>  nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o
>  nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o
>  nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o
> diff --git a/fs/nfsd/nfs_netlink_gen.c b/fs/nfsd/nfs_netlink_gen.c
> new file mode 100644
> index 000000000000..4d71b80bf4a7
> --- /dev/null
> +++ b/fs/nfsd/nfs_netlink_gen.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
> +/* Do not edit directly, auto-generated from: */
> +/*	Documentation/netlink/specs/nfsd_server.yaml */
> +/* YNL-GEN kernel source */
> +
> +#include <net/netlink.h>
> +#include <net/genetlink.h>
> +
> +#include "nfs_netlink_gen.h"
> +
> +#include <uapi/linux/nfsd_server.h>
> +
> +/* Ops table for nfsd_server */
> +static const struct genl_split_ops nfsd_server_nl_ops[] = {
> +	{
> +		.cmd	= NFSD_CMD_RPC_STATUS_GET,
> +		.start	= nfsd_server_nl_rpc_status_get_start,
> +		.dumpit	= nfsd_server_nl_rpc_status_get_dumpit,
> +		.done	= nfsd_server_nl_rpc_status_get_done,
> +		.flags	= GENL_CMD_CAP_DUMP,
> +	},
> +};
> +
> +struct genl_family nfsd_server_nl_family __ro_after_init = {
> +	.name		= NFSD_SERVER_FAMILY_NAME,
> +	.version	= NFSD_SERVER_FAMILY_VERSION,
> +	.netnsok	= true,
> +	.parallel_ops	= true,
> +	.module		= THIS_MODULE,
> +	.split_ops	= nfsd_server_nl_ops,
> +	.n_split_ops	= ARRAY_SIZE(nfsd_server_nl_ops),
> +};
> diff --git a/fs/nfsd/nfs_netlink_gen.h b/fs/nfsd/nfs_netlink_gen.h
> new file mode 100644
> index 000000000000..f66b29e528c1
> --- /dev/null
> +++ b/fs/nfsd/nfs_netlink_gen.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
> +/* Do not edit directly, auto-generated from: */
> +/*	Documentation/netlink/specs/nfsd_server.yaml */
> +/* YNL-GEN kernel header */
> +
> +#ifndef _LINUX_NFSD_SERVER_GEN_H
> +#define _LINUX_NFSD_SERVER_GEN_H
> +
> +#include <net/netlink.h>
> +#include <net/genetlink.h>
> +
> +#include <uapi/linux/nfsd_server.h>
> +
> +int nfsd_server_nl_rpc_status_get_start(struct netlink_callback *cb);
> +int nfsd_server_nl_rpc_status_get_done(struct netlink_callback *cb);
> +
> +int nfsd_server_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> +					 struct netlink_callback *cb);
> +
> +extern struct genl_family nfsd_server_nl_family;
> +
> +#endif /* _LINUX_NFSD_SERVER_GEN_H */
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 33f80d289d63..1be66088849c 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1495,6 +1495,22 @@ static int create_proc_exports_entry(void)
>  
>  unsigned int nfsd_net_id;
>  
> +int nfsd_server_nl_rpc_status_get_start(struct netlink_callback *cb)
> +{
> +	return 0;
> +}
> +
> +int nfsd_server_nl_rpc_status_get_done(struct netlink_callback *cb)
> +{
> +	return 0;
> +}
> +
> +int nfsd_server_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> +					 struct netlink_callback *cb)
> +{
> +	return 0;
> +}
> +
>  /**
>   * nfsd_net_init - Prepare the nfsd_net portion of a new net namespace
>   * @net: a freshly-created network namespace
> diff --git a/include/uapi/linux/nfsd_server.h b/include/uapi/linux/nfsd_server.h
> new file mode 100644
> index 000000000000..c9ee00ceca3b
> --- /dev/null
> +++ b/include/uapi/linux/nfsd_server.h
> @@ -0,0 +1,49 @@
> +/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
> +/* Do not edit directly, auto-generated from: */
> +/*	Documentation/netlink/specs/nfsd_server.yaml */
> +/* YNL-GEN uapi header */
> +
> +#ifndef _UAPI_LINUX_NFSD_SERVER_H
> +#define _UAPI_LINUX_NFSD_SERVER_H
> +
> +#define NFSD_SERVER_FAMILY_NAME		"nfsd_server"
> +#define NFSD_SERVER_FAMILY_VERSION	1
> +
> +enum nfsd_rpc_status_comp_attr {
> +	NFSD_ATTR_RPC_STATUS_COMP_UNSPEC,
> +	NFSD_ATTR_RPC_STATUS_COMP_OP,
> +
> +	__NFSD_ATTR_RPC_STATUS_COMP_MAX,
> +	NFSD_ATTR_RPC_STATUS_COMP_MAX = (__NFSD_ATTR_RPC_STATUS_COMP_MAX - 1)
> +};
> +
> +enum nfsd_rpc_status_attr {
> +	NFSD_ATTR_RPC_STATUS_UNSPEC,
> +	NFSD_ATTR_RPC_STATUS_XID,
> +	NFSD_ATTR_RPC_STATUS_FLAGS,
> +	NFSD_ATTR_RPC_STATUS_PROG,
> +	NFSD_ATTR_RPC_STATUS_VERSION,
> +	NFSD_ATTR_RPC_STATUS_PROC,
> +	NFSD_ATTR_RPC_STATUS_SERVICE_TIME,
> +	NFSD_ATTR_RPC_STATUS_PAD,
> +	NFSD_ATTR_RPC_STATUS_SADDR4,
> +	NFSD_ATTR_RPC_STATUS_DADDR4,
> +	NFSD_ATTR_RPC_STATUS_SADDR6,
> +	NFSD_ATTR_RPC_STATUS_DADDR6,
> +	NFSD_ATTR_RPC_STATUS_SPORT,
> +	NFSD_ATTR_RPC_STATUS_DPORT,
> +	NFSD_ATTR_RPC_STATUS_COMPOND_OP,
> +
> +	__NFSD_ATTR_RPC_STATUS_MAX,
> +	NFSD_ATTR_RPC_STATUS_MAX = (__NFSD_ATTR_RPC_STATUS_MAX - 1)
> +};
> +
> +enum nfsd_commands {
> +	NFSD_CMD_UNSPEC,
> +	NFSD_CMD_RPC_STATUS_GET,
> +
> +	__NFSD_CMD_MAX,
> +	NFSD_CMD_MAX = (__NFSD_CMD_MAX - 1)
> +};
> +
> +#endif /* _UAPI_LINUX_NFSD_SERVER_H */


Acked-by: Jeff Layton <jlayton@kernel.org>
Jeff Layton Sept. 11, 2023, 7:43 p.m. UTC | #3
On Mon, 2023-09-11 at 14:49 +0200, Lorenzo Bianconi wrote:
> Introduce rpc_status netlink support for NFSD in order to dump pending
> RPC requests debugging information from userspace.
> 
> Tested-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  fs/nfsd/nfsctl.c           | 192 ++++++++++++++++++++++++++++++++++++-
>  fs/nfsd/nfsd.h             |  16 ++++
>  fs/nfsd/nfssvc.c           |  15 +++
>  fs/nfsd/state.h            |   2 -
>  include/linux/sunrpc/svc.h |   1 +
>  5 files changed, 222 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 1be66088849c..b862a759ea15 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -26,6 +26,7 @@
>  #include "pnfs.h"
>  #include "filecache.h"
>  #include "trace.h"
> +#include "nfs_netlink_gen.h"
>  
>  /*
>   *	We have a single directory with several nodes in it.
> @@ -1497,17 +1498,199 @@ unsigned int nfsd_net_id;
>  
>  int nfsd_server_nl_rpc_status_get_start(struct netlink_callback *cb)
>  {
> -	return 0;
> +	struct nfsd_net *nn = net_generic(sock_net(cb->skb->sk), nfsd_net_id);
> +	int ret = -ENODEV;
> +
> +	mutex_lock(&nfsd_mutex);
> +	if (nn->nfsd_serv) {
> +		svc_get(nn->nfsd_serv);
> +		ret = 0;
> +	}
> +	mutex_unlock(&nfsd_mutex);
> +
> +	return ret;
>  }
>  
> -int nfsd_server_nl_rpc_status_get_done(struct netlink_callback *cb)
> +static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
> +					    struct netlink_callback *cb,
> +					    struct nfsd_genl_rqstp *rqstp)
>  {
> +	void *hdr;
> +	int i;
> +
> +	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
> +			  &nfsd_server_nl_family, NLM_F_MULTI,
> +			  NFSD_CMD_RPC_STATUS_GET);
> +	if (!hdr)
> +		return -ENOBUFS;
> +
> +	if (nla_put_be32(skb, NFSD_ATTR_RPC_STATUS_XID, rqstp->rq_xid) ||
> +	    nla_put_u32(skb, NFSD_ATTR_RPC_STATUS_FLAGS, rqstp->rq_flags) ||
> +	    nla_put_u32(skb, NFSD_ATTR_RPC_STATUS_PROG, rqstp->rq_prog) ||
> +	    nla_put_u32(skb, NFSD_ATTR_RPC_STATUS_PROC, rqstp->rq_proc) ||
> +	    nla_put_u8(skb, NFSD_ATTR_RPC_STATUS_VERSION, rqstp->rq_vers) ||
> +	    nla_put_s64(skb, NFSD_ATTR_RPC_STATUS_SERVICE_TIME,
> +			ktime_to_us(rqstp->rq_stime),
> +			NFSD_ATTR_RPC_STATUS_PAD))
> +		return -ENOBUFS;
> +
> +	switch (rqstp->saddr.sa_family) {
> +	case AF_INET: {
> +		const struct sockaddr_in *s_in, *d_in;
> +
> +		s_in = (const struct sockaddr_in *)&rqstp->saddr;
> +		d_in = (const struct sockaddr_in *)&rqstp->daddr;
> +		if (nla_put_in_addr(skb, NFSD_ATTR_RPC_STATUS_SADDR4,
> +				    s_in->sin_addr.s_addr) ||
> +		    nla_put_in_addr(skb, NFSD_ATTR_RPC_STATUS_DADDR4,
> +				    d_in->sin_addr.s_addr) ||
> +		    nla_put_be16(skb, NFSD_ATTR_RPC_STATUS_SPORT,
> +				 s_in->sin_port) ||
> +		    nla_put_be16(skb, NFSD_ATTR_RPC_STATUS_DPORT,
> +				 d_in->sin_port))
> +			return -ENOBUFS;
> +		break;
> +	}
> +	case AF_INET6: {
> +		const struct sockaddr_in6 *s_in, *d_in;
> +
> +		s_in = (const struct sockaddr_in6 *)&rqstp->saddr;
> +		d_in = (const struct sockaddr_in6 *)&rqstp->daddr;
> +		if (nla_put_in6_addr(skb, NFSD_ATTR_RPC_STATUS_SADDR6,
> +				     &s_in->sin6_addr) ||
> +		    nla_put_in6_addr(skb, NFSD_ATTR_RPC_STATUS_DADDR6,
> +				     &d_in->sin6_addr) ||
> +		    nla_put_be16(skb, NFSD_ATTR_RPC_STATUS_SPORT,
> +				 s_in->sin6_port) ||
> +		    nla_put_be16(skb, NFSD_ATTR_RPC_STATUS_DPORT,
> +				 d_in->sin6_port))
> +			return -ENOBUFS;
> +		break;
> +	}
> +	default:
> +		break;
> +	}
> +
> +	if (rqstp->opcnt) {

It may be that we always have an opcount of 0 whenever we're running
something besides NFSv4 COMPOUND, but it may be best not to count on
that. I'd test for prog == NFS_PROGRAM, vers == 4, and that the proc ==
COMPOUND and then for the opcnt.

> +		struct nlattr *attr;
> +
> +		attr = nla_nest_start(skb, NFSD_ATTR_RPC_STATUS_COMPOND_OP);
> +		if (!attr)
> +			return -ENOBUFS;
> +
> +		for (i = 0; i < rqstp->opcnt; i++) {
> +			struct nlattr *op_attr;
> +
> +			op_attr = nla_nest_start(skb, i);
> +			if (!op_attr)
> +				return -ENOBUFS;
> +
> +			if (nla_put_u32(skb, NFSD_ATTR_RPC_STATUS_COMP_OP,
> +					rqstp->opnum[i]))
> +				return -ENOBUFS;
> +
> +			nla_nest_end(skb, op_attr);
> +		}
> +
> +		nla_nest_end(skb, attr);
> +	}
> +
> +	genlmsg_end(skb, hdr);
> +
>  	return 0;
>  }
>  
>  int nfsd_server_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>  					 struct netlink_callback *cb)
>  {
> +	struct nfsd_net *nn = net_generic(sock_net(skb->sk), nfsd_net_id);
> +	int i, ret, rqstp_index;
> +
> +	rcu_read_lock();
> +
> +	for (i = 0; i < nn->nfsd_serv->sv_nrpools; i++) {
> +		struct svc_rqst *rqstp;
> +
> +		if (i < cb->args[0]) /* already consumed */
> +			continue;
> +
> +		rqstp_index = 0;
> +		list_for_each_entry_rcu(rqstp,
> +				&nn->nfsd_serv->sv_pools[i].sp_all_threads,
> +				rq_all) {
> +			struct nfsd_genl_rqstp genl_rqstp;
> +			unsigned int status_counter;
> +
> +			if (rqstp_index++ < cb->args[1]) /* already consumed */
> +				continue;
> +			/*
> +			 * Acquire rq_status_counter before parsing the rqst
> +			 * fields. rq_status_counter is set to an odd value in
> +			 * order to notify the consumers the rqstp fields are
> +			 * meaningful.
> +			 */
> +			status_counter =
> +				smp_load_acquire(&rqstp->rq_status_counter);
> +			if (!(status_counter & 1))
> +				continue;
> +
> +			genl_rqstp.rq_xid = rqstp->rq_xid;
> +			genl_rqstp.rq_flags = rqstp->rq_flags;
> +			genl_rqstp.rq_vers = rqstp->rq_vers;
> +			genl_rqstp.rq_prog = rqstp->rq_prog;
> +			genl_rqstp.rq_proc = rqstp->rq_proc;
> +			genl_rqstp.rq_stime = rqstp->rq_stime;
> +			genl_rqstp.opcnt = 0;
> +			memcpy(&genl_rqstp.daddr, svc_daddr(rqstp),
> +			       sizeof(struct sockaddr));
> +			memcpy(&genl_rqstp.saddr, svc_addr(rqstp),
> +			       sizeof(struct sockaddr));
> +
> +#ifdef CONFIG_NFSD_V4
> +			if (rqstp->rq_vers == NFS4_VERSION &&
> +			    rqstp->rq_proc == NFSPROC4_COMPOUND) {
> +				/* NFSv4 compund */
> +				struct nfsd4_compoundargs *args;
> +				int j;
> +
> +				args = rqstp->rq_argp;
> +				genl_rqstp.opcnt = args->opcnt;
> +				for (j = 0; j < genl_rqstp.opcnt; j++)
> +					genl_rqstp.opnum[j] =
> +						args->ops[j].opnum;
> +			}
> +#endif /* CONFIG_NFSD_V4 */
> +
> +			/*
> +			 * Acquire rq_status_counter before reporting the rqst
> +			 * fields to the user.
> +			 */
> +			if (smp_load_acquire(&rqstp->rq_status_counter) !=
> +			    status_counter)
> +				continue;
> +
> +			ret = nfsd_genl_rpc_status_compose_msg(skb, cb,
> +							       &genl_rqstp);
> +			if (ret)
> +				goto out;
> +		}
> +	}
> +
> +	cb->args[0] = i;
> +	cb->args[1] = rqstp_index;
> +	ret = skb->len;
> +out:
> +	rcu_read_unlock();
> +
> +	return ret;
> +}
> +
> +int nfsd_server_nl_rpc_status_get_done(struct netlink_callback *cb)
> +{
> +	mutex_lock(&nfsd_mutex);
> +	nfsd_put(sock_net(cb->skb->sk));
> +	mutex_unlock(&nfsd_mutex);
> +
>  	return 0;
>  }
>  
> @@ -1605,6 +1788,10 @@ static int __init init_nfsd(void)
>  	retval = register_filesystem(&nfsd_fs_type);
>  	if (retval)
>  		goto out_free_all;
> +	retval = genl_register_family(&nfsd_server_nl_family);
> +	if (retval)
> +		goto out_free_all;
> +
>  	return 0;
>  out_free_all:
>  	nfsd4_destroy_laundry_wq();
> @@ -1629,6 +1816,7 @@ static int __init init_nfsd(void)
>  
>  static void __exit exit_nfsd(void)
>  {
> +	genl_unregister_family(&nfsd_server_nl_family);
>  	unregister_filesystem(&nfsd_fs_type);
>  	nfsd4_destroy_laundry_wq();
>  	unregister_cld_notifier();
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 11c14faa6c67..d787bd38c053 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -62,6 +62,22 @@ struct readdir_cd {
>  	__be32			err;	/* 0, nfserr, or nfserr_eof */
>  };
>  
> +/* Maximum number of operations per session compound */
> +#define NFSD_MAX_OPS_PER_COMPOUND	50
> +
> +struct nfsd_genl_rqstp {
> +	struct sockaddr daddr;
> +	struct sockaddr saddr;
> +	unsigned long rq_flags;
> +	ktime_t rq_stime;
> +	__be32 rq_xid;
> +	u32 rq_vers;
> +	u32 rq_prog;
> +	u32 rq_proc;
> +	/* NFSv4 compund */
> +	u32 opnum[NFSD_MAX_OPS_PER_COMPOUND];
> +	u16 opcnt;
> +};
>  

Again, I'm wondering, is there a way to pass down some sort context-
specific value with netlink? It might be nice to make the two NFSv4
specific fields into a union if that can be done with netlink.

>  extern struct svc_program	nfsd_program;
>  extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 1582af33e204..fad34a7325b3 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -998,6 +998,15 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream))
>  		goto out_decode_err;
>  
> +	/*
> +	 * Release rq_status_counter setting it to an odd value after the rpc
> +	 * request has been properly parsed. rq_status_counter is used to
> +	 * notify the consumers if the rqstp fields are stable
> +	 * (rq_status_counter is odd) or not meaningful (rq_status_counter
> +	 * is even).
> +	 */
> +	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter | 1);
> +
>  	rp = NULL;
>  	switch (nfsd_cache_lookup(rqstp, &rp)) {
>  	case RC_DOIT:
> @@ -1015,6 +1024,12 @@ int nfsd_dispatch(struct svc_rqst *rqstp)
>  	if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream))
>  		goto out_encode_err;
>  
> +	/*
> +	 * Release rq_status_counter setting it to an even value after the rpc
> +	 * request has been properly processed.
> +	 */
> +	smp_store_release(&rqstp->rq_status_counter, rqstp->rq_status_counter + 1);
> +
>  	nfsd_cache_update(rqstp, rp, rqstp->rq_cachetype, statp + 1);
>  out_cached_reply:
>  	return 1;
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index cbddcf484dba..41bdc913fa71 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -174,8 +174,6 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
>  
>  /* Maximum number of slots per session. 160 is useful for long haul TCP */
>  #define NFSD_MAX_SLOTS_PER_SESSION     160
> -/* Maximum number of operations per session compound */
> -#define NFSD_MAX_OPS_PER_COMPOUND	50
>  /* Maximum  session per slot cache size */
>  #define NFSD_SLOT_CACHE_SIZE		2048
>  /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index dbf5b21feafe..caa20defd255 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -251,6 +251,7 @@ struct svc_rqst {
>  						 * net namespace
>  						 */
>  	void **			rq_lease_breaker; /* The v4 client breaking a lease */
> +	unsigned int		rq_status_counter; /* RPC processing counter */
>  };
>  
>  /* bits for rq_flags */
Lorenzo Bianconi Sept. 14, 2023, 10:46 a.m. UTC | #4
> On Mon, Sep 11, 2023 at 02:49:44PM +0200, Lorenzo Bianconi wrote:
> > Introduce nfsd_server.yaml specs to generate uAPI and netlink
> > code for nfsd server.
> > Add rpc-status specs to define message reported by the nfsd server
> > dumping the pending RPC requests.
> > 
> > Tested-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  Documentation/netlink/specs/nfsd_server.yaml | 97 ++++++++++++++++++++
> >  1 file changed, 97 insertions(+)
> >  create mode 100644 Documentation/netlink/specs/nfsd_server.yaml
> 
> I've had a look... the series is simple and short. Thanks!
> 
> My only quibbles right now are cosmetic and naming-related, all
> of which can be addressed when I apply these. So I'm going to
> wait for other review comments to see if we need another version
> or whether I can apply v8 with by-hand clean-ups.
> 
> Comments below are what I might change when applying this one.
> This is not (yet) a request for a new version.

Hi Chuck,

thx for the review. Please let me know if I need to post a v9.

> 
> 
> > diff --git a/Documentation/netlink/specs/nfsd_server.yaml b/Documentation/netlink/specs/nfsd_server.yaml
> > new file mode 100644
> > index 000000000000..e681b493847b
> > --- /dev/null
> > +++ b/Documentation/netlink/specs/nfsd_server.yaml
> > @@ -0,0 +1,97 @@
> > +# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
> > +
> > +name: nfsd_server
> 
> IMHO "nfsd_server" is redundant. "nfsd" should work.

ack, fine

> 
> 
> > +
> > +doc:
> > +  nfsd server configuration over generic netlink.
> > +
> > +attribute-sets:
> > +  -
> > +    name: rpc-status-comp-op-attr
> > +    enum-name: nfsd-rpc-status-comp-attr
> > +    name-prefix: nfsd-attr-rpc-status-comp-
> > +    attributes:
> > +      -
> > +        name: unspec
> > +        type: unused
> > +        value: 0
> 
> I don't recall whether a zero-value definition is explicitly
> necessary. Maybe "value-start: 1" would work rather than
> these three lines? Why is zero a special attribute value?

I do not think it is mandatory, I added them just to be aligned with other
netlink definitions, but we do not use them.

> 
> 
> > +      -
> > +        name: op
> > +        type: u32
> > +  -
> > +    name: rpc-status-attr
> > +    enum-name: nfsd-rpc-status-attr
> > +    name-prefix: nfsd-attr-rpc-status-
> 
> Specifying all three of these name settings seems a bit
> cluttered.

ack, I would say we can get rid of enum-name, agree?

> 
> 
> > +    attributes:
> > +      -
> > +        name: unspec
> > +        type: unused
> > +        value: 0
> > +      -
> > +        name: xid
> > +        type: u32
> > +        byte-order: big-endian
> > +      -
> > +        name: flags
> > +        type: u32
> > +      -
> > +        name: prog
> > +        type: u32
> > +      -
> > +        name: version
> > +        type: u8
> > +      -
> > +        name: proc
> > +        type: u32
> > +      -
> > +        name: service_time
> > +        type: s64
> > +      -
> > +        name: pad
> > +        type: pad
> > +      -
> > +        name: saddr4
> > +        type: u32
> > +        byte-order: big-endian
> > +        display-hint: ipv4
> > +      -
> > +        name: daddr4
> > +        type: u32
> > +        byte-order: big-endian
> > +        display-hint: ipv4
> > +      -
> > +        name: saddr6
> > +        type: binary
> > +        display-hint: ipv6
> > +      -
> > +        name: daddr6
> > +        type: binary
> > +        display-hint: ipv6
> > +      -
> > +        name: sport
> > +        type: u16
> > +        byte-order: big-endian
> > +      -
> > +        name: dport
> > +        type: u16
> > +        byte-order: big-endian
> > +      -
> > +        name: compond-op
> 
> s/compond-op/compound-op

ack

> 
> > +        type: array-nest
> > +        nested-attributes: rpc-status-comp-op-attr
> 
> So, this is supposed to be a counted array of op numbers? Is there
> an existing type that could be used for this instead?

I think the attribute-sets available types are defined here:

https://github.com/torvalds/linux/blob/master/Documentation/netlink/genetlink-c.yaml#L151

Regards,
Lorenzo

> 
> 
> > +
> > +operations:
> > +  enum-name: nfsd-commands
> > +  name-prefix: nfsd-cmd-
> > +  list:
> > +    -
> > +      name: unspec
> > +      doc: unused
> > +      value: 0
> > +    -
> > +      name: rpc-status-get
> > +      doc: dump pending nfsd rpc
> > +      attribute-set: rpc-status-attr
> > +      dump:
> > +        pre: nfsd-server-nl-rpc-status-get-start
> > +        post: nfsd-server-nl-rpc-status-get-done
> 
> -- 
> Chuck Lever
>
Chuck Lever Sept. 15, 2023, 7:24 p.m. UTC | #5
On Mon, Sep 11, 2023 at 02:49:43PM +0200, Lorenzo Bianconi wrote:
> Introduce rpc_status netlink support for NFSD in order to dump pending
> RPC requests debugging information from userspace.
> The new code can be tested with the user-space tool reported below:
> https://github.com/LorenzoBianconi/nfsd-rpc-netlink-monitor/tree/main
> 
> Changes since v7:
> - introduce nfsd_server.yaml for netlink messages
> 
> Changes since v6:
> - report info to user-space through netlink and get rid of the related
>   entry in the procfs
> 
> Changes since v5:
> - add missing delimiters for nfs4 compound ops
> - add a header line for rpc_status handler
> - do not dump rq_prog
> - do not dump rq_flags in hex
> 
> Changes since v4:
> - rely on acquire/release APIs and get rid of atomic operation
> - fix kdoc for nfsd_rpc_status_open
> - get rid of ',' as field delimiter in nfsd_rpc_status hanlder
> - move nfsd_rpc_status before nfsd_v4 enum entries
> - fix compilantion error if nfsdv4 is not enabled
> 
> Changes since v3:
> - introduce rq_status_counter in order to detect if the RPC request is
>   pending and RPC info are stable
> - rely on __svc_print_addr to dump IP info
> 
> Changes since v2:
> - minor changes in nfsd_rpc_status_show output
> 
> Changes since v1:
> - rework nfsd_rpc_status_show output
> 
> Changes since RFCv1:
> - riduce time holding nfsd_mutex bumping svc_serv refcoung in
>   nfsd_rpc_status_open()
> - dump rqstp->rq_stime
> - add missing kdoc for nfsd_rpc_status_open()
> 
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=3D3D3D366
> 
> Lorenzo Bianconi (3):
>   Documentation: netlink: add a YAML spec for nfsd_server
>   NFSD: introduce netlink rpc_status stubs
>   NFSD: add rpc_status netlink support
> 
>  Documentation/netlink/specs/nfsd_server.yaml |  97 +++++++++
>  fs/nfsd/Makefile                             |   3 +-
>  fs/nfsd/nfs_netlink_gen.c                    |  32 +++
>  fs/nfsd/nfs_netlink_gen.h                    |  22 ++
>  fs/nfsd/nfsctl.c                             | 204 +++++++++++++++++++
>  fs/nfsd/nfsd.h                               |  16 ++
>  fs/nfsd/nfssvc.c                             |  15 ++
>  fs/nfsd/state.h                              |   2 -
>  include/linux/sunrpc/svc.h                   |   1 +
>  include/uapi/linux/nfsd_server.h             |  49 +++++
>  10 files changed, 438 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/netlink/specs/nfsd_server.yaml
>  create mode 100644 fs/nfsd/nfs_netlink_gen.c
>  create mode 100644 fs/nfsd/nfs_netlink_gen.h
>  create mode 100644 include/uapi/linux/nfsd_server.h

Hi Lorenzo -

I've applied these three to nfsd-next with the following changes:

- Renaming as we discussed
- Replaced the nested compound_op attribute -- may require some user
  space tooling changes
- Simon's Smatch bug fixed
- Squashed 1/3 and 2/3 into one patch
- Added Closes/Acked-by etc

If you spot a bug, send patches against nfsd-next and I can squash
them in.

I was wondering if you have a little more time to try adding one or
two control cmds. write_threads and v4_end_grace might be simple
ones to start with. No problem if you are "done" with this project,
I can add these over time.
Lorenzo Bianconi Sept. 15, 2023, 9:30 p.m. UTC | #6
[...]
> >  Documentation/netlink/specs/nfsd_server.yaml |  97 +++++++++
> >  fs/nfsd/Makefile                             |   3 +-
> >  fs/nfsd/nfs_netlink_gen.c                    |  32 +++
> >  fs/nfsd/nfs_netlink_gen.h                    |  22 ++
> >  fs/nfsd/nfsctl.c                             | 204 +++++++++++++++++++
> >  fs/nfsd/nfsd.h                               |  16 ++
> >  fs/nfsd/nfssvc.c                             |  15 ++
> >  fs/nfsd/state.h                              |   2 -
> >  include/linux/sunrpc/svc.h                   |   1 +
> >  include/uapi/linux/nfsd_server.h             |  49 +++++
> >  10 files changed, 438 insertions(+), 3 deletions(-)
> >  create mode 100644 Documentation/netlink/specs/nfsd_server.yaml
> >  create mode 100644 fs/nfsd/nfs_netlink_gen.c
> >  create mode 100644 fs/nfsd/nfs_netlink_gen.h
> >  create mode 100644 include/uapi/linux/nfsd_server.h
> 
> Hi Lorenzo -
> 
> I've applied these three to nfsd-next with the following changes:
> 
> - Renaming as we discussed
> - Replaced the nested compound_op attribute -- may require some user
>   space tooling changes
> - Simon's Smatch bug fixed
> - Squashed 1/3 and 2/3 into one patch
> - Added Closes/Acked-by etc

Hi Chuck,

Thanks for addressing the points above.

> 
> If you spot a bug, send patches against nfsd-next and I can squash
> them in.

ack, I will do

> 
> I was wondering if you have a little more time to try adding one or
> two control cmds. write_threads and v4_end_grace might be simple
> ones to start with. No problem if you are "done" with this project,
> I can add these over time.

sure, no worries. I will look into them soon :)

Regards,
Lorenzo

> 
> -- 
> Chuck Lever