Message ID | 6431d0ea2295a1e128f83cd76a419dee161e4c44.1691482815.git.lorenzo@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NFSD: add version field to nfsd_rpc_status_show handler | expand |
On Tue, Aug 08, 2023 at 10:20:44AM -0400, Jeff Layton wrote: > On Tue, 2023-08-08 at 10:03 -0400, Chuck Lever wrote: > > On Tue, Aug 08, 2023 at 09:48:42AM -0400, Jeff Layton wrote: > > > On Tue, 2023-08-08 at 09:24 -0400, Chuck Lever wrote: > > > > On Tue, Aug 08, 2023 at 09:33:23PM +1000, NeilBrown wrote: > > > > > On Tue, 08 Aug 2023, Lorenzo Bianconi wrote: > > > > > > Introduce version field to nfsd_rpc_status handler in order to help > > > > > > the user to maintain backward compatibility. > > > > > > > > > > I wonder if this really helps. What do I do if I see a version that I > > > > > don't understand? Ignore the whole file? That doesn't make for a good > > > > > user experience. > > > > > > > > There is no UX consideration here. A user browsing the file directly > > > > will not care about the version. > > > > > > > > This file is intended to be parsable by scripts and they have to > > > > keep up with the occasional changes in format. Scripts can handle an > > > > unrecogized version however they like. > > > > > > > > This is what we typically get with a made-up format that isn't .ini > > > > or JSON or XML. The file format isn't self-documenting. The final > > > > field on each row is a variable number of tokens, so it will be > > > > nearly impossible to simply add another field without breaking > > > > something. > > > > > > > > > > It shouldn't be a variable number of tokens per line. > > > > That's how NFSv4 COMPOUND operations are displayed. For example: > > > > 0x5d58666f 0x000000d1 0x000186a3 NFSv4 COMPOUND 0000062034739371 192.168.103.67 0 192.168.103.56 20049 OP_SEQUENCE OP_PUTFH OP_READ > > > > The list of operations in the displayed compound are currently > > blank-separated tokens at the end of each row. > > > > Oh! That's a bug in missed in my latest review then. The operations > field was delimited by ':' chars at one point. Lorenzo, did you mean to > change that? > > IMO, the list of operations should be one field, separated by a distinct > delimiter (like ':'). > > > > > > If there is, then that's a bug, IMO. We do want it to be simple to > > > just add a new field, published version info notwithstanding. > > > > They could be wrapped in curly braces, or separated by commas, to > > make them all one token. > > > > I haven't looked at NFSv3 output yet, but I expect those extra > > tokens won't even be there in that case. > > > > That's probably another bug. Anything not a v4 COMPOUND should have > something as a placeholder. It could just be a single '-' character. > > > JSON, yaml, or xml would all address the extensibility problem, just > > as an alternative thought. > > > > It would probably be fairly simple to output well-formed yaml instead. > JSON and XML are a bit more of a pain. If folks don't mind, I would like more structured output like one of these self-documenting formats. (I know I said I didn't care before, but I'm beginning to care now ;-) I'm also wondering if we really ought not add another file under /proc, which is essentially obsolete. Would /sys/fs/nfsd/yada be better for this facility? I hesitate to even mention network namespaces... > For now, we can change the output. We do need to have this settled > before this goes to Linus' tree though. Agreed. As fair warning, I might drop this from v6.6 if we need more time to get it right. That doesn't mean I'm not excited about having this facility available for all our users.
On Wed, Aug 09, 2023 at 07:45:06AM +1000, NeilBrown wrote: > On Wed, 09 Aug 2023, Chuck Lever wrote: > > On Tue, Aug 08, 2023 at 10:20:44AM -0400, Jeff Layton wrote: > > > > > > It would probably be fairly simple to output well-formed yaml instead. > > > JSON and XML are a bit more of a pain. > > > > If folks don't mind, I would like more structured output like one of > > these self-documenting formats. (I know I said I didn't care before, > > but I'm beginning to care now ;-) > > Lustre, which I am somewhat involved with, uses YAML for various things. > If someone else introduced yaml-producing sysfs files to the kernel > first, that might make the path for lustre smoother :-) It worries me that there isn't yet kernel infrastructure for formating yaml in sysfs files. That broadens the scope of this work significantly. > Another option is netlink which lustre is stating to use for > configuration and stats. It is a self-describing format. The code > looks verbose, but it is widely used in the kernel and so well supported. I just spent the last 6 months building a netlink upcall to handle TLS handshake requests for in-kernel TLS consumers. It is built on the recently-added yaml netlink specs and code generator. The yaml netlink specs are kept under: Documentation/netlink/specs/ Using netlink would give us a lot of infrastructure for this facility, but I'm not sure it's worth the extra complexity. And it would /require/ the use of user space tooling (ie, not 'cat') to get to the information exported from the kernel. <shrug> > > I'm also wondering if we really ought not add another file under > > /proc, which is essentially obsolete. Would /sys/fs/nfsd/yada be > > better for this facility? > > It is only under /proc because that is where it is mounted by default :-) > I think it might be sensible to create a node under /sys where all the > content of the nfsd filesystem also appears. There are things in the nfsd filesystem that really belong under /proc/net/rpc or elsewhere, so IMO such migration needs to be handled on a case-by-case basis -- different project for another time. > I'm not keen on /sys/fs/nfsd because nfsd isn't a filesystem, it is a > service. How about /sys/module/nfsd ? > > I hesitate to even mention network namespaces... > > Please do mention them - I find them too easy to forget about. > /proc/fs/nfsd/ inherits the network namespace from whoever mounts it. > So this can work perfectly. > If we created a mirror in /sys/ we would presumably use the namespace of > the process that opens the file. I agree: the network namespace of the process that opens the rpc_status file is just what we want to limit access to in-flight requests. The current network namespace of each thread is available via SVC_NET(rqst), so it should be quite simple to display only in-flight requests that match the opener's namespace.
On Wed, 09 Aug 2023, Chuck Lever wrote: > On Wed, Aug 09, 2023 at 07:45:06AM +1000, NeilBrown wrote: > > On Wed, 09 Aug 2023, Chuck Lever wrote: > > > On Tue, Aug 08, 2023 at 10:20:44AM -0400, Jeff Layton wrote: > > > > > > > > It would probably be fairly simple to output well-formed yaml instead. > > > > JSON and XML are a bit more of a pain. > > > > > > If folks don't mind, I would like more structured output like one of > > > these self-documenting formats. (I know I said I didn't care before, > > > but I'm beginning to care now ;-) > > > > Lustre, which I am somewhat involved with, uses YAML for various things. > > If someone else introduced yaml-producing sysfs files to the kernel > > first, that might make the path for lustre smoother :-) > > It worries me that there isn't yet kernel infrastructure for > formating yaml in sysfs files. That broadens the scope of this > work significantly. > > > > Another option is netlink which lustre is stating to use for > > configuration and stats. It is a self-describing format. The code > > looks verbose, but it is widely used in the kernel and so well supported. > > I just spent the last 6 months building a netlink upcall to handle > TLS handshake requests for in-kernel TLS consumers. It is built on > the recently-added yaml netlink specs and code generator. The yaml > netlink specs are kept under: > > Documentation/netlink/specs/ > > Using netlink would give us a lot of infrastructure for this > facility, but I'm not sure it's worth the extra complexity. And it > would /require/ the use of user space tooling (ie, not 'cat') to get > to the information exported from the kernel. <shrug> > I do like the "cat" approach. Unfortunately it doesn't scale and you never really know when it needs to scale. The nfsd/rpc cache.c auth cache is still a sore point for me. It works nicely expect that it breaks for gss because the keys get too big. So we've had a couple of attempts to "fix" that. The fixes work, but they are *different*. The other well known pain point is /proc/mounts. It is really cool that you can "cat" that, but with thousands of mounts it can take tools like systemd a long time to find changes. does any of that matter for collecting stats? Will we hit a wall? I really don't know. I'd like to think that we won't.... > > > > I'm also wondering if we really ought not add another file under > > > /proc, which is essentially obsolete. Would /sys/fs/nfsd/yada be > > > better for this facility? > > > > It is only under /proc because that is where it is mounted by default :-) > > I think it might be sensible to create a node under /sys where all the > > content of the nfsd filesystem also appears. > > There are things in the nfsd filesystem that really belong under > /proc/net/rpc or elsewhere, so IMO such migration needs to be > handled on a case-by-case basis -- different project for another > time. abolutely. > > > > I'm not keen on /sys/fs/nfsd because nfsd isn't a filesystem, it is a > > service. > > How about /sys/module/nfsd ? Not worse than /sys/fs/nfsd. Not really better though. Everything in /sys/module/foo is about the module as a chunk of code - except "parameters". I guess we could add "state". Maybe configfs is the thing ... but I never liked configfs. It seems like a solution in search of a problem. I complained that /sys/fs is like the provfs-v2. It is more that everything other than devices (and block,bus,class) is procfs-v2. There are little bits of regularity, but no big-picture. I would probably argue for /sys/services/sunrpc/{nfsd,lockd,nfsv4.1-cb} Alternately, we could go for /sys/devices/virtual/sunrpc. The virtual directory contains "workqueue" which is a service in some sense, and contains subdirectories for specific work-queues. I'm not sure that all of the nfsd stuff would fit under there... but maybe I'm over-thinking things. Thanks, NeilBrown > > > > > I hesitate to even mention network namespaces... > > > > Please do mention them - I find them too easy to forget about. > > /proc/fs/nfsd/ inherits the network namespace from whoever mounts it. > > So this can work perfectly. > > If we created a mirror in /sys/ we would presumably use the namespace of > > the process that opens the file. > > I agree: the network namespace of the process that opens the > rpc_status file is just what we want to limit access to in-flight > requests. The current network namespace of each thread is available > via SVC_NET(rqst), so it should be quite simple to display only > in-flight requests that match the opener's namespace. > > > -- > Chuck Lever >
> On Tue, Aug 08, 2023 at 10:20:44AM -0400, Jeff Layton wrote: > > On Tue, 2023-08-08 at 10:03 -0400, Chuck Lever wrote: > > > On Tue, Aug 08, 2023 at 09:48:42AM -0400, Jeff Layton wrote: > > > > On Tue, 2023-08-08 at 09:24 -0400, Chuck Lever wrote: > > > > > On Tue, Aug 08, 2023 at 09:33:23PM +1000, NeilBrown wrote: > > > > > > On Tue, 08 Aug 2023, Lorenzo Bianconi wrote: > > > > > > > Introduce version field to nfsd_rpc_status handler in order to help > > > > > > > the user to maintain backward compatibility. > > > > > > > > > > > > I wonder if this really helps. What do I do if I see a version that I > > > > > > don't understand? Ignore the whole file? That doesn't make for a good > > > > > > user experience. > > > > > > > > > > There is no UX consideration here. A user browsing the file directly > > > > > will not care about the version. > > > > > > > > > > This file is intended to be parsable by scripts and they have to > > > > > keep up with the occasional changes in format. Scripts can handle an > > > > > unrecogized version however they like. > > > > > > > > > > This is what we typically get with a made-up format that isn't .ini > > > > > or JSON or XML. The file format isn't self-documenting. The final > > > > > field on each row is a variable number of tokens, so it will be > > > > > nearly impossible to simply add another field without breaking > > > > > something. > > > > > > > > > > > > > It shouldn't be a variable number of tokens per line. > > > > > > That's how NFSv4 COMPOUND operations are displayed. For example: > > > > > > 0x5d58666f 0x000000d1 0x000186a3 NFSv4 COMPOUND 0000062034739371 192.168.103.67 0 192.168.103.56 20049 OP_SEQUENCE OP_PUTFH OP_READ > > > > > > The list of operations in the displayed compound are currently > > > blank-separated tokens at the end of each row. > > > > > > > Oh! That's a bug in missed in my latest review then. The operations > > field was delimited by ':' chars at one point. Lorenzo, did you mean to > > change that? > > > > IMO, the list of operations should be one field, separated by a distinct > > delimiter (like ':'). > > > > > > > > > If there is, then that's a bug, IMO. We do want it to be simple to > > > > just add a new field, published version info notwithstanding. > > > > > > They could be wrapped in curly braces, or separated by commas, to > > > make them all one token. > > > > > > I haven't looked at NFSv3 output yet, but I expect those extra > > > tokens won't even be there in that case. > > > > > > > That's probably another bug. Anything not a v4 COMPOUND should have > > something as a placeholder. It could just be a single '-' character. > > Confirmed, rows reporting NFSv3 procedures have nothing on the end. > > I'll also note that rq_prog and the "NFSv" string are problematic. > Is it the case that all RPCs handled in this thread pool are going > to be NFS requests? > > If we expect non-NFS requests to be handled in this thread pool > (like svc_wake_up or NFSACL) then the loop should simply skip > threads whose rq_prog != NFS_PROGRAM. > > And, if the rpc_status file is supposed to display only NFS > requests (and I believe the answer to that is yes), then let's drop > the rq_prog field, since it will always show the same value. ack, I will fix it. > > > > > JSON, yaml, or xml would all address the extensibility problem, just > > > as an alternative thought. > > > > > > > It would probably be fairly simple to output well-formed yaml instead. > > JSON and XML are a bit more of a pain. > > > > For now, we can change the output. We do need to have this settled > > before this goes to Linus' tree though. > > Lorenzo, I'll drop the v5 of this series from nfsd-next. When you're > ready, please send another version with the discussed changes > squashed in. ack, fine to me. Just a couple of questions: - do we want to expose the output in yaml or is it enough to fix the NFSv4 COMPOUND parsing using ":" as sub-delimiter (and add a placeholder for non NFSv4 COMPOUND)? The yaml approach downside is we will need to add some specific code since afaik there isn't any yaml code we can rely on in the kernel, right? - what about netlink? I would say we can have both of them (cat + netlink) so the user does not need to have a specific userspace tool to decode the info. I will work on v6 as soon as we have agreed on the points above. Regards, Lorenzo > > > -- > Chuck Lever
On Wed, Aug 09, 2023 at 09:49:55AM +0200, Lorenzo Bianconi wrote: > > Lorenzo, I'll drop the v5 of this series from nfsd-next. When you're > > ready, please send another version with the discussed changes > > squashed in. > > ack, fine to me. Just a couple of questions: > - do we want to expose the output in yaml or is it enough to fix the NFSv4 COMPOUND > parsing using ":" as sub-delimiter (and add a placeholder for non NFSv4 COMPOUND)? > The yaml approach downside is we will need to add some specific code since afaik > there isn't any yaml code we can rely on in the kernel, right? Would you mind spinning a series with the simple delimiter changes and the other things we've discussed so far? It seems we have some items that still need sorting before tackling netlink v. sysfs. > - what about netlink? I would say we can have both of them (cat + netlink) so > the user does not need to have a specific userspace tool to decode the info. The trend in network subsystems is to use netlink and a purpose- built tool, no "cat" support. The trouble with "cat" is we can't seem to decide on where to put the output file. Also, I notice that rq_flags appears for each in-flight request in raw hexadecimal form. That's not especially user-friendly and cries out for a tool to interpret the bits in that field. (Actually IIRC there is now a tool that can take a yaml-defined netlink protocol and perform each of the protocol's operations and spit out the raw results). I'm wondering if having support for "cat" is really just an old habit we need to discard. Re: netlink... folks should keep in mind that the output would not be yaml. netlink uses yaml to define the netlink protocol, which is quite like SunRPC. This still meets our extensibility requirements, IMO. Creating a netlink protocol would provide a vehicle for exporting other information. Once there is an NFSD-specific netlink protocol, it's straightforward to define an additional netlink procedure for any of the information that are now available in /proc/fs/nfsd/ files. > I will work on v6 as soon as we have agreed on the points above. Thanks, I appreciate it!
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 33ad91dd3a2d..6d5feeeb09a7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1117,6 +1117,9 @@ int nfsd_stats_release(struct inode *inode, struct file *file) return ret; } +/* Increment NFSD_RPC_STATUS_VERSION adding new info to the handler */ +#define NFSD_RPC_STATUS_VERSION 1 + static int nfsd_rpc_status_show(struct seq_file *m, void *v) { struct inode *inode = file_inode(m->file); @@ -1125,6 +1128,8 @@ static int nfsd_rpc_status_show(struct seq_file *m, void *v) rcu_read_lock(); + seq_printf(m, "# version %u\n", NFSD_RPC_STATUS_VERSION); + for (i = 0; i < nn->nfsd_serv->sv_nrpools; i++) { struct svc_rqst *rqstp;
Introduce version field to nfsd_rpc_status handler in order to help the user to maintain backward compatibility. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> --- fs/nfsd/nfssvc.c | 5 +++++ 1 file changed, 5 insertions(+)