Message ID | 158204549488.3299825.3783690177353088425.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | VFS: Filesystem information and notifications [ver #16] | expand |
Hi David, I have a few generic remarks for new syscalls... > (3) New single-bit capability flags can be added. This is a structure-typed > attribute and, as such, (2) applies. Any bits you wanted but the kernel > doesn't support are automatically set to 0. > > fsinfo() may be called like the following, for example: > > struct fsinfo_params params = { > .at_flags = AT_SYMLINK_NOFOLLOW, Shouldn't all new syscalls be able to provide the RESOLVE_ Shouldn't all new syscalls be able to provide the RESOLVE_ flags supported in openat2? > .flags = FSINFO_FLAGS_QUERY_PATH, > .request = FSINFO_ATTR_AFS_SERVER_ADDRESSES, > .Nth = 2, > }; > struct fsinfo_server_address address; > len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", ¶ms, > &address, sizeof(address)); Also passing sizeof(params) would allow future updates of fsinfo_params, also similar to openat2(), clone3()... > ======================== > FILESYSTEM NOTIFICATIONS > ======================== > > The second system call, watch_mount(), places a watch on a point in the > mount topology specified by the dirfd, path and at_flags parameters. All > mount topology change and mount attribute change notifications in the > subtree rooted at that point can be intercepted by the watch. Watches are > ducted through pipes: > > int fd[2]; > pipe2(fd, O_NOTIFICATION_PIPE); > ioctl(fd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); > watch_mount(AT_FDCWD, "/", 0, fd[0], 0x02); I guess similar things apply here. Does that make sense to you? metze
On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote: > > Here are a set of patches that adds system calls, that (a) allow > information about the VFS, mount topology, superblock and files to be > retrieved and (b) allow for notifications of mount topology rearrangement > events, mount and superblock attribute changes and other superblock events, > such as errors. > > ============================ > FILESYSTEM INFORMATION QUERY > ============================ > > The first system call, fsinfo(), allows information about the filesystem at > a particular path point to be queried as a set of attributes, some of which > may have more than one value. > > Attribute values are of four basic types: > > (1) Version dependent-length structure (size defined by type). > > (2) Variable-length string (up to 4096, including NUL). > > (3) List of structures (up to INT_MAX size). > > (4) Opaque blob (up to INT_MAX size). I mainly have an organizational question. :) This is a huge patchset with lots and lots of (good) features. Wouldn't it make sense to make the fsinfo() syscall a completely separate patchset from the watch_mount() and watch_sb() syscalls? It seems that they don't need to depend on each other at all. This would make reviewing this so much nicer and likely would mean that fsinfo() could proceed a little faster. Christian
On Wed, Feb 19, 2020 at 03:46:13PM +0100, Christian Brauner wrote: > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote: > > > > Here are a set of patches that adds system calls, that (a) allow > > information about the VFS, mount topology, superblock and files to be > > retrieved and (b) allow for notifications of mount topology rearrangement > > events, mount and superblock attribute changes and other superblock events, > > such as errors. > > > > ============================ > > FILESYSTEM INFORMATION QUERY > > ============================ > > > > The first system call, fsinfo(), allows information about the filesystem at > > a particular path point to be queried as a set of attributes, some of which > > may have more than one value. > > > > Attribute values are of four basic types: > > > > (1) Version dependent-length structure (size defined by type). > > > > (2) Variable-length string (up to 4096, including NUL). > > > > (3) List of structures (up to INT_MAX size). > > > > (4) Opaque blob (up to INT_MAX size). > > I mainly have an organizational question. :) This is a huge patchset > with lots and lots of (good) features. Wouldn't it make sense to make > the fsinfo() syscall a completely separate patchset from the > watch_mount() and watch_sb() syscalls? It seems that they don't need to > depend on each other at all. This would make reviewing this so much > nicer and likely would mean that fsinfo() could proceed a little faster. Agreed; I was also wondering why it was necessary to have three new features in the same large(ish) patchset. --D > Christian
Christian Brauner <christian.brauner@ubuntu.com> wrote: > I mainly have an organizational question. :) This is a huge patchset > with lots and lots of (good) features. Wouldn't it make sense to make > the fsinfo() syscall a completely separate patchset from the > watch_mount() and watch_sb() syscalls? It seems that they don't need to > depend on each other at all. This would make reviewing this so much > nicer and likely would mean that fsinfo() could proceed a little faster. I can split it up again, but it's not quite as independent as it seems. There's a notification counter added to both the mount struct and the super_block struct. This is bumped by notifications and retrieved by fsinfo(). You need this in the event that there's an overrun and you have to rescan the whole tree. So to actually make use of the mount/sb notification facilities, you need fsinfo() as well. David
On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote: > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote: > > Here are a set of patches that adds system calls, that (a) allow > > information about the VFS, mount topology, superblock and files to > > be > > retrieved and (b) allow for notifications of mount topology > > rearrangement > > events, mount and superblock attribute changes and other superblock > > events, > > such as errors. > > > > ============================ > > FILESYSTEM INFORMATION QUERY > > ============================ > > > > The first system call, fsinfo(), allows information about the > > filesystem at > > a particular path point to be queried as a set of attributes, some > > of which > > may have more than one value. > > > > Attribute values are of four basic types: > > > > (1) Version dependent-length structure (size defined by type). > > > > (2) Variable-length string (up to 4096, including NUL). > > > > (3) List of structures (up to INT_MAX size). > > > > (4) Opaque blob (up to INT_MAX size). > > I mainly have an organizational question. :) This is a huge patchset > with lots and lots of (good) features. Wouldn't it make sense to make > the fsinfo() syscall a completely separate patchset from the > watch_mount() and watch_sb() syscalls? It seems that they don't need > to > depend on each other at all. This would make reviewing this so much > nicer and likely would mean that fsinfo() could proceed a little > faster. The remainder of the fsinfo() series would need to remain useful if this was done. For context I want work on improving handling of large mount tables. Ultimately I expect to solve a very long standing autofs problem of using large direct mount maps without prohibitive performance overhead (and there a lot of rather challenging autofs changes to do for this too) and I believe the fsinfo() system call, and related bits, is the way to do this. But improving the handling of large mount tables for autofs will have the side effect of improvements for other mount table users, even in the early stages of this work. For example I want to use this for mount table handling improvements in libmount. Clearly that ultimately needs mount change notification in the end but ... There's a bunch of things that need to be done alone the way to even get started. One thing that's needed is the ability to call fsinfo() to get information on a mount to avoid constant reading of the proc based mount table, which happens a lot (since the mount info. needs to be up to date) so systemd (and others) would see an improvement with the fsinfo() system call alone able to be used in libmount. But for the fsinfo() system call to be used for this the file system specific mount options need to also be obtained when using fsinfo(). That means the super block operation fsinfo uses to provide this must be implemented for at least most file systems. So separating out the notifications part, leaving whatever is needed to still be able to do this, should be fine and the system call would be immediately useful once the super operation is implemented for the needed file systems. Whether the implementation of the super operation should be done as part of this series is another question but would certainly be a challenge and make the series more complicated. But is needed for the change to be useful in my case. Ian
On Thu, Feb 20, 2020 at 12:42:15PM +0800, Ian Kent wrote: > On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote: > > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote: > > > Here are a set of patches that adds system calls, that (a) allow > > > information about the VFS, mount topology, superblock and files to > > > be > > > retrieved and (b) allow for notifications of mount topology > > > rearrangement > > > events, mount and superblock attribute changes and other superblock > > > events, > > > such as errors. > > > > > > ============================ > > > FILESYSTEM INFORMATION QUERY > > > ============================ > > > > > > The first system call, fsinfo(), allows information about the > > > filesystem at > > > a particular path point to be queried as a set of attributes, some > > > of which > > > may have more than one value. > > > > > > Attribute values are of four basic types: > > > > > > (1) Version dependent-length structure (size defined by type). > > > > > > (2) Variable-length string (up to 4096, including NUL). > > > > > > (3) List of structures (up to INT_MAX size). > > > > > > (4) Opaque blob (up to INT_MAX size). > > > > I mainly have an organizational question. :) This is a huge patchset > > with lots and lots of (good) features. Wouldn't it make sense to make > > the fsinfo() syscall a completely separate patchset from the > > watch_mount() and watch_sb() syscalls? It seems that they don't need > > to > > depend on each other at all. This would make reviewing this so much > > nicer and likely would mean that fsinfo() could proceed a little > > faster. > > The remainder of the fsinfo() series would need to remain useful > if this was done. > > For context I want work on improving handling of large mount > tables. Yeah, I've talked to David about this; polling on a large mountinfo file is not great, I agree. > > Ultimately I expect to solve a very long standing autofs problem > of using large direct mount maps without prohibitive performance > overhead (and there a lot of rather challenging autofs changes to > do for this too) and I believe the fsinfo() system call, and > related bits, is the way to do this. > > But improving the handling of large mount tables for autofs > will have the side effect of improvements for other mount table > users, even in the early stages of this work. > > For example I want to use this for mount table handling improvements > in libmount. Clearly that ultimately needs mount change notification > in the end but ... > > There's a bunch of things that need to be done alone the way > to even get started. > > One thing that's needed is the ability to call fsinfo() to get > information on a mount to avoid constant reading of the proc based > mount table, which happens a lot (since the mount info. needs > to be up to date) so systemd (and others) would see an improvement > with the fsinfo() system call alone able to be used in libmount. > > But for the fsinfo() system call to be used for this the file > system specific mount options need to also be obtained when > using fsinfo(). That means the super block operation fsinfo uses > to provide this must be implemented for at least most file systems. > > So separating out the notifications part, leaving whatever is needed > to still be able to do this, should be fine and the system call > would be immediately useful once the super operation is implemented > for the needed file systems. > > Whether the implementation of the super operation should be done > as part of this series is another question but would certainly > be a challenge and make the series more complicated. But is needed > for the change to be useful in my case. I think what would might work - and what David had already brought up briefly - is to either base the fsinfo branch on top of the mount notificaiton branch or break the notification counters pieces into a separate patch and base both mount notifications and fsinfo on top of it. Christian
On Thu, 2020-02-20 at 10:09 +0100, Christian Brauner wrote: > On Thu, Feb 20, 2020 at 12:42:15PM +0800, Ian Kent wrote: > > On Wed, 2020-02-19 at 15:46 +0100, Christian Brauner wrote: > > > On Tue, Feb 18, 2020 at 05:04:55PM +0000, David Howells wrote: > > > > Here are a set of patches that adds system calls, that (a) > > > > allow > > > > information about the VFS, mount topology, superblock and files > > > > to > > > > be > > > > retrieved and (b) allow for notifications of mount topology > > > > rearrangement > > > > events, mount and superblock attribute changes and other > > > > superblock > > > > events, > > > > such as errors. > > > > > > > > ============================ > > > > FILESYSTEM INFORMATION QUERY > > > > ============================ > > > > > > > > The first system call, fsinfo(), allows information about the > > > > filesystem at > > > > a particular path point to be queried as a set of attributes, > > > > some > > > > of which > > > > may have more than one value. > > > > > > > > Attribute values are of four basic types: > > > > > > > > (1) Version dependent-length structure (size defined by type). > > > > > > > > (2) Variable-length string (up to 4096, including NUL). > > > > > > > > (3) List of structures (up to INT_MAX size). > > > > > > > > (4) Opaque blob (up to INT_MAX size). > > > > > > I mainly have an organizational question. :) This is a huge > > > patchset > > > with lots and lots of (good) features. Wouldn't it make sense to > > > make > > > the fsinfo() syscall a completely separate patchset from the > > > watch_mount() and watch_sb() syscalls? It seems that they don't > > > need > > > to > > > depend on each other at all. This would make reviewing this so > > > much > > > nicer and likely would mean that fsinfo() could proceed a little > > > faster. > > > > The remainder of the fsinfo() series would need to remain useful > > if this was done. > > > > For context I want work on improving handling of large mount > > tables. > > Yeah, I've talked to David about this; polling on a large mountinfo > file > is not great, I agree. > > > Ultimately I expect to solve a very long standing autofs problem > > of using large direct mount maps without prohibitive performance > > overhead (and there a lot of rather challenging autofs changes to > > do for this too) and I believe the fsinfo() system call, and > > related bits, is the way to do this. > > > > But improving the handling of large mount tables for autofs > > will have the side effect of improvements for other mount table > > users, even in the early stages of this work. > > > > For example I want to use this for mount table handling > > improvements > > in libmount. Clearly that ultimately needs mount change > > notification > > in the end but ... > > > > There's a bunch of things that need to be done alone the way > > to even get started. > > > > One thing that's needed is the ability to call fsinfo() to get > > information on a mount to avoid constant reading of the proc based > > mount table, which happens a lot (since the mount info. needs > > to be up to date) so systemd (and others) would see an improvement > > with the fsinfo() system call alone able to be used in libmount. > > > > But for the fsinfo() system call to be used for this the file > > system specific mount options need to also be obtained when > > using fsinfo(). That means the super block operation fsinfo uses > > to provide this must be implemented for at least most file systems. > > > > So separating out the notifications part, leaving whatever is > > needed > > to still be able to do this, should be fine and the system call > > would be immediately useful once the super operation is implemented > > for the needed file systems. > > > > Whether the implementation of the super operation should be done > > as part of this series is another question but would certainly > > be a challenge and make the series more complicated. But is needed > > for the change to be useful in my case. > > I think what would might work - and what David had already brought up > briefly - is to either base the fsinfo branch on top of the mount > notificaiton branch or break the notification counters pieces into a > separate patch and base both mount notifications and fsinfo on top of > it. Possibly, but I'm pretty sure David has more fsinfo patches. So I suspect there will be a right time to post patches for the fsinfo super block operation that David doesn't already have. I'm going to have to find time for that ... The post was more to let David know what my first goal is and what I need for it, and to let others know there is someone wanting to use this for user space improvements and give some initial insight into my longer term goals. Ian
Stefan Metzmacher <metze@samba.org> wrote: > > fsinfo() may be called like the following, for example: > > > > struct fsinfo_params params = { > > .at_flags = AT_SYMLINK_NOFOLLOW, > > Shouldn't all new syscalls be able to provide the RESOLVE_ flags > supported in openat2? If that's the rule, then fine. I presume these are a replacement for AT_*. But the set of RESOLVE_* flags does not appear to be complete - and why's it not in linux/fs.h if it's meant to be used by everything? Anyway, it lacks a RESOLVE_NO_AUTOMOUNT flag. This is not quite the same as the documented behaviour of RESOLVE_NO_XDEV. > > len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc", ¶ms, > > &address, sizeof(address)); > > Also passing sizeof(params) would allow future updates of fsinfo_params, > also similar to openat2(), clone3()... I can put that at the beginning of the params block or put dirfd in there. If I remember rightly, 6-arg syscalls are discouraged because they may need special handling on some arches. David