mbox series

[RFC,0/3] vfs: Use an xarray instead of inserted bookmarks to scan mount list

Message ID 161581005972.2850696.12854461380574304411.stgit@warthog.procyon.org.uk (mailing list archive)
Headers show
Series vfs: Use an xarray instead of inserted bookmarks to scan mount list | expand

Message

David Howells March 15, 2021, 12:07 p.m. UTC
Hi Al, Miklós,

Can we consider replacing the "insert cursor" approach we're currently
using for proc files to scan the current namespace's mount list[1] with
something that uses an xarray of mounts indexed by mnt_id?

This has some advantages:

 (1) It's simpler.  We don't need to insert dummy mount objects as
     bookmarks into the mount list and code that's walking the list doesn't
     have to carefully step over them.

 (2) We can use the file position to represent the mnt_id and can jump to
     it directly - ie. using seek() to jump to a mount object by its ID.

 (3) It might make it easier to use RCU in future to dump mount entries
     rather than having to take namespace_sem.  xarray provides for the
     possibility of tagging entries to say that they're viewable to avoid
     dumping incomplete mount objects.

But there are a number of disadvantages:

 (1) We have to allocate memory to maintain the xarray, which becomes more
     of a problem as mnt_id values get scattered.

 (2) We need to preallocate the xarray memory because we're manipulating

 (3) The effect could be magnified because someone mounts an entire
     subtree and propagation has to copy all of it.

David

Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f [1]

---
David Howells (3):
      vfs: Use an xarray in the mount namespace to handle /proc/mounts list
      vfs: Use the mounts_to_id array to do /proc/mounts and co.
      vfs: Remove mount list trawling cursor stuff


 fs/mount.h            |  2 +-
 fs/namespace.c        | 66 ++++++++++---------------------------------
 fs/proc_namespace.c   |  3 --
 include/linux/mount.h |  4 +--
 4 files changed, 17 insertions(+), 58 deletions(-)

Comments

Matthew Wilcox March 15, 2021, 12:46 p.m. UTC | #1
On Mon, Mar 15, 2021 at 12:07:39PM +0000, David Howells wrote:
> 
> Hi Al, Miklós,
> 
> Can we consider replacing the "insert cursor" approach we're currently
> using for proc files to scan the current namespace's mount list[1] with
> something that uses an xarray of mounts indexed by mnt_id?
> 
> This has some advantages:
> 
>  (1) It's simpler.  We don't need to insert dummy mount objects as
>      bookmarks into the mount list and code that's walking the list doesn't
>      have to carefully step over them.
> 
>  (2) We can use the file position to represent the mnt_id and can jump to
>      it directly - ie. using seek() to jump to a mount object by its ID.
> 
>  (3) It might make it easier to use RCU in future to dump mount entries
>      rather than having to take namespace_sem.  xarray provides for the
>      possibility of tagging entries to say that they're viewable to avoid
>      dumping incomplete mount objects.

Usually one fully constructs the object, then inserts it into the XArray.

> But there are a number of disadvantages:
> 
>  (1) We have to allocate memory to maintain the xarray, which becomes more
>      of a problem as mnt_id values get scattered.

mnt_id values don't seem to get particularly scattered.  They're allocated
using an IDA, so they stay small (unlike someone using idr_alloc_cyclic
;-).
Miklos Szeredi March 15, 2021, 1:14 p.m. UTC | #2
On Mon, Mar 15, 2021 at 1:07 PM David Howells <dhowells@redhat.com> wrote:
>
>
> Hi Al, Miklós,
>
> Can we consider replacing the "insert cursor" approach we're currently
> using for proc files to scan the current namespace's mount list[1] with
> something that uses an xarray of mounts indexed by mnt_id?
>
> This has some advantages:
>
>  (1) It's simpler.  We don't need to insert dummy mount objects as
>      bookmarks into the mount list and code that's walking the list doesn't
>      have to carefully step over them.
>
>  (2) We can use the file position to represent the mnt_id and can jump to
>      it directly - ie. using seek() to jump to a mount object by its ID.

What happens if the mount at the current position is removed?

Thanks,
Miklos
Matthew Wilcox March 15, 2021, 1:17 p.m. UTC | #3
On Mon, Mar 15, 2021 at 02:14:35PM +0100, Miklos Szeredi wrote:
> On Mon, Mar 15, 2021 at 1:07 PM David Howells <dhowells@redhat.com> wrote:
> >
> >
> > Hi Al, Miklós,
> >
> > Can we consider replacing the "insert cursor" approach we're currently
> > using for proc files to scan the current namespace's mount list[1] with
> > something that uses an xarray of mounts indexed by mnt_id?
> >
> > This has some advantages:
> >
> >  (1) It's simpler.  We don't need to insert dummy mount objects as
> >      bookmarks into the mount list and code that's walking the list doesn't
> >      have to carefully step over them.
> >
> >  (2) We can use the file position to represent the mnt_id and can jump to
> >      it directly - ie. using seek() to jump to a mount object by its ID.
> 
> What happens if the mount at the current position is removed?

xa_find() will move to the next one.
David Howells March 15, 2021, 1:41 p.m. UTC | #4
Miklos Szeredi <miklos@szeredi.hu> wrote:

> >  (2) We can use the file position to represent the mnt_id and can jump to
> >      it directly - ie. using seek() to jump to a mount object by its ID.
> 
> What happens if the mount at the current position is removed?

umount_tree() requires the namespace_sem to be writelocked, so that should be
fine as the patches currently read-lock that whilst doing /proc/*/mount*

I'm assuming that kern_unmount() won't be a problem as it is there to deal
with mounts made by kern_mount() which don't get added to the mount list
(mnt_ns is MNT_NS_INTERNAL).  kern_unmount_array() seems to be the same
because overlayfs gives it mounts generated by clone_private_mount().  It
might be worth putting a WARN_ON() in kern_unmount() to require this.

When reading through proc, m_start() calls xas_find() which returns the entry
at the starting index or, if not present, the next higher entry.

David
Miklos Szeredi March 15, 2021, 2:22 p.m. UTC | #5
On Mon, Mar 15, 2021 at 2:41 PM David Howells <dhowells@redhat.com> wrote:
>
> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> > >  (2) We can use the file position to represent the mnt_id and can jump to
> > >      it directly - ie. using seek() to jump to a mount object by its ID.
> >
> > What happens if the mount at the current position is removed?
>
> umount_tree() requires the namespace_sem to be writelocked, so that should be
> fine as the patches currently read-lock that whilst doing /proc/*/mount*
>
> I'm assuming that kern_unmount() won't be a problem as it is there to deal
> with mounts made by kern_mount() which don't get added to the mount list
> (mnt_ns is MNT_NS_INTERNAL).  kern_unmount_array() seems to be the same
> because overlayfs gives it mounts generated by clone_private_mount().  It
> might be worth putting a WARN_ON() in kern_unmount() to require this.
>
> When reading through proc, m_start() calls xas_find() which returns the entry
> at the starting index or, if not present, the next higher entry.

This will break the property of new mounts always being added to the
end of the list.  That's likely a regression for nerural based parsers
(i.e. people), probably less so for machine parsers.

Thanks,
Miklos

>
> David
>