mbox series

[00/18] new API for FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR

Message ID 20210203124112.1182614-1-mszeredi@redhat.com (mailing list archive)
Headers show
Series new API for FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR | expand

Message

Miklos Szeredi Feb. 3, 2021, 12:40 p.m. UTC
This series adds the infrastructure and conversion of filesystems to the
new API.

Two filesystems are not converted: FUSE and CIFS, as they behave
differently from local filesystems (use the file pointer, don't perform
permission checks).  It's likely that these two can be supported with minor
changes to the API, but this requires more thought.

Quick xfstests on ext4, xfs and overlayfs didn't show any regressions.
Other filesystems were only compile tested.

Git tree is available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git#miscattr

---
Miklos Szeredi (18):
  vfs: add miscattr ops
  ecryptfs: stack miscattr ops
  ovl: stack miscattr
  btrfs: convert to miscattr
  ext2: convert to miscattr
  ext4: convert to miscattr
  f2fs: convert to miscattr
  gfs2: convert to miscattr
  orangefs: convert to miscattr
  xfs: convert to miscattr
  efivars: convert to miscattr
  hfsplus: convert to miscattr
  jfs: convert to miscattr
  nilfs2: convert to miscattr
  ocfs2: convert to miscattr
  reiserfs: convert to miscattr
  ubifs: convert to miscattr
  vfs: remove unused ioctl helpers

 Documentation/filesystems/locking.rst |   4 +
 Documentation/filesystems/vfs.rst     |  14 ++
 fs/btrfs/ctree.h                      |   2 +
 fs/btrfs/inode.c                      |   4 +
 fs/btrfs/ioctl.c                      | 248 ++++---------------
 fs/ecryptfs/inode.c                   |  21 ++
 fs/efivarfs/file.c                    |  77 ------
 fs/efivarfs/inode.c                   |  43 ++++
 fs/ext2/ext2.h                        |   6 +-
 fs/ext2/file.c                        |   2 +
 fs/ext2/ioctl.c                       |  85 +++----
 fs/ext2/namei.c                       |   2 +
 fs/ext4/ext4.h                        |  11 +-
 fs/ext4/file.c                        |   2 +
 fs/ext4/ioctl.c                       | 209 ++++------------
 fs/ext4/namei.c                       |   2 +
 fs/f2fs/f2fs.h                        |   2 +
 fs/f2fs/file.c                        | 212 +++--------------
 fs/f2fs/namei.c                       |   2 +
 fs/gfs2/file.c                        |  56 +----
 fs/gfs2/inode.c                       |   4 +
 fs/gfs2/inode.h                       |   2 +
 fs/hfsplus/dir.c                      |   2 +
 fs/hfsplus/hfsplus_fs.h               |  13 +-
 fs/hfsplus/inode.c                    |  53 +++++
 fs/hfsplus/ioctl.c                    |  84 -------
 fs/inode.c                            |  87 -------
 fs/ioctl.c                            | 329 ++++++++++++++++++++++++++
 fs/jfs/file.c                         |   6 +-
 fs/jfs/ioctl.c                        | 104 +++-----
 fs/jfs/jfs_dinode.h                   |   7 -
 fs/jfs/jfs_inode.h                    |   3 +-
 fs/jfs/namei.c                        |   6 +-
 fs/nilfs2/file.c                      |   2 +
 fs/nilfs2/ioctl.c                     |  60 ++---
 fs/nilfs2/namei.c                     |   2 +
 fs/nilfs2/nilfs.h                     |   2 +
 fs/ocfs2/file.c                       |   2 +
 fs/ocfs2/ioctl.c                      |  58 ++---
 fs/ocfs2/ioctl.h                      |   2 +
 fs/ocfs2/namei.c                      |   3 +
 fs/ocfs2/ocfs2_ioctl.h                |   8 -
 fs/orangefs/file.c                    |  79 -------
 fs/orangefs/inode.c                   |  49 ++++
 fs/overlayfs/dir.c                    |   2 +
 fs/overlayfs/inode.c                  |  43 ++++
 fs/overlayfs/overlayfs.h              |   2 +
 fs/reiserfs/file.c                    |   2 +
 fs/reiserfs/ioctl.c                   | 120 +++++-----
 fs/reiserfs/namei.c                   |   2 +
 fs/reiserfs/reiserfs.h                |   6 +-
 fs/reiserfs/super.c                   |   2 +-
 fs/ubifs/dir.c                        |   2 +
 fs/ubifs/file.c                       |   2 +
 fs/ubifs/ioctl.c                      |  73 +++---
 fs/ubifs/ubifs.h                      |   2 +
 fs/xfs/libxfs/xfs_fs.h                |   4 -
 fs/xfs/xfs_ioctl.c                    | 294 +++++++----------------
 fs/xfs/xfs_ioctl.h                    |  10 +
 fs/xfs/xfs_ioctl32.c                  |   2 -
 fs/xfs/xfs_ioctl32.h                  |   2 -
 fs/xfs/xfs_iops.c                     |   7 +
 include/linux/fs.h                    |  15 +-
 include/linux/miscattr.h              |  52 ++++
 64 files changed, 1097 insertions(+), 1518 deletions(-)
 create mode 100644 include/linux/miscattr.h

Comments

Matthew Wilcox Feb. 3, 2021, 1:05 p.m. UTC | #1
On Wed, Feb 03, 2021 at 01:40:54PM +0100, Miklos Szeredi wrote:
> This series adds the infrastructure and conversion of filesystems to the
> new API.
> 
> Two filesystems are not converted: FUSE and CIFS, as they behave
> differently from local filesystems (use the file pointer, don't perform
> permission checks).  It's likely that these two can be supported with minor
> changes to the API, but this requires more thought.

Why not change the API now?  ie pass the file instead of the dentry?
Miklos Szeredi Feb. 3, 2021, 1:13 p.m. UTC | #2
On Wed, Feb 3, 2021 at 2:08 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 03, 2021 at 01:40:54PM +0100, Miklos Szeredi wrote:
> > This series adds the infrastructure and conversion of filesystems to the
> > new API.
> >
> > Two filesystems are not converted: FUSE and CIFS, as they behave
> > differently from local filesystems (use the file pointer, don't perform
> > permission checks).  It's likely that these two can be supported with minor
> > changes to the API, but this requires more thought.
>
> Why not change the API now?  ie pass the file instead of the dentry?

These are inode attributes we are talking about, not much sense in
passing an open file to the filesystem.  That was/is due to ioctl
being an fd based API.

It would make more sense to convert these filesystems to use a dentry
instead of a file pointer.  Which is not trivial, unfortuantely.

Thanks,
Miklos
Matthew Wilcox Feb. 3, 2021, 1:58 p.m. UTC | #3
On Wed, Feb 03, 2021 at 02:13:27PM +0100, Miklos Szeredi wrote:
> On Wed, Feb 3, 2021 at 2:08 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 03, 2021 at 01:40:54PM +0100, Miklos Szeredi wrote:
> > > This series adds the infrastructure and conversion of filesystems to the
> > > new API.
> > >
> > > Two filesystems are not converted: FUSE and CIFS, as they behave
> > > differently from local filesystems (use the file pointer, don't perform
> > > permission checks).  It's likely that these two can be supported with minor
> > > changes to the API, but this requires more thought.
> >
> > Why not change the API now?  ie pass the file instead of the dentry?
> 
> These are inode attributes we are talking about, not much sense in
> passing an open file to the filesystem.  That was/is due to ioctl
> being an fd based API.

You might as well say "Not much point passing a dentry to the filesystem"
and just pass the inode.

> It would make more sense to convert these filesystems to use a dentry
> instead of a file pointer.  Which is not trivial, unfortuantely.

Network filesystems frequently need to use the credentials attached to
a struct file in order to communicate with the server.  There's no point
fighting this reality.
Miklos Szeredi Feb. 3, 2021, 2:13 p.m. UTC | #4
On Wed, Feb 3, 2021 at 2:58 PM Matthew Wilcox <willy@infradead.org> wrote:

> Network filesystems frequently need to use the credentials attached to
> a struct file in order to communicate with the server.  There's no point
> fighting this reality.

IDGI.  Credentials can be taken from the file and from the task.  In
this case every filesystem except cifs looks at task creds. Why are
network filesystem special in this respect?

Thanks,
Miklos
Matthew Wilcox Feb. 3, 2021, 2:28 p.m. UTC | #5
On Wed, Feb 03, 2021 at 03:13:29PM +0100, Miklos Szeredi wrote:
> On Wed, Feb 3, 2021 at 2:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> 
> > Network filesystems frequently need to use the credentials attached to
> > a struct file in order to communicate with the server.  There's no point
> > fighting this reality.
> 
> IDGI.  Credentials can be taken from the file and from the task.  In
> this case every filesystem except cifs looks at task creds. Why are
> network filesystem special in this respect?

I don't necessarily mean 'struct cred'.  I mean "the authentication
that the client has performed to the server".  Which is not a per-task
thing, it's stored in the struct file, which is why we have things like

        int (*write_begin)(struct file *, struct address_space *mapping,
                                loff_t pos, unsigned len, unsigned flags,
                                struct page **pagep, void **fsdata);

disk filesystems ignore the struct file argument, but network filesystems
very much use it.
Miklos Szeredi Feb. 3, 2021, 2:38 p.m. UTC | #6
On Wed, Feb 3, 2021 at 3:28 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Feb 03, 2021 at 03:13:29PM +0100, Miklos Szeredi wrote:
> > On Wed, Feb 3, 2021 at 2:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > Network filesystems frequently need to use the credentials attached to
> > > a struct file in order to communicate with the server.  There's no point
> > > fighting this reality.
> >
> > IDGI.  Credentials can be taken from the file and from the task.  In
> > this case every filesystem except cifs looks at task creds. Why are
> > network filesystem special in this respect?
>
> I don't necessarily mean 'struct cred'.  I mean "the authentication
> that the client has performed to the server".  Which is not a per-task
> thing, it's stored in the struct file, which is why we have things like
>
>         int (*write_begin)(struct file *, struct address_space *mapping,
>                                 loff_t pos, unsigned len, unsigned flags,
>                                 struct page **pagep, void **fsdata);
>
> disk filesystems ignore the struct file argument, but network filesystems
> very much use it.

Fine for file I/O.  That's authorized at open time for all
filesystems, not just network ones.

Not fine for metadata operations (IMO).   I.e. ->[gs]etattr() don't
take a file argument either, even though on the uAPI there are plenty
of open file based variants.

Thanks,
Miklos
Matthew Wilcox Feb. 3, 2021, 2:56 p.m. UTC | #7
On Wed, Feb 03, 2021 at 03:38:54PM +0100, Miklos Szeredi wrote:
> On Wed, Feb 3, 2021 at 3:28 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Feb 03, 2021 at 03:13:29PM +0100, Miklos Szeredi wrote:
> > > On Wed, Feb 3, 2021 at 2:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > > Network filesystems frequently need to use the credentials attached to
> > > > a struct file in order to communicate with the server.  There's no point
> > > > fighting this reality.
> > >
> > > IDGI.  Credentials can be taken from the file and from the task.  In
> > > this case every filesystem except cifs looks at task creds. Why are
> > > network filesystem special in this respect?
> >
> > I don't necessarily mean 'struct cred'.  I mean "the authentication
> > that the client has performed to the server".  Which is not a per-task
> > thing, it's stored in the struct file, which is why we have things like
> >
> >         int (*write_begin)(struct file *, struct address_space *mapping,
> >                                 loff_t pos, unsigned len, unsigned flags,
> >                                 struct page **pagep, void **fsdata);
> >
> > disk filesystems ignore the struct file argument, but network filesystems
> > very much use it.
> 
> Fine for file I/O.  That's authorized at open time for all
> filesystems, not just network ones.
> 
> Not fine for metadata operations (IMO).   I.e. ->[gs]etattr() don't
> take a file argument either, even though on the uAPI there are plenty
> of open file based variants.

That's a fine statement of principle, but if the filesystem needs to
contact the server, then your principle must accede to reality.

But let's talk specifics.  What does CIFS need to contact the server for?
Could it be cached earlier?
Miklos Szeredi Feb. 3, 2021, 3:03 p.m. UTC | #8
On Wed, Feb 3, 2021 at 3:56 PM Matthew Wilcox <willy@infradead.org> wrote:

> But let's talk specifics.  What does CIFS need to contact the server for?
> Could it be cached earlier?

I don't understand what CIFS is doing, and I don't really care.   This
is the sort of operation where adding a couple of network roundtrips
so that the client can obtain the credentials required to perform the
operation doesn't really matter.  We won't have thousands of chattr(1)
calls per second.

So I think the principle is more important than the details of the
current implementation.

And I'm saying that knowing that fixing up FUSE will be my
responsibility and it won't be trivial either.

Thanks,
Miklos
Dave Chinner Feb. 8, 2021, 2 a.m. UTC | #9
On Wed, Feb 03, 2021 at 04:03:06PM +0100, Miklos Szeredi wrote:
> On Wed, Feb 3, 2021 at 3:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> 
> > But let's talk specifics.  What does CIFS need to contact the server for?
> > Could it be cached earlier?
> 
> I don't understand what CIFS is doing, and I don't really care.   This
> is the sort of operation where adding a couple of network roundtrips
> so that the client can obtain the credentials required to perform the
> operation doesn't really matter.  We won't have thousands of chattr(1)
> calls per second.

Incorrect.

The xfs utilities can do recursive directory traversal to change
things like the project ID across an entire directory tree. Or to
change extent size hints.

We also have 'xfs_io -c "lsattr -R" ...' and 'lsprog -R' which will
do a recursive descent to list the requested attributes of all
directories and files in the tree...

So, yeah, we do indeed do thousands of these fsxattr based
operations a second, sometimes tens of thousands a second or more,
and sometimes they are issued in bulk in performance critical paths
for container build/deployment operations....

Cheers,

Dave.
Miklos Szeredi Feb. 8, 2021, 8:25 a.m. UTC | #10
On Mon, Feb 8, 2021 at 3:00 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Feb 03, 2021 at 04:03:06PM +0100, Miklos Szeredi wrote:
> > On Wed, Feb 3, 2021 at 3:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > But let's talk specifics.  What does CIFS need to contact the server for?
> > > Could it be cached earlier?
> >
> > I don't understand what CIFS is doing, and I don't really care.   This
> > is the sort of operation where adding a couple of network roundtrips
> > so that the client can obtain the credentials required to perform the
> > operation doesn't really matter.  We won't have thousands of chattr(1)
> > calls per second.
>
> Incorrect.

Okay, I was wrong.

Still, CIFS may very well be able to perform these operations without
a struct file.   But even if it can't, I'd still only add the file
pointer as an *optional hint* from the VFS, not as the primary object
as Matthew suggested.

I stand by my choice of /struct dentry/ as the object to pass to these
operations.

Thanks,
Miklos
Matthew Wilcox Feb. 8, 2021, 2:02 p.m. UTC | #11
On Mon, Feb 08, 2021 at 09:25:22AM +0100, Miklos Szeredi wrote:
> On Mon, Feb 8, 2021 at 3:00 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Feb 03, 2021 at 04:03:06PM +0100, Miklos Szeredi wrote:
> > > On Wed, Feb 3, 2021 at 3:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > > But let's talk specifics.  What does CIFS need to contact the server for?
> > > > Could it be cached earlier?
> > >
> > > I don't understand what CIFS is doing, and I don't really care.   This
> > > is the sort of operation where adding a couple of network roundtrips
> > > so that the client can obtain the credentials required to perform the
> > > operation doesn't really matter.  We won't have thousands of chattr(1)
> > > calls per second.
> >
> > Incorrect.
> 
> Okay, I was wrong.
> 
> Still, CIFS may very well be able to perform these operations without
> a struct file.   But even if it can't, I'd still only add the file
> pointer as an *optional hint* from the VFS, not as the primary object
> as Matthew suggested.
> 
> I stand by my choice of /struct dentry/ as the object to pass to these
> operations.

Why the dentry?  This is an inode operation.  Why doesn't it take an
inode as its primary object?
Miklos Szeredi Feb. 8, 2021, 2:52 p.m. UTC | #12
On Mon, Feb 8, 2021 at 3:02 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Feb 08, 2021 at 09:25:22AM +0100, Miklos Szeredi wrote:
> > On Mon, Feb 8, 2021 at 3:00 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Feb 03, 2021 at 04:03:06PM +0100, Miklos Szeredi wrote:
> > > > On Wed, Feb 3, 2021 at 3:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > > But let's talk specifics.  What does CIFS need to contact the server for?
> > > > > Could it be cached earlier?
> > > >
> > > > I don't understand what CIFS is doing, and I don't really care.   This
> > > > is the sort of operation where adding a couple of network roundtrips
> > > > so that the client can obtain the credentials required to perform the
> > > > operation doesn't really matter.  We won't have thousands of chattr(1)
> > > > calls per second.
> > >
> > > Incorrect.
> >
> > Okay, I was wrong.
> >
> > Still, CIFS may very well be able to perform these operations without
> > a struct file.   But even if it can't, I'd still only add the file
> > pointer as an *optional hint* from the VFS, not as the primary object
> > as Matthew suggested.
> >
> > I stand by my choice of /struct dentry/ as the object to pass to these
> > operations.
>
> Why the dentry?  This is an inode operation.  Why doesn't it take an
> inode as its primary object?

If we pass struct file to the op, then that limits callers to those
that have an open file.  E.g. it would be difficult to introduce a
path based userspace API for getting/changing these attributes.

Passing a dentry instead of an inode has no such problem, since the
dentry is always available, whether coming from an open file or a
path.  Some inode operations do pass a dentry instead of an inode
(readlink, getattr, setattr) and some filesystems take advantage of
this.

Thanks,
Miklos