mbox series

[GIT,PULL] vfs fixes

Message ID 20240318-vfs-fixes-e0e7e114b1d1@brauner (mailing list archive)
State New
Headers show
Series [GIT,PULL] vfs fixes | expand

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.9-rc1.fixes

Message

Christian Brauner March 18, 2024, 12:19 p.m. UTC
Hey Linus,

/* Summary */
This contains a few small fixes for this merge window:

* Undo the hiding of silly-rename files in afs. If they're hidden they
  can't be deleted by rm manually anymore causing regressions.
* Avoid caching the preferred address for an afs server to avoid
  accidently overriding an explicitly specified preferred server address.
* Fix bad stat() and rmdir() interaction in afs.
* Take a passive reference on the superblock when opening a block device
  so the holder is available to concurrent callers from the block layer.
* Clear private data pointer in fscache_begin_operation() to avoid it
  being falsely treated as valid.

/* Testing */
clang: Debian clang version 16.0.6 (19)
gcc: (Debian 13.2.0-7) 13.2.0

All patches are based on mainline and have been sitting in linux-next.
No build failures or warnings were observed.

/* Conflicts */
No known conflicts.

The following changes since commit 480e035fc4c714fb5536e64ab9db04fedc89e910:

  Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel (2024-03-13 18:34:05 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.9-rc1.fixes

for you to fetch changes up to 449ac5514631dd9b9b66dd708dd5beb1428e2812:

  fscache: Fix error handling in fscache_begin_operation() (2024-03-18 10:33:48 +0100)

Please consider pulling these changes from the signed vfs-6.9-rc1.fixes tag.

Thanks!
Christian

----------------------------------------------------------------
vfs-6.9-rc1.fixes

----------------------------------------------------------------
Christian Brauner (1):
      fs,block: get holder during claim

David Howells (4):
      afs: Revert "afs: Hide silly-rename files from userspace"
      afs: Don't cache preferred address
      afs: Fix occasional rmdir-then-VNOVNODE with generic/011
      fscache: Fix error handling in fscache_begin_operation()

 block/bdev.c           |  7 +++++++
 fs/afs/dir.c           | 10 ----------
 fs/afs/rotate.c        | 21 ++++-----------------
 fs/afs/validation.c    | 16 +++++++++-------
 fs/netfs/fscache_io.c  |  4 +++-
 fs/super.c             | 18 ++++++++++++++++++
 include/linux/blkdev.h | 10 ++++++++++
 7 files changed, 51 insertions(+), 35 deletions(-)

Comments

pr-tracker-bot@kernel.org March 18, 2024, 4:48 p.m. UTC | #1
The pull request you sent on Mon, 18 Mar 2024 13:19:54 +0100:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.9-rc1.fixes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0a7b0acecea273c8816f4f5b0e189989470404cf

Thank you!
Linus Torvalds March 18, 2024, 7:14 p.m. UTC | #2
On Mon, 18 Mar 2024 at 05:20, Christian Brauner <brauner@kernel.org> wrote:
>
> * Take a passive reference on the superblock when opening a block device
>   so the holder is available to concurrent callers from the block layer.

So I've pulled this, but I have to admit that I hate it.

The bdev "holder" logic is an abomination. And "struct blk_holder_ops"
is horrendous.

Afaik, we have exactly two cases of "struct blk_holder_ops" in the
whole kernel, and you edited one of them.

And the other one is in bcachefs, and is a completely empty one with
no actual ops, so I think that one shouldn't exist.

In other words, we have only *one* actual set of "holder ops".  That
makes me suspicious in the first place.

Now, let's then look at that new "holder->put_holder" use. It has
_one_ single user too, which is bd_end_claim(), which is called from
one place, which is bdev_release(). Which in turn is called from
exactly one place, which is blkdev_release(). Which is the release
function for def_blk_fops. Which is called from __fput() on the last
release of the file.

Fine, fine, fine. So let's chase down *who* actually uses that single
"blk_holder_ops". And it turns out that it's used in three places:
fs/super.c, fs/ext4/super.c, and fs/xfs/xfs_super.c.

So in those three cases, it would be absolutely *wrong* if the
'holder' was anything but the super-block (because that's what the new
get/put functions require for any of this to work.

This all smells horribly bad to me. The code looks and acts like it is
some generic interface, but in reality it really isn't. Yes, bcachefs
seems to make up some random holder (it's a one-byte kmalloc that
isn't actually used), and a random holder op structure (it's empty, as
mentioned), but none of this makes any sense at all.

I get the feeling that the "get/put" operations should just be done in
the three places that currently use that 'fs_holder_ops'.

IOW, isn't the 'get()' always basically paired with the mounting? And
the 'put()' would probably be best done iin kill_block_super()?

I don't know. Maybe I missed something really important, but this
smells like a specific case that simply shouldn't have gotten this
kind of "generic infrastructure" solution.

               Linus
Linus Torvalds March 18, 2024, 7:41 p.m. UTC | #3
On Mon, 18 Mar 2024 at 12:14, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, isn't the 'get()' always basically paired with the mounting? And
> the 'put()' would probably be best done iin kill_block_super()?

.. or alternative handwavy approach:

 The fundamental _reason_ for the ->get/put seems to be to make the
'holder' lifetime be at least as long as the 'struct file' it is
associated with. No?

So how about we just make the 'holder' always *be* a 'struct file *'? That

 (a) gets rid of the typeless 'void *' part

 (b) is already what it is for normal cases (ie O_EXCL file opens).

wouldn't it be lovely if we just made the rule be that 'holder' *is*
the file pointer, and got rid of a lot of typeless WTF code?

Again, this comment (and the previous email) is more based on "this
does not feel right to me" than anything else.

That code just makes my skin itch. I can't say it's _wrong_, but it
just FeelsWrongToMe(tm).

          Linus
Christian Brauner March 19, 2024, 6:58 a.m. UTC | #4
On Mon, Mar 18, 2024 at 12:41:49PM -0700, Linus Torvalds wrote:
> On Mon, 18 Mar 2024 at 12:14, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > IOW, isn't the 'get()' always basically paired with the mounting? And
> > the 'put()' would probably be best done iin kill_block_super()?
> 
> .. or alternative handwavy approach:
> 
>  The fundamental _reason_ for the ->get/put seems to be to make the
> 'holder' lifetime be at least as long as the 'struct file' it is
> associated with. No?
> 
> So how about we just make the 'holder' always *be* a 'struct file *'? That
> 
>  (a) gets rid of the typeless 'void *' part
> 
>  (b) is already what it is for normal cases (ie O_EXCL file opens).
> 
> wouldn't it be lovely if we just made the rule be that 'holder' *is*
> the file pointer, and got rid of a lot of typeless WTF code?
> 
> Again, this comment (and the previous email) is more based on "this
> does not feel right to me" than anything else.
> 
> That code just makes my skin itch. I can't say it's _wrong_, but it
> just FeelsWrongToMe(tm).

So, initially I think the holder ops were intended to be generic by
Christoph but I agree that it's probably not needed. I just didn't
massage that code yet. Now on my todo for this cycle!
Christian Brauner March 20, 2024, 10:21 a.m. UTC | #5
> > Again, this comment (and the previous email) is more based on "this
> > does not feel right to me" than anything else.
> > 
> > That code just makes my skin itch. I can't say it's _wrong_, but it
> > just FeelsWrongToMe(tm).
> 
> So, initially I think the holder ops were intended to be generic by
> Christoph but I agree that it's probably not needed. I just didn't
> massage that code yet. Now on my todo for this cycle!

So, the block holder ops will gain additional implementers in the block
layer that will implement their own separate ops. So I trust the block
layer with this.

The holder is used to determine whether a block device can be reopened.
So both for internal (mounting, log device initialization) or userspace
opens we compare the holders of the block device. We do have allowed for
quite some time to open the same block device exclusively with different
flags. So there are multiple files open to the same block device and the
holder is used as proof that it can be reopened. So always using the
file as the holder would still mean that we have to compare
file->private_data to determine whether the block device can be
reopened. So it won't get us as much as we'd want.

The reason for the holder to remain valid is that the block layer does
have ioctl operations such as removal of a device in the case of nbd,
suspend and resume used in stuff like cryptsetup. In all such cases we
go from arbitrary block device to arbitrary holder and then inform them
about the operation calling the appropriate callback. So we would still
have to guarantee the validity of the holder in file->private_data.

There are also two internal codepaths where the block device is
temporarly marked as being in the process of being claimed. This will
cause actual openers to wait until bd_holder is really set or aborted
but not fail the actual open. This has traditionally been the case in
the loop code and during user initiated and internally triggered
partition scanning. That could be reworked but would be pretty ugly.

We'll continue considering additional cleanups and latest next merge
window I'll give you a detailed write up what happened.