diff mbox series

block: handle BLK_OPEN_RESTRICT_WRITES correctly

Message ID 20240323-seide-erbrachten-5c60873fadc1@brauner (mailing list archive)
State New, archived
Headers show
Series block: handle BLK_OPEN_RESTRICT_WRITES correctly | expand

Commit Message

Christian Brauner March 23, 2024, 2:54 p.m. UTC
Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.

So fix the detection logic. When we open a block device with
BLK_OPEN_RESTRICT_WRITES we know that a holder must've been specified as
we forbid BLK_OPEN_RESTRICT_WRITES without a holder in
bdev_permission(). The holder will be stashed in
bdev_file->private_data. So we check whether bdev_file->private_data is
set and whether writes are blocked. If so, we unblock writes. Otherwise,
if the bdev_file was opened writable we just decrement the write count.

Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
directly be raised by userspace. It is implicitly raised during
mounting. So any concurrent writable request from userspace will fail
and so recognition based on bdev_file->private_data is safe.

Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.

Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
@Jan, I have one question for you. Currently your original changes in
v6.8 do allow for a block device to be reopened with
BLK_OPEN_RESTRICT_WRITES provided the same holder is used as per
bdev_may_open(). I think that may have a bug.

The first opener @f1 of that block device will set bdev->bd_writers to
-1. The second opener @f2 using the same holder will pass the check in
bdev_may_open() that bdev->bd_writers must not be greater than zero.

The first opener @f1 now closes the block device and in bdev_release()
will end up calling bdev_yield_write_access() which calls
bdev_writes_blocked() and sets bdev->bd_writers to 0 again.

Now @f2 holds a file to that block device which was opened with
exclusive write access but bdev->bd_writers has been reset to 0.

So now @f3 comes along and succeeds in opening the block device with
BLK_OPEN_WRITE betraying @f2's request to have exclusive write access.

This isn't a practical issue yet because afaict there's no codepath
inside the kernel that reopenes the same block device with
BLK_OPEN_RESTRICT_WRITES but it will be if there is.

If that's right then either this needs to be fixed by counting the
number of BLK_OPEN_RESTRICT_WRITES a la [1] or we simply enforce that
BLK_OPEN_RESTRICT_WRITES means that there's only one opener a la [2].

[1]: count BLK_OPEN_RESTRICT_WRITES openers

     diff --git a/block/bdev.c b/block/bdev.c
     index 4b28a39fafc4..ea3f219310f5 100644
     --- a/block/bdev.c
     +++ b/block/bdev.c
     @@ -776,17 +776,17 @@ void blkdev_put_no_open(struct block_device *bdev)

      static bool bdev_writes_blocked(struct block_device *bdev)
      {
     -       return bdev->bd_writers == -1;
     +       return bdev->bd_writers < 0;
      }

      static void bdev_block_writes(struct block_device *bdev)
      {
     -       bdev->bd_writers = -1;
     +       bdev->bd_writers--;
      }

      static void bdev_unblock_writes(struct block_device *bdev)
      {
     -       bdev->bd_writers = 0;
     +       bdev->bd_writers++;
      }

 static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)

[2]: enforce that BLK_OPEN_RESTRICT_WRITES is unique

     diff --git a/block/bdev.c b/block/bdev.c
     index e9f1b12bd75c..cefb94a75530 100644
     --- a/block/bdev.c
     +++ b/block/bdev.c
     @@ -758,7 +758,7 @@ static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)
             /* Writes blocked? */
             if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
                     return false;
     -       if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
     +       if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers != 0)
                     return false;
             return true;
      }

But maybe I'm just not reading this right. Let me know.

Christian
---
 block/bdev.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

Comments

Christian Brauner March 23, 2024, 3:59 p.m. UTC | #1
> +	/*
> +	 * If this was an exclusive open and writes are blocked
> +	 * we know that we're the ones who blocked them.
> +	 */
> +	if (bdev_file->private_data && bdev_writes_blocked(bdev))

This doesn't work because this will unblock BLK_OPEN_RESTRICT_WRITES
when the block device has been opened read-only but exclusively, i.e.,
when e.g., userspace requested read-only access and we use the file as
the holder. I have an alternative fix.
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 7a5f611c3d2e..4b28a39fafc4 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -821,13 +821,14 @@  static void bdev_yield_write_access(struct file *bdev_file)
 		return;
 
 	bdev = file_bdev(bdev_file);
-	/* Yield exclusive or shared write access. */
-	if (bdev_file->f_mode & FMODE_WRITE) {
-		if (bdev_writes_blocked(bdev))
-			bdev_unblock_writes(bdev);
-		else
-			bdev->bd_writers--;
-	}
+	/*
+	 * If this was an exclusive open and writes are blocked
+	 * we know that we're the ones who blocked them.
+	 */
+	if (bdev_file->private_data && bdev_writes_blocked(bdev))
+		bdev_unblock_writes(bdev);
+	else if (bdev_file->f_mode & FMODE_WRITE)
+		bdev->bd_writers--;
 }
 
 /**