diff mbox series

[v2] block: handle BLK_OPEN_RESTRICT_WRITES correctly

Message ID 20240326-lehrkraft-messwerte-e3895039e63b@brauner (mailing list archive)
State New, archived

Commit Message

Christian Brauner March 26, 2024, 3:46 p.m. UTC
Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
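
The failure mode described above can be reproduced in a small userspace
model. This is only a sketch, not the kernel code: every old_* name is
hypothetical, and the convention that a writer count of -1 means
"writes blocked" is an assumption made for illustration. It shows that
a read-only opener that requested the restriction never lifts it again
when the yield path keys off the file being writable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BLK_OPEN_WRITE / BLK_OPEN_RESTRICT_WRITES. */
enum {
	OLD_OPEN_WRITE           = 1u << 0,
	OLD_OPEN_RESTRICT_WRITES = 1u << 1,
};

/* Assumed convention for this model: writers == -1 means "writes blocked". */
struct old_bdev {
	int writers;
};

/* Claim exclusive (restricted) or shared write access at open time. */
static void old_claim(struct old_bdev *b, unsigned int mode)
{
	if (mode & OLD_OPEN_RESTRICT_WRITES)
		b->writers = -1;
	else if (mode & OLD_OPEN_WRITE)
		b->writers++;
}

/* Old detection: infer the restriction from "writable file + blocked". */
static void old_yield(struct old_bdev *b, unsigned int mode)
{
	bool file_writable = (mode & OLD_OPEN_WRITE) != 0;

	if (!file_writable)
		return;		/* read-only opener: nothing is undone */
	if (b->writers == -1)
		b->writers = 0;
	else
		b->writers--;
}

/* Returns true if a read-only restricted open leaves writes blocked. */
static bool old_logic_leaves_writes_blocked(void)
{
	struct old_bdev b = { .writers = 0 };
	unsigned int mode = OLD_OPEN_RESTRICT_WRITES;	/* no OLD_OPEN_WRITE */

	old_claim(&b, mode);
	assert(b.writers == -1);
	old_yield(&b, mode);
	return b.writers == -1;
}
```

In this model old_logic_leaves_writes_blocked() returns true: the block
set up at open time survives the close.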

Fix the detection logic and use one of the FMODE_* bits we freed up a
while ago. We could've also abused O_EXCL as an indicator that
BLK_OPEN_RESTRICT_WRITES has been requested. For userspace open paths
O_EXCL will never be retained but for internal opens where we open files
that are never installed into a file descriptor table this is fine. But
it would be a gamble that this doesn't cause bugs. Note that
BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot directly
be raised by userspace. It is implicitly raised during mounting.
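
As a rough illustration of the approach, the following userspace sketch
records the restriction in a dedicated per-file bit at open time and
keys the yield path off that bit alone. It is not the patch itself: the
sketch_* names, the flag values, and the "-1 means writes blocked"
convention are assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical open-mode flags (stand-ins for BLK_OPEN_*). */
enum {
	SKETCH_OPEN_WRITE           = 1u << 0,
	SKETCH_OPEN_RESTRICT_WRITES = 1u << 1,
};

/* Hypothetical per-file mode bits (stand-ins for FMODE_*). */
enum {
	SKETCH_FMODE_WRITE            = 1u << 0,
	SKETCH_FMODE_WRITE_RESTRICTED = 1u << 1,
};

/* Assumed convention for this model: writers == -1 means "writes blocked". */
struct sketch_bdev {
	int writers;
};

struct sketch_file {
	unsigned int f_mode;
	struct sketch_bdev *bdev;
};

/* Open: claim access and remember the restriction on the file itself. */
static struct sketch_file sketch_open(struct sketch_bdev *b, unsigned int mode)
{
	struct sketch_file f = { .f_mode = 0, .bdev = b };

	if (mode & SKETCH_OPEN_WRITE)
		f.f_mode |= SKETCH_FMODE_WRITE;
	if (mode & SKETCH_OPEN_RESTRICT_WRITES) {
		f.f_mode |= SKETCH_FMODE_WRITE_RESTRICTED;
		b->writers = -1;
	} else if (mode & SKETCH_OPEN_WRITE) {
		b->writers++;
	}
	return f;
}

/* Yield: the dedicated bit decides, independent of writability. */
static void sketch_yield(struct sketch_file *f)
{
	if (f->f_mode & SKETCH_FMODE_WRITE_RESTRICTED)
		f->bdev->writers = 0;
	else if (f->f_mode & SKETCH_FMODE_WRITE)
		f->bdev->writers--;
}
```

With this shape a read-only restricted open and a plain writable open
both return the writer count to 0 on yield, because the decision no
longer depends on inferring the opener's intent from device state.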

Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.

Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c       | 14 +++++++-------
 include/linux/fs.h |  2 ++
 2 files changed, 9 insertions(+), 7 deletions(-)

Comments

Christoph Hellwig March 26, 2024, 5:25 p.m. UTC | #1
Not a fan of the new bit, but I see no good way around it.

Reviewed-by: Christoph Hellwig <hch@lst.de>

Jan Kara March 26, 2024, 10:42 p.m. UTC | #2
On Tue 26-03-24 16:46:19, Christian Brauner wrote:
> Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
> default this option is set. When it is set the long-standing behavior
> of being able to write to mounted block devices is enabled.
> 
> But in order to guard against unintended corruption by writing to the
> block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> off. In that case it isn't possible to write to mounted block devices
> anymore.
> 
> A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> the mode was passed around. Since we managed to get rid of the bdev
> handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> on whether the file was opened writable and writes to that block device
> are blocked. That logic doesn't work because we do allow
> BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
> 
> Fix the detection logic and use one of the FMODE_* bits we freed up a
> while ago. We could've also abused O_EXCL as an indicator that
> BLK_OPEN_RESTRICT_WRITES has been requested. For userspace open paths
> O_EXCL will never be retained but for internal opens where we open files
> that are never installed into a file descriptor table this is fine. But
> it would be a gamble that this doesn't cause bugs. Note that
> BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot directly
> be raised by userspace. It is implicitly raised during mounting.
> 
> Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> unset.
> 
> Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
> Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
> Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c       | 14 +++++++-------
>  include/linux/fs.h |  2 ++
>  2 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 070890667563..6955693e4bcd 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -814,13 +814,11 @@ static void bdev_yield_write_access(struct file *bdev_file)
>  		return;
>  
>  	bdev = file_bdev(bdev_file);
> -	/* Yield exclusive or shared write access. */
> -	if (bdev_file->f_mode & FMODE_WRITE) {
> -		if (bdev_writes_blocked(bdev))
> -			bdev_unblock_writes(bdev);
> -		else
> -			bdev->bd_writers--;
> -	}
> +
> +	if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
> +		bdev_unblock_writes(bdev);
> +	else if (bdev_file->f_mode & FMODE_WRITE)
> +		bdev->bd_writers--;
>  }
>  
>  /**
> @@ -900,6 +898,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
>  	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
>  	if (bdev_nowait(bdev))
>  		bdev_file->f_mode |= FMODE_NOWAIT;
> +	if (mode & BLK_OPEN_RESTRICT_WRITES)
> +		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
>  	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
>  	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
>  	bdev_file->private_data = holder;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 00fc429b0af0..8dfd53b52744 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -121,6 +121,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  #define FMODE_PWRITE		((__force fmode_t)0x10)
>  /* File is opened for execution with sys_execve / sys_uselib */
>  #define FMODE_EXEC		((__force fmode_t)0x20)
> +/* File writes are restricted (block device specific) */
> +#define FMODE_WRITE_RESTRICTED  ((__force fmode_t)0x40)
>  /* 32bit hashes as llseek() offset (for directories) */
>  #define FMODE_32BITHASH         ((__force fmode_t)0x200)
>  /* 64bit hashes as llseek() offset (for directories) */
> -- 
> 2.43.0
>

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 070890667563..6955693e4bcd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -814,13 +814,11 @@  static void bdev_yield_write_access(struct file *bdev_file)
 		return;
 
 	bdev = file_bdev(bdev_file);
-	/* Yield exclusive or shared write access. */
-	if (bdev_file->f_mode & FMODE_WRITE) {
-		if (bdev_writes_blocked(bdev))
-			bdev_unblock_writes(bdev);
-		else
-			bdev->bd_writers--;
-	}
+
+	if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
+		bdev_unblock_writes(bdev);
+	else if (bdev_file->f_mode & FMODE_WRITE)
+		bdev->bd_writers--;
 }
 
 /**
@@ -900,6 +898,8 @@  int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
 	if (bdev_nowait(bdev))
 		bdev_file->f_mode |= FMODE_NOWAIT;
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
 	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
 	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
 	bdev_file->private_data = holder;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 00fc429b0af0..8dfd53b52744 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -121,6 +121,8 @@  typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_PWRITE		((__force fmode_t)0x10)
 /* File is opened for execution with sys_execve / sys_uselib */
 #define FMODE_EXEC		((__force fmode_t)0x20)
+/* File writes are restricted (block device specific) */
+#define FMODE_WRITE_RESTRICTED  ((__force fmode_t)0x40)
 /* 32bit hashes as llseek() offset (for directories) */
 #define FMODE_32BITHASH         ((__force fmode_t)0x200)
 /* 64bit hashes as llseek() offset (for directories) */