Message ID | 20240326-lehrkraft-messwerte-e3895039e63b@brauner (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] block: handle BLK_OPEN_RESTRICT_WRITES correctly |
Not a fan of the new bit, but I see no good way around it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Tue 26-03-24 16:46:19, Christian Brauner wrote:
> Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
> default this option is set. When it is set the long-standing behavior
> of being able to write to mounted block devices is enabled.
>
> But in order to guard against unintended corruption by writing to the
> block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> off. In that case it isn't possible to write to mounted block devices
> anymore.
>
> A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> the mode was passed around. Since we managed to get rid of the bdev
> handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> on whether the file was opened writable and writes to that block device
> are blocked. That logic doesn't work because we do allow
> BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
>
> Fix the detection logic and use one of the FMODE_* bits we freed up a
> while ago. We could've also abused O_EXCL as an indicator that
> BLK_OPEN_RESTRICT_WRITES has been requested. For userspace open paths
> O_EXCL will never be retained but for internal opens where we open files
> that are never installed into a file descriptor table this is fine. But
> it would be a gamble that this doesn't cause bugs. Note that
> BLK_OPEN_RESTRICT_WRITES is an internal-only flag that cannot directly
> be raised by userspace. It is implicitly raised during mounting.
>
> Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> unset.
>
> Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
> Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
> Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c       | 14 +++++++-------
>  include/linux/fs.h |  2 ++
>  2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index 070890667563..6955693e4bcd 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -814,13 +814,11 @@ static void bdev_yield_write_access(struct file *bdev_file)
>  		return;
>
>  	bdev = file_bdev(bdev_file);
> -	/* Yield exclusive or shared write access. */
> -	if (bdev_file->f_mode & FMODE_WRITE) {
> -		if (bdev_writes_blocked(bdev))
> -			bdev_unblock_writes(bdev);
> -		else
> -			bdev->bd_writers--;
> -	}
> +
> +	if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
> +		bdev_unblock_writes(bdev);
> +	else if (bdev_file->f_mode & FMODE_WRITE)
> +		bdev->bd_writers--;
>  }
>
>  /**
> @@ -900,6 +898,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
>  	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
>  	if (bdev_nowait(bdev))
>  		bdev_file->f_mode |= FMODE_NOWAIT;
> +	if (mode & BLK_OPEN_RESTRICT_WRITES)
> +		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
>  	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
>  	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
>  	bdev_file->private_data = holder;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 00fc429b0af0..8dfd53b52744 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -121,6 +121,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  #define FMODE_PWRITE		((__force fmode_t)0x10)
>  /* File is opened for execution with sys_execve / sys_uselib */
>  #define FMODE_EXEC		((__force fmode_t)0x20)
> +/* File writes are restricted (block device specific) */
> +#define FMODE_WRITE_RESTRICTED	((__force fmode_t)0x40)
>  /* 32bit hashes as llseek() offset (for directories) */
>  #define FMODE_32BITHASH		((__force fmode_t)0x200)
>  /* 64bit hashes as llseek() offset (for directories) */
> --
> 2.43.0
>
diff --git a/block/bdev.c b/block/bdev.c
index 070890667563..6955693e4bcd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -814,13 +814,11 @@ static void bdev_yield_write_access(struct file *bdev_file)
 		return;
 
 	bdev = file_bdev(bdev_file);
-	/* Yield exclusive or shared write access. */
-	if (bdev_file->f_mode & FMODE_WRITE) {
-		if (bdev_writes_blocked(bdev))
-			bdev_unblock_writes(bdev);
-		else
-			bdev->bd_writers--;
-	}
+
+	if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
+		bdev_unblock_writes(bdev);
+	else if (bdev_file->f_mode & FMODE_WRITE)
+		bdev->bd_writers--;
 }
 
 /**
@@ -900,6 +898,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
 	if (bdev_nowait(bdev))
 		bdev_file->f_mode |= FMODE_NOWAIT;
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
 	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
 	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
 	bdev_file->private_data = holder;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 00fc429b0af0..8dfd53b52744 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -121,6 +121,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_PWRITE		((__force fmode_t)0x10)
 /* File is opened for execution with sys_execve / sys_uselib */
 #define FMODE_EXEC		((__force fmode_t)0x20)
+/* File writes are restricted (block device specific) */
+#define FMODE_WRITE_RESTRICTED	((__force fmode_t)0x40)
 /* 32bit hashes as llseek() offset (for directories) */
 #define FMODE_32BITHASH		((__force fmode_t)0x200)
 /* 64bit hashes as llseek() offset (for directories) */
Last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.

Fix the detection logic and use one of the FMODE_* bits we freed up a
while ago. We could've also abused O_EXCL as an indicator that
BLK_OPEN_RESTRICT_WRITES has been requested. For userspace open paths
O_EXCL will never be retained but for internal opens where we open files
that are never installed into a file descriptor table this is fine. But
it would be a gamble that this doesn't cause bugs. Note that
BLK_OPEN_RESTRICT_WRITES is an internal-only flag that cannot directly
be raised by userspace. It is implicitly raised during mounting.

Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.

Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c       | 14 +++++++-------
 include/linux/fs.h |  2 ++
 2 files changed, 9 insertions(+), 7 deletions(-)