diff mbox series

[v2,31/34] block: use file->f_op to indicate restricted writes

Message ID 20240123-vfs-bdev-file-v2-31-adbd023e19cc@kernel.org (mailing list archive)
State New, archived
Headers show
Series Open block devices as files | expand

Commit Message

Christian Brauner Jan. 23, 2024, 1:26 p.m. UTC
Make it possible to detected a block device that was opened with
restricted write access solely based on its file operations that it was
opened with. This avoids wasting an FMODE_* flag.

def_blk_fops isn't needed to check whether something is a block device
checking the inode type is enough for that. And def_blk_fops_restricted
can be kept private to the block layer.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c | 16 ++++++++++++----
 block/blk.h  |  2 ++
 block/fops.c |  3 +++
 3 files changed, 17 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig Jan. 29, 2024, 4:49 p.m. UTC | #1
On Tue, Jan 23, 2024 at 02:26:48PM +0100, Christian Brauner wrote:
> Make it possible to detected a block device that was opened with
> restricted write access solely based on its file operations that it was
> opened with. This avoids wasting an FMODE_* flag.
> 
> def_blk_fops isn't needed to check whether something is a block device
> checking the inode type is enough for that. And def_blk_fops_restricted
> can be kept private to the block layer.

I agree with not wasting a FMODE_* flag, but I also really hate
duplicating the file operations.

I went to search for a good place to stash this information and ended up
at the f_version field in struct file.  That one is never touched by the 
VFS proper but just the file system and a few lseek helpers.  The
latter currently happen to be used by block devices unfortunately,
but it seem like moving the f_version clearing into the few filesystem
actualy using it would be good code hygiene anyway.
Christian Brauner Jan. 29, 2024, 5:09 p.m. UTC | #2
On Mon, Jan 29, 2024 at 05:49:34PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 23, 2024 at 02:26:48PM +0100, Christian Brauner wrote:
> > Make it possible to detected a block device that was opened with
> > restricted write access solely based on its file operations that it was
> > opened with. This avoids wasting an FMODE_* flag.
> > 
> > def_blk_fops isn't needed to check whether something is a block device
> > checking the inode type is enough for that. And def_blk_fops_restricted
> > can be kept private to the block layer.
> 
> I agree with not wasting a FMODE_* flag, but I also really hate
> duplicating the file operations.

I don't think it's that bad and is temporary until we can
unconditionally disable writing to mounted block devices. Until then we
can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED)
in a single location in block/fops.c so its nicely encapsulated and
confined.

> I went to search for a good place to stash this information and ended up
> at the f_version field in struct file.  That one is never touched by the 
> VFS proper but just the file system and a few lseek helpers.  The
> latter currently happen to be used by block devices unfortunately,
> but it seem like moving the f_version clearing into the few filesystem
> actualy using it would be good code hygiene anyway.

Kinda like choosing between pest and cholera. I think that the f_op
solution is nicer. Overloading f_version is not something I feel I have
the stomach for. The cleanup itself might still be worth it ofc.
Christoph Hellwig Jan. 30, 2024, 8:32 a.m. UTC | #3
On Mon, Jan 29, 2024 at 06:09:37PM +0100, Christian Brauner wrote:
> I don't think it's that bad and is temporary until we can
> unconditionally disable writing to mounted block devices. Until then we
> can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED)
> in a single location in block/fops.c so its nicely encapsulated and
> confined.

Oh well.  If Jens is fine with this I can live with it even if I don't
like it too much.  I'll probably just clean it up as a follow up.

OTOH I fear we won't be able to unconditionally disable writing to
mounted block devices anytime soon if ever.
Christian Brauner Jan. 30, 2024, 9:11 a.m. UTC | #4
On Tue, Jan 30, 2024 at 09:32:13AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 29, 2024 at 06:09:37PM +0100, Christian Brauner wrote:
> > I don't think it's that bad and is temporary until we can
> > unconditionally disable writing to mounted block devices. Until then we
> > can place all of this under #if IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED)
> > in a single location in block/fops.c so its nicely encapsulated and
> > confined.
> 
> Oh well.  If Jens is fine with this I can live with it even if I don't
> like it too much.  I'll probably just clean it up as a follow up.
> 
> OTOH I fear we won't be able to unconditionally disable writing to
> mounted block devices anytime soon if ever.

One my dream. Put another way, if we don't even allow us to think that
we can remove insecure functionality in the future then we have to
accept that we'll be piling on #ifdefine's and mostly unused code
forever which is just sad. :/

I'm hopeful that writing to mounted block devices is something that we
can make all major distros move away from. We should start just because
we need to figure out what tools do actually try to do stuff like that.
Jan Kara Feb. 1, 2024, 11:08 a.m. UTC | #5
On Tue 23-01-24 14:26:48, Christian Brauner wrote:
> Make it possible to detected a block device that was opened with
> restricted write access solely based on its file operations that it was
> opened with. This avoids wasting an FMODE_* flag.
> 
> def_blk_fops isn't needed to check whether something is a block device
> checking the inode type is enough for that. And def_blk_fops_restricted
> can be kept private to the block layer.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>

I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE
file against a bdev with bdev_writes_blocked() == true, we are sure this is
the handle blocking other writes so we can unblock them in
bdev_yield_write_access()...

								Honza

> ---
>  block/bdev.c | 16 ++++++++++++----
>  block/blk.h  |  2 ++
>  block/fops.c |  3 +++
>  3 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 71eaa1b5b7eb..9d96a43f198d 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -799,13 +799,16 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
>  		bdev->bd_writers++;
>  }
>  
> -static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
> +static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode)
>  {
> +	struct block_device *bdev;
> +
>  	if (bdev_allow_write_mounted)
>  		return;
>  
> +	bdev = file_bdev(bdev_file);
>  	/* Yield exclusive or shared write access. */
> -	if (mode & BLK_OPEN_RESTRICT_WRITES)
> +	if (bdev_file->f_op == &def_blk_fops_restricted)
>  		bdev_unblock_writes(bdev);
>  	else if (mode & BLK_OPEN_WRITE)
>  		bdev->bd_writers--;
> @@ -959,6 +962,7 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  				   const struct blk_holder_ops *hops)
>  {
>  	struct file *bdev_file;
> +	const struct file_operations *blk_fops;
>  	struct block_device *bdev;
>  	unsigned int flags;
>  	int ret;
> @@ -972,8 +976,12 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		return ERR_PTR(-ENXIO);
>  
>  	flags = blk_to_file_flags(mode);
> +	if (mode & BLK_OPEN_RESTRICT_WRITES)
> +		blk_fops = &def_blk_fops_restricted;
> +	else
> +		blk_fops = &def_blk_fops;
>  	bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
> -			blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
> +			blockdev_mnt, "", flags | O_LARGEFILE, blk_fops);
>  	if (IS_ERR(bdev_file)) {
>  		blkdev_put_no_open(bdev);
>  		return bdev_file;
> @@ -1033,7 +1041,7 @@ void bdev_release(struct file *bdev_file)
>  		sync_blockdev(bdev);
>  
>  	mutex_lock(&disk->open_mutex);
> -	bdev_yield_write_access(bdev, handle->mode);
> +	bdev_yield_write_access(bdev_file, handle->mode);
>  
>  	if (handle->holder)
>  		bd_end_claim(bdev, handle->holder);
> diff --git a/block/blk.h b/block/blk.h
> index 7ca24814f3a0..dfa958909c54 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -9,6 +9,8 @@
>  
>  struct elevator_type;
>  
> +extern const struct file_operations def_blk_fops_restricted;
> +
>  /* Max future timer expiry for timeouts */
>  #define BLK_MAX_TIMEOUT		(5 * HZ)
>  
> diff --git a/block/fops.c b/block/fops.c
> index 5589bf9c3822..f56bdfe459de 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -862,6 +862,9 @@ const struct file_operations def_blk_fops = {
>  	.fallocate	= blkdev_fallocate,
>  };
>  
> +/* Indicator that this block device is opened with restricted write access. */
> +const struct file_operations def_blk_fops_restricted = def_blk_fops;
> +
>  static __init int blkdev_init(void)
>  {
>  	return bioset_init(&blkdev_dio_pool, 4,
> 
> -- 
> 2.43.0
>
Christian Brauner Feb. 1, 2024, 4:16 p.m. UTC | #6
On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote:
> On Tue 23-01-24 14:26:48, Christian Brauner wrote:
> > Make it possible to detected a block device that was opened with
> > restricted write access solely based on its file operations that it was
> > opened with. This avoids wasting an FMODE_* flag.
> > 
> > def_blk_fops isn't needed to check whether something is a block device
> > checking the inode type is enough for that. And def_blk_fops_restricted
> > can be kept private to the block layer.
> > 
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> 
> I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE
> file against a bdev with bdev_writes_blocked() == true, we are sure this is
> the handle blocking other writes so we can unblock them in
> bdev_yield_write_access()...

Excellent:

commit e2dd15e4c32ad66d938d35e1acd26375a7f355fb
Author:     Christian Brauner <brauner@kernel.org>
AuthorDate: Tue Jan 23 14:26:48 2024 +0100
Commit:     Christian Brauner <brauner@kernel.org>
CommitDate: Thu Feb 1 17:13:16 2024 +0100

    block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access

    Make it possible to detected a block device that was opened with
    restricted write access based only on BLK_OPEN_WRITE and
    bdev->bd_writers < 0 so we won't have to claim another FMODE_* flag.

    Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-31-adbd023e19cc@kernel.org
    base-commit: 0bd1bf95a554f5f877724c27dbe33d4db0af4d0c
    change-id: 20240129-vfs-bdev-file-bd_inode-385a56c57a51
    Signed-off-by: Christian Brauner <brauner@kernel.org>

diff --git a/block/bdev.c b/block/bdev.c
index 9be8c3c683ae..0edeb073e4d8 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -799,13 +799,22 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
                bdev->bd_writers++;
 }

-static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
+static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode)
 {
+       struct block_device *bdev;
+
        if (bdev_allow_write_mounted)
                return;

+       bdev = file_bdev(bdev_file);
        /* Yield exclusive or shared write access. */
-       if (mode & BLK_OPEN_RESTRICT_WRITES)
+       if (mode & BLK_OPEN_WRITE) {
+               if (bdev_writes_blocked(bdev))
+                       bdev_unblock_writes(bdev);
+               else
+                       bdev->bd_writers--;
+       }
+       if (bdev_file->f_op == &def_blk_fops_restricted)
                bdev_unblock_writes(bdev);
        else if (mode & BLK_OPEN_WRITE)
                bdev->bd_writers--;
@@ -1020,7 +1029,7 @@ void bdev_release(struct file *bdev_file)
                sync_blockdev(bdev);

        mutex_lock(&disk->open_mutex);
-       bdev_yield_write_access(bdev, handle->mode);
+       bdev_yield_write_access(bdev_file, handle->mode);

        if (handle->holder)
                bd_end_claim(bdev, handle->holder);
Jan Kara Feb. 1, 2024, 5:36 p.m. UTC | #7
On Thu 01-02-24 17:16:02, Christian Brauner wrote:
> On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote:
> > On Tue 23-01-24 14:26:48, Christian Brauner wrote:
> > > Make it possible to detected a block device that was opened with
> > > restricted write access solely based on its file operations that it was
> > > opened with. This avoids wasting an FMODE_* flag.
> > > 
> > > def_blk_fops isn't needed to check whether something is a block device
> > > checking the inode type is enough for that. And def_blk_fops_restricted
> > > can be kept private to the block layer.
> > > 
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > 
> > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE
> > file against a bdev with bdev_writes_blocked() == true, we are sure this is
> > the handle blocking other writes so we can unblock them in
> > bdev_yield_write_access()...

...

> -       if (mode & BLK_OPEN_RESTRICT_WRITES)
> +       if (mode & BLK_OPEN_WRITE) {
> +               if (bdev_writes_blocked(bdev))
> +                       bdev_unblock_writes(bdev);
> +               else
> +                       bdev->bd_writers--;
> +       }
> +       if (bdev_file->f_op == &def_blk_fops_restricted)

Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you
can delete def_blk_fops_restricted completely...

								Honza
Christian Brauner Feb. 2, 2024, 11:45 a.m. UTC | #8
On Thu, Feb 01, 2024 at 06:36:31PM +0100, Jan Kara wrote:
> On Thu 01-02-24 17:16:02, Christian Brauner wrote:
> > On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote:
> > > On Tue 23-01-24 14:26:48, Christian Brauner wrote:
> > > > Make it possible to detected a block device that was opened with
> > > > restricted write access solely based on its file operations that it was
> > > > opened with. This avoids wasting an FMODE_* flag.
> > > > 
> > > > def_blk_fops isn't needed to check whether something is a block device
> > > > checking the inode type is enough for that. And def_blk_fops_restricted
> > > > can be kept private to the block layer.
> > > > 
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > 
> > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE
> > > file against a bdev with bdev_writes_blocked() == true, we are sure this is
> > > the handle blocking other writes so we can unblock them in
> > > bdev_yield_write_access()...
> 
> ...
> 
> > -       if (mode & BLK_OPEN_RESTRICT_WRITES)
> > +       if (mode & BLK_OPEN_WRITE) {
> > +               if (bdev_writes_blocked(bdev))
> > +                       bdev_unblock_writes(bdev);
> > +               else
> > +                       bdev->bd_writers--;
> > +       }
> > +       if (bdev_file->f_op == &def_blk_fops_restricted)
> 
> Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you
> can delete def_blk_fops_restricted completely...

Copy-paste error when dumping this into here. Here's the full patch.
Jan Kara Feb. 2, 2024, 11:51 a.m. UTC | #9
On Fri 02-02-24 12:45:49, Christian Brauner wrote:
> On Thu, Feb 01, 2024 at 06:36:31PM +0100, Jan Kara wrote:
> > On Thu 01-02-24 17:16:02, Christian Brauner wrote:
> > > On Thu, Feb 01, 2024 at 12:08:58PM +0100, Jan Kara wrote:
> > > > On Tue 23-01-24 14:26:48, Christian Brauner wrote:
> > > > > Make it possible to detected a block device that was opened with
> > > > > restricted write access solely based on its file operations that it was
> > > > > opened with. This avoids wasting an FMODE_* flag.
> > > > > 
> > > > > def_blk_fops isn't needed to check whether something is a block device
> > > > > checking the inode type is enough for that. And def_blk_fops_restricted
> > > > > can be kept private to the block layer.
> > > > > 
> > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > 
> > > > I don't think we need def_blk_fops_restricted. If we have BLK_OPEN_WRITE
> > > > file against a bdev with bdev_writes_blocked() == true, we are sure this is
> > > > the handle blocking other writes so we can unblock them in
> > > > bdev_yield_write_access()...
> > 
> > ...
> > 
> > > -       if (mode & BLK_OPEN_RESTRICT_WRITES)
> > > +       if (mode & BLK_OPEN_WRITE) {
> > > +               if (bdev_writes_blocked(bdev))
> > > +                       bdev_unblock_writes(bdev);
> > > +               else
> > > +                       bdev->bd_writers--;
> > > +       }
> > > +       if (bdev_file->f_op == &def_blk_fops_restricted)
> > 
> > Uh, why are you leaving def_blk_fops_restricted check here? I'd expect you
> > can delete def_blk_fops_restricted completely...
> 
> Copy-paste error when dumping this into here. Here's the full patch.

Yes, the full patch looks good to me! Thanks!

								Honza
diff mbox series

Patch

diff --git a/block/bdev.c b/block/bdev.c
index 71eaa1b5b7eb..9d96a43f198d 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -799,13 +799,16 @@  static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
 		bdev->bd_writers++;
 }
 
-static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
+static void bdev_yield_write_access(struct file *bdev_file, blk_mode_t mode)
 {
+	struct block_device *bdev;
+
 	if (bdev_allow_write_mounted)
 		return;
 
+	bdev = file_bdev(bdev_file);
 	/* Yield exclusive or shared write access. */
-	if (mode & BLK_OPEN_RESTRICT_WRITES)
+	if (bdev_file->f_op == &def_blk_fops_restricted)
 		bdev_unblock_writes(bdev);
 	else if (mode & BLK_OPEN_WRITE)
 		bdev->bd_writers--;
@@ -959,6 +962,7 @@  struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 				   const struct blk_holder_ops *hops)
 {
 	struct file *bdev_file;
+	const struct file_operations *blk_fops;
 	struct block_device *bdev;
 	unsigned int flags;
 	int ret;
@@ -972,8 +976,12 @@  struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		return ERR_PTR(-ENXIO);
 
 	flags = blk_to_file_flags(mode);
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		blk_fops = &def_blk_fops_restricted;
+	else
+		blk_fops = &def_blk_fops;
 	bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
-			blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
+			blockdev_mnt, "", flags | O_LARGEFILE, blk_fops);
 	if (IS_ERR(bdev_file)) {
 		blkdev_put_no_open(bdev);
 		return bdev_file;
@@ -1033,7 +1041,7 @@  void bdev_release(struct file *bdev_file)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
-	bdev_yield_write_access(bdev, handle->mode);
+	bdev_yield_write_access(bdev_file, handle->mode);
 
 	if (handle->holder)
 		bd_end_claim(bdev, handle->holder);
diff --git a/block/blk.h b/block/blk.h
index 7ca24814f3a0..dfa958909c54 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -9,6 +9,8 @@ 
 
 struct elevator_type;
 
+extern const struct file_operations def_blk_fops_restricted;
+
 /* Max future timer expiry for timeouts */
 #define BLK_MAX_TIMEOUT		(5 * HZ)
 
diff --git a/block/fops.c b/block/fops.c
index 5589bf9c3822..f56bdfe459de 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -862,6 +862,9 @@  const struct file_operations def_blk_fops = {
 	.fallocate	= blkdev_fallocate,
 };
 
+/* Indicator that this block device is opened with restricted write access. */
+const struct file_operations def_blk_fops_restricted = def_blk_fops;
+
 static __init int blkdev_init(void)
 {
 	return bioset_init(&blkdev_dio_pool, 4,