
[1/6] block: Add config option to not allow writing to mounted devices

Message ID 20230704125702.23180-1-jack@suse.cz (mailing list archive)
State Superseded
Series block: Add config option to not allow writing to mounted devices

Commit Message

Jan Kara July 4, 2023, 12:56 p.m. UTC
Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more and
more involved examples how to corrupt block device under a mounted
filesystem leading to kernel crashes and reports we can do nothing
about. Add tracking of writers to each block device and a kernel cmdline
argument which controls whether writes to block devices open with
BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
this flag for used devices.

Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening.

Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/Kconfig             | 16 ++++++++++
 block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
 include/linux/blk_types.h |  1 +
 include/linux/blkdev.h    |  3 ++
 4 files changed, 82 insertions(+), 1 deletion(-)

Comments

Colin Walters July 4, 2023, 3:56 p.m. UTC | #1
On Tue, Jul 4, 2023, at 8:56 AM, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
> 
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely

s/very/is very/

> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be

s/priviledged/privileged/

> +	overridden with bdev_allow_write_mounted boot option.

s/with/with the/

> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))

Bikeshed but: I think BLK and BLOCK "stutter" here.  The doc comment already uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?
Eric Biggers July 4, 2023, 4:52 p.m. UTC | #2
On Tue, Jul 04, 2023 at 11:56:44AM -0400, Colin Walters wrote:
> > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
> 
> Bikeshed but: I think BLK and BLOCK "stutter" here.  The doc comment already
> uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?  

Yeah, using "block" in two different ways at the same time is confusing.
BLK_OPEN_EXCLUSIVE would probably be good, as would something like
BLK_OPEN_RESTRICT_WRITES.

I can't figure out how to apply this patch series, so I can't really see it in
context though.

- Eric
Eric Biggers July 4, 2023, 6:44 p.m. UTC | #3
On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
> 
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
> 
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
>  
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely
> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be
> +	overridden with bdev_allow_write_mounted boot option.

Does this prevent the underlying storage from being written to?  Say if the
mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
region that contains sda1.

I *think* the answer is no, writes to /dev/sda are still allowed since the goal
is just to prevent writes to the buffer cache of mounted block devices, not
writes to the underlying storage.  That is really something that should be
stated explicitly, though.

- Eric
Theodore Ts'o July 4, 2023, 8:55 p.m. UTC | #4
On Tue, Jul 04, 2023 at 11:44:16AM -0700, Eric Biggers wrote:
> Does this prevent the underlying storage from being written to?  Say if the
> mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
> region that contains sda1.
> 
> I *think* the answer is no, writes to /dev/sda are still allowed since the goal
> is just to prevent writes to the buffer cache of mounted block devices, not
> writes to the underlying storage.  That is really something that should be
> stated explicitly, though.

Well, at the risk of giving the Syzbot developers any ideas, we also
aren't preventing someone from opening the SCSI generic device and
manually sending raw SCSI commands to modify a mounted block device,
and then no doubt they would claim that the kernel config
CONFIG_CHR_DEV_SG is "insecure", and so therefore any kernel that
could support writing CDs or DVDs is by definition "insecure" by their
lights...

Which is why talking about security models without having an agreed
upon threat model is really a waste of time...

     	    	     	      	       - Ted
Jan Kara July 5, 2023, 10:30 a.m. UTC | #5
On Tue 04-07-23 11:44:16, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> > 
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> > 
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/Kconfig             | 16 ++++++++++
> >  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
> >  include/linux/blk_types.h |  1 +
> >  include/linux/blkdev.h    |  3 ++
> >  4 files changed, 82 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/Kconfig b/block/Kconfig
> > index 86122e459fe0..8b4fa105b854 100644
> > --- a/block/Kconfig
> > +++ b/block/Kconfig
> > @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
> >  	select CRC_T10DIF
> >  	select CRC64_ROCKSOFT
> >  
> > +config BLK_DEV_WRITE_MOUNTED
> > +	bool "Allow writing to mounted block devices"
> > +	default y
> > +	help
> > +	When a block device is mounted, writing to its buffer cache very likely
> > +	going to cause filesystem corruption. It is also rather easy to crash
> > +	the kernel in this way since the filesystem has no practical way of
> > +	detecting these writes to buffer cache and verifying its metadata
> > +	integrity. However there are some setups that need this capability
> > +	like running fsck on read-only mounted root device, modifying some
> > +	features on mounted ext4 filesystem, and similar. If you say N, the
> > +	kernel will prevent processes from writing to block devices that are
> > +	mounted by filesystems which provides some more protection from runaway
> > +	priviledged processes. If in doubt, say Y. The configuration can be
> > +	overridden with bdev_allow_write_mounted boot option.
> 
> Does this prevent the underlying storage from being written to?  Say if the
> mounted block device is /dev/sda1 and someone tries to write to /dev/sda in the
> region that contains sda1.
> 
> I *think* the answer is no, writes to /dev/sda are still allowed since the goal
> is just to prevent writes to the buffer cache of mounted block devices, not
> writes to the underlying storage.  That is really something that should be
> stated explicitly, though.

You are correct. The answer is "no" because as Ted says, there are many
ways to do that anyway and for a filesystem it is generally not much
different from just a corrupted fs image. I'll explicitly mention it in the
config text, that's a good idea.

								Honza
Darrick J. Wong July 5, 2023, 3:12 p.m. UTC | #6
On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
> 
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
> 
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  block/Kconfig             | 16 ++++++++++
>  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
>  include/linux/blk_types.h |  1 +
>  include/linux/blkdev.h    |  3 ++
>  4 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 86122e459fe0..8b4fa105b854 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
>  	select CRC_T10DIF
>  	select CRC64_ROCKSOFT
>  
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache very likely
> +	going to cause filesystem corruption. It is also rather easy to crash
> +	the kernel in this way since the filesystem has no practical way of
> +	detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	priviledged processes. If in doubt, say Y. The configuration can be
> +	overridden with bdev_allow_write_mounted boot option.
> +
>  config BLK_DEV_ZONED
>  	bool "Zoned block device support"
>  	select MQ_IOSCHED_DEADLINE
> diff --git a/block/bdev.c b/block/bdev.c
> index 523ea7289834..346e68dbf0bf 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -30,6 +30,9 @@
>  #include "../fs/internal.h"
>  #include "blk.h"
>  
> +/* Should we allow writing to mounted block devices? */
> +static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);

This might be premature at this point, but I wonder if you've given any
consideration to adding a lockdown prohibition as well?  e.g.

static inline bool bdev_allow_write_mounted(void)
{
	if (security_locked_down(LOCKDOWN_MOUNTED_BDEV) != 0)
		return false;

	return __bdev_allow_write_mounted;
}

--D

>  struct bdev_inode {
>  	struct block_device bdev;
>  	struct inode vfs_inode;
> @@ -744,7 +747,34 @@ void blkdev_put_no_open(struct block_device *bdev)
>  {
>  	put_device(&bdev->bd_device);
>  }
> -	
> +
> +static bool bdev_writes_blocked(struct block_device *bdev)
> +{
> +	return bdev->bd_writers == -1;
> +}
> +
> +static void bdev_block_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = -1;
> +}
> +
> +static void bdev_unblock_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = 0;
> +}
> +
> +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
> +{
> +	if (!bdev_allow_write_mounted) {
> +		/* Writes blocked? */
> +		if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
> +			return false;
> +		if (mode & BLK_OPEN_BLOCK_WRITES && bdev->bd_writers > 0)
> +			return false;
> +	}
> +	return true;
> +}
> +
>  /**
>   * blkdev_get_by_dev - open a block device by device number
>   * @dev: device number of block device to open
> @@ -787,6 +817,10 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  	if (ret)
>  		goto free_handle;
>  
> +	/* Blocking writes requires exclusive opener */
> +	if (mode & BLK_OPEN_BLOCK_WRITES && !holder)
> +		return ERR_PTR(-EINVAL);
> +
>  	bdev = blkdev_get_no_open(dev);
>  	if (!bdev) {
>  		ret = -ENXIO;
> @@ -814,12 +848,21 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		goto abort_claiming;
>  	if (!try_module_get(disk->fops->owner))
>  		goto abort_claiming;
> +	ret = -EBUSY;
> +	if (!blkdev_open_compatible(bdev, mode))
> +		goto abort_claiming;
>  	if (bdev_is_partition(bdev))
>  		ret = blkdev_get_part(bdev, mode);
>  	else
>  		ret = blkdev_get_whole(bdev, mode);
>  	if (ret)
>  		goto put_module;
> +	if (!bdev_allow_write_mounted) {
> +		if (mode & BLK_OPEN_BLOCK_WRITES)
> +			bdev_block_writes(bdev);
> +		else if (mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers++;
> +	}
>  	if (holder) {
>  		bd_finish_claiming(bdev, holder, hops);
>  
> @@ -842,6 +885,7 @@ struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		disk_unblock_events(disk);
>  	handle->bdev = bdev;
>  	handle->holder = holder;
> +	handle->mode = mode;
>  	return handle;
>  put_module:
>  	module_put(disk->fops->owner);
> @@ -914,6 +958,14 @@ void blkdev_put(struct bdev_handle *handle)
>  		sync_blockdev(bdev);
>  
>  	mutex_lock(&disk->open_mutex);
> +	if (!bdev_allow_write_mounted) {
> +		/* The exclusive opener was blocking writes? Unblock them. */
> +		if (handle->mode & BLK_OPEN_BLOCK_WRITES)
> +			bdev_unblock_writes(bdev);
> +		else if (handle->mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers--;
> +	}
> +
>  	if (handle->holder)
>  		bd_end_claim(bdev, handle->holder);
>  
> @@ -1070,3 +1122,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>  
>  	blkdev_put_no_open(bdev);
>  }
> +
> +static int __init setup_bdev_allow_write_mounted(char *str)
> +{
> +	if (kstrtobool(str, &bdev_allow_write_mounted))
> +		pr_warn("Invalid option string for bdev_allow_write_mounted:"
> +			" '%s'\n", str);
> +	return 1;
> +}
> +__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0bad62cca3d0..5bf0d2d458fd 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -70,6 +70,7 @@ struct block_device {
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
>  	bool			bd_make_it_fail;
>  #endif
> +	int			bd_writers;
>  	/*
>  	 * keep this out-of-line as it's both big and not needed in the fast
>  	 * path
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4ae3647a0322..ca467525e6e4 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
>  #define BLK_OPEN_NDELAY		((__force blk_mode_t)(1 << 3))
>  /* open for "writes" only for ioctls (specialy hack for floppy.c) */
>  #define BLK_OPEN_WRITE_IOCTL	((__force blk_mode_t)(1 << 4))
> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
>  
>  struct gendisk {
>  	/*
> @@ -1474,6 +1476,7 @@ struct blk_holder_ops {
>  struct bdev_handle {
>  	struct block_device *bdev;
>  	void *holder;
> +	blk_mode_t mode;
>  };
>  
>  struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> -- 
> 2.35.3
>
Jan Kara Aug. 14, 2023, 4:41 p.m. UTC | #7
On Tue 04-07-23 09:52:40, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 11:56:44AM -0400, Colin Walters wrote:
> > > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
> > 
> > Bikeshed but: I think BLK and BLOCK "stutter" here.  The doc comment already
> > uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?  
> 
> Yeah, using "block" in two different ways at the same time is confusing.
> BLK_OPEN_EXCLUSIVE would probably be good, as would something like
> BLK_OPEN_RESTRICT_WRITES.

BLK_OPEN_RESTRICT_WRITES sounds good to me. I'll rename the flag.

								Honza
Jan Kara Aug. 14, 2023, 4:43 p.m. UTC | #8
On Tue 04-07-23 11:56:44, Colin Walters wrote:
> On Tue, Jul 4, 2023, at 8:56 AM, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> >
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> >
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  block/Kconfig             | 16 ++++++++++
> >  block/bdev.c              | 63 ++++++++++++++++++++++++++++++++++++++-
> >  include/linux/blk_types.h |  1 +
> >  include/linux/blkdev.h    |  3 ++
> >  4 files changed, 82 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/Kconfig b/block/Kconfig
> > index 86122e459fe0..8b4fa105b854 100644
> > --- a/block/Kconfig
> > +++ b/block/Kconfig
> > @@ -77,6 +77,22 @@ config BLK_DEV_INTEGRITY_T10
> >  	select CRC_T10DIF
> >  	select CRC64_ROCKSOFT
> > 
> > +config BLK_DEV_WRITE_MOUNTED
> > +	bool "Allow writing to mounted block devices"
> > +	default y
> > +	help
> > +	When a block device is mounted, writing to its buffer cache very likely
> 
> s/very/is very/
> 
> > +	going to cause filesystem corruption. It is also rather easy to crash
> > +	the kernel in this way since the filesystem has no practical way of
> > +	detecting these writes to buffer cache and verifying its metadata
> > +	integrity. However there are some setups that need this capability
> > +	like running fsck on read-only mounted root device, modifying some
> > +	features on mounted ext4 filesystem, and similar. If you say N, the
> > +	kernel will prevent processes from writing to block devices that are
> > +	mounted by filesystems which provides some more protection from runaway
> > +	priviledged processes. If in doubt, say Y. The configuration can be
> 
> s/priviledged/privileged/
> 
> > +	overridden with bdev_allow_write_mounted boot option.
> 
> s/with/with the/

Thanks for the language fixes!

> > +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> > +#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
> 
> Bikeshed but: I think BLK and BLOCK "stutter" here.  The doc comment
> already uses the term "exclusive" so how about BLK_OPEN_EXCLUSIVE ?  

Well, we already have exclusive opens of block devices which are different
(they are exclusive only wrt other exclusive opens) so BLK_OPEN_EXCLUSIVE
will be really confusing. But BLK_OPEN_RESTRICT_WRITES sounds good to me.

								Honza
Eric Biggers Aug. 22, 2023, 5:35 a.m. UTC | #9
Hi Jan,

On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether writes to block devices open with
> BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> this flag for used devices.
> 
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
> 
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Can you make it clear that the important thing this patch prevents is writes to
the block device's buffer cache, not writes to the underlying storage?  It's
super important not to confuse the two cases.

Related to this topic, I wonder if there is any value in providing an option
that would allow O_DIRECT writes but forbid buffered writes?  Would that be
useful for any of the known use cases for writing to mounted block devices?

- Eric
Jan Kara Aug. 22, 2023, 10:11 a.m. UTC | #10
Hi Eric!

On Mon 21-08-23 22:35:23, Eric Biggers wrote:
> On Tue, Jul 04, 2023 at 02:56:49PM +0200, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether writes to block devices open with
> > BLK_OPEN_BLOCK_WRITES flag are allowed. We will make filesystems use
> > this flag for used devices.
> > 
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> > 
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Can you make it clear that the important thing this patch prevents is
> writes to the block device's buffer cache, not writes to the underlying
> storage?  It's super important not to confuse the two cases.

Right, I've already updated the description of the help text in the kconfig
to explicitly explain that this does not prevent underlying device content
from being modified, it just prevents writes to the block device itself.
But I guess I can also explain this (with a bit more technical details) in
the changelog. Good idea.

> Related to this topic, I wonder if there is any value in providing an option
> that would allow O_DIRECT writes but forbid buffered writes?  Would that be
> useful for any of the known use cases for writing to mounted block devices?

I'm not sure how useful that would be but it would be certainly rather
difficult to implement. The problem is we can currently fall back from
direct to buffered IO as we see fit, also we need to invalidate page cache
while doing direct IO which can fail etc. So it will be a rather nasty can
of worms to open...

								Honza
Aleksandr Nogikh Oct. 19, 2023, 9:16 a.m. UTC | #11
Hi Jan,

Thank you for the series!

Have you already had a chance to push an updated version of it?
I tried to search LKML, but didn't find anything.

Or did you decide to put it off until later?
Jan Kara Oct. 24, 2023, 11:10 a.m. UTC | #12
Hi!

On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> Thank you for the series!
> 
> Have you already had a chance to push an updated version of it?
> I tried to search LKML, but didn't find anything.
> 
> Or did you decide to put it off until later?

So there is preliminary series sitting in VFS tree that changes how block
devices are open. There are some conflicts with btrfs tree and bcachefs
merge that complicate all this (plus there was quite some churn in VFS
itself due to changing rules how block devices are open) so I didn't push
out the series that actually forbids opening of mounted block devices
because that would cause a "merge from hell" issues. I plan to push out the
remaining patches once the merge window closes and all the dependencies are
hopefully in a stable state. Maybe I can push out the series earlier based
on linux-next so that people can have a look at the current state.

								Honza
Aleksandr Nogikh Oct. 27, 2023, 12:06 p.m. UTC | #13
I see, thanks for sharing the details!

We'll set CONFIG_BLK_DEV_WRITE_MOUNTED=n on syzbot once the series is
in linux-next.

On Tue, Oct 24, 2023 at 1:10 PM Jan Kara <jack@suse.cz> wrote:
>
> Hi!
>
> On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> > Thank you for the series!
> >
> > Have you already had a chance to push an updated version of it?
> > I tried to search LKML, but didn't find anything.
> >
> > Or did you decide to put it off until later?
>
> So there is preliminary series sitting in VFS tree that changes how block
> devices are open. There are some conflicts with btrfs tree and bcachefs
> merge that complicate all this (plus there was quite some churn in VFS
> itself due to changing rules how block devices are open) so I didn't push
> out the series that actually forbids opening of mounted block devices
> because that would cause a "merge from hell" issues. I plan to push out the
> remaining patches once the merge window closes and all the dependencies are
> hopefully in a stable state. Maybe I can push out the series earlier based
> on linux-next so that people can have a look at the current state.
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Jan Kara Nov. 8, 2023, 10:10 a.m. UTC | #14
Hi!

On Tue 24-10-23 13:10:15, Jan Kara wrote:
> On Thu 19-10-23 11:16:55, Aleksandr Nogikh wrote:
> > Thank you for the series!
> > 
> > Have you already had a chance to push an updated version of it?
> > I tried to search LKML, but didn't find anything.
> > 
> > Or did you decide to put it off until later?
> 
> So there is preliminary series sitting in VFS tree that changes how block
> devices are open. There are some conflicts with btrfs tree and bcachefs
> merge that complicate all this (plus there was quite some churn in VFS
> itself due to changing rules how block devices are open) so I didn't push
> out the series that actually forbids opening of mounted block devices
> because that would cause a "merge from hell" issues. I plan to push out the
> remaining patches once the merge window closes and all the dependencies are
> hopefully in a stable state. Maybe I can push out the series earlier based
> on linux-next so that people can have a look at the current state.

So patches are now in VFS tree [1] so they should be in linux-next as well.
You should be able to start using the config option for syzbot runs :)

								Honza

[1] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.super
Aleksandr Nogikh Nov. 8, 2023, 6:24 p.m. UTC | #15
Hi!

Thanks for letting me know!

I've sent a PR with new syzbot configs:
https://github.com/google/syzkaller/pull/4324

Patch

diff --git a/block/Kconfig b/block/Kconfig
index 86122e459fe0..8b4fa105b854 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -77,6 +77,22 @@  config BLK_DEV_INTEGRITY_T10
 	select CRC_T10DIF
 	select CRC64_ROCKSOFT
 
+config BLK_DEV_WRITE_MOUNTED
+	bool "Allow writing to mounted block devices"
+	default y
+	help
+	When a block device is mounted, writing to its buffer cache very likely
+	going to cause filesystem corruption. It is also rather easy to crash
+	the kernel in this way since the filesystem has no practical way of
+	detecting these writes to buffer cache and verifying its metadata
+	integrity. However there are some setups that need this capability
+	like running fsck on read-only mounted root device, modifying some
+	features on mounted ext4 filesystem, and similar. If you say N, the
+	kernel will prevent processes from writing to block devices that are
+	mounted by filesystems which provides some more protection from runaway
+	priviledged processes. If in doubt, say Y. The configuration can be
+	overridden with bdev_allow_write_mounted boot option.
+
 config BLK_DEV_ZONED
 	bool "Zoned block device support"
 	select MQ_IOSCHED_DEADLINE
diff --git a/block/bdev.c b/block/bdev.c
index 523ea7289834..346e68dbf0bf 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,6 +30,9 @@ 
 #include "../fs/internal.h"
 #include "blk.h"
 
+/* Should we allow writing to mounted block devices? */
+static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
+
 struct bdev_inode {
 	struct block_device bdev;
 	struct inode vfs_inode;
@@ -744,7 +747,34 @@  void blkdev_put_no_open(struct block_device *bdev)
 {
 	put_device(&bdev->bd_device);
 }
-	
+
+static bool bdev_writes_blocked(struct block_device *bdev)
+{
+	return bdev->bd_writers == -1;
+}
+
+static void bdev_block_writes(struct block_device *bdev)
+{
+	bdev->bd_writers = -1;
+}
+
+static void bdev_unblock_writes(struct block_device *bdev)
+{
+	bdev->bd_writers = 0;
+}
+
+static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
+{
+	if (!bdev_allow_write_mounted) {
+		/* Writes blocked? */
+		if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
+			return false;
+		if (mode & BLK_OPEN_BLOCK_WRITES && bdev->bd_writers > 0)
+			return false;
+	}
+	return true;
+}
+
 /**
  * blkdev_get_by_dev - open a block device by device number
  * @dev: device number of block device to open
@@ -787,6 +817,10 @@  struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 	if (ret)
 		goto free_handle;
 
+	/* Blocking writes requires an exclusive opener */
+	ret = -EINVAL;
+	if (mode & BLK_OPEN_BLOCK_WRITES && !holder)
+		goto free_handle;
 	bdev = blkdev_get_no_open(dev);
 	if (!bdev) {
 		ret = -ENXIO;
@@ -814,12 +848,21 @@  struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		goto abort_claiming;
 	if (!try_module_get(disk->fops->owner))
 		goto abort_claiming;
+	ret = -EBUSY;
+	if (!blkdev_open_compatible(bdev, mode))
+		goto abort_claiming;
 	if (bdev_is_partition(bdev))
 		ret = blkdev_get_part(bdev, mode);
 	else
 		ret = blkdev_get_whole(bdev, mode);
 	if (ret)
 		goto put_module;
+	if (!bdev_allow_write_mounted) {
+		if (mode & BLK_OPEN_BLOCK_WRITES)
+			bdev_block_writes(bdev);
+		else if (mode & BLK_OPEN_WRITE)
+			bdev->bd_writers++;
+	}
 	if (holder) {
 		bd_finish_claiming(bdev, holder, hops);
 
@@ -842,6 +885,7 @@  struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		disk_unblock_events(disk);
 	handle->bdev = bdev;
 	handle->holder = holder;
+	handle->mode = mode;
 	return handle;
 put_module:
 	module_put(disk->fops->owner);
@@ -914,6 +958,14 @@  void blkdev_put(struct bdev_handle *handle)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
+	if (!bdev_allow_write_mounted) {
+		/* The exclusive opener was blocking writes? Unblock them. */
+		if (handle->mode & BLK_OPEN_BLOCK_WRITES)
+			bdev_unblock_writes(bdev);
+		else if (handle->mode & BLK_OPEN_WRITE)
+			bdev->bd_writers--;
+	}
+
 	if (handle->holder)
 		bd_end_claim(bdev, handle->holder);
 
@@ -1070,3 +1122,12 @@  void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 	blkdev_put_no_open(bdev);
 }
+
+static int __init setup_bdev_allow_write_mounted(char *str)
+{
+	if (kstrtobool(str, &bdev_allow_write_mounted))
+		pr_warn("Invalid option string for bdev_allow_write_mounted: '%s'\n",
+			str);
+	return 1;
+}
+__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0bad62cca3d0..5bf0d2d458fd 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -70,6 +70,7 @@  struct block_device {
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	bool			bd_make_it_fail;
 #endif
+	int			bd_writers;
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4ae3647a0322..ca467525e6e4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -124,6 +124,8 @@  typedef unsigned int __bitwise blk_mode_t;
 #define BLK_OPEN_NDELAY		((__force blk_mode_t)(1 << 3))
 /* open for "writes" only for ioctls (specialy hack for floppy.c) */
 #define BLK_OPEN_WRITE_IOCTL	((__force blk_mode_t)(1 << 4))
+/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
+#define BLK_OPEN_BLOCK_WRITES	((__force blk_mode_t)(1 << 5))
 
 struct gendisk {
 	/*
@@ -1474,6 +1476,7 @@  struct blk_holder_ops {
 struct bdev_handle {
 	struct block_device *bdev;
 	void *holder;
+	blk_mode_t mode;
 };
 
 struct bdev_handle *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
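The bd_writers field added above works as a small state machine. A positive value counts BLK_OPEN_WRITE openers, -1 means a BLK_OPEN_BLOCK_WRITES opener is present, and 0 means neither. Below is a minimal userspace sketch of that accounting. It assumes bdev_allow_write_mounted is false and ignores locking (the kernel does this under disk->open_mutex), holders, and partitions. struct model_bdev, model_open() and model_put() are hypothetical names, and the flag values only mirror the ones in blkdev.h. Like this version of the patch, the sketch uses a single -1 sentinel, so it does not count multiple write-blocking openers.

```c
#include <stdbool.h>

/* Stand-ins mirroring the blk_mode_t bit values in include/linux/blkdev.h. */
#define BLK_OPEN_READ         (1u << 0)
#define BLK_OPEN_WRITE        (1u << 1)
#define BLK_OPEN_BLOCK_WRITES (1u << 5)

/* Hypothetical model of struct block_device, reduced to the new counter. */
struct model_bdev {
	int bd_writers; /* >0: BLK_OPEN_WRITE openers, -1: writes blocked */
};

static bool model_writes_blocked(const struct model_bdev *b)
{
	return b->bd_writers == -1;
}

/* Mirrors blkdev_open_compatible() when bdev_allow_write_mounted is false. */
static bool model_open_compatible(const struct model_bdev *b, unsigned int mode)
{
	if ((mode & BLK_OPEN_WRITE) && model_writes_blocked(b))
		return false;
	if ((mode & BLK_OPEN_BLOCK_WRITES) && b->bd_writers > 0)
		return false;
	return true;
}

/* Open-side accounting; returns -1 where the kernel would return -EBUSY. */
static int model_open(struct model_bdev *b, unsigned int mode)
{
	if (!model_open_compatible(b, mode))
		return -1;
	if (mode & BLK_OPEN_BLOCK_WRITES)
		b->bd_writers = -1;
	else if (mode & BLK_OPEN_WRITE)
		b->bd_writers++;
	return 0;
}

/* Release-side accounting; needs the mode that was used at open time. */
static void model_put(struct model_bdev *b, unsigned int mode)
{
	if (mode & BLK_OPEN_BLOCK_WRITES)
		b->bd_writers = 0;
	else if (mode & BLK_OPEN_WRITE)
		b->bd_writers--;
}
```

Consider three sequences. If a plain writer holds the device, a later write-blocking open fails. If a write-blocking opener holds the device, further writers are refused while read-only opens still succeed. Releasing the write-blocking opener returns the counter to 0. The release path also shows why the patch stores the open mode in struct bdev_handle: without it, blkdev_put() could not tell whether to decrement the writer count or lift the block.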