diff mbox series

[3/3] block: fail writes to read-only devices

Message ID 20230601072829.1258286-4-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [1/3] block: remove a duplicate bdev_read_only declaration | expand

Commit Message

Christoph Hellwig June 1, 2023, 7:28 a.m. UTC
Currently callers can happily submit writes to block devices that are
marked read-only, including to drivers that don't even support writes
and will crash when fed such bios.

While bio submitter should check for read-only devices, that's not a
very robust way of dealing with this.

Note that the last attempt to do this got reverted by Linus in commit
a32e236eb93e ("Partially revert "block: fail op_is_write() requests to
read-only partitions") because device mapper relyied on not enforcing
the read-only state when used together with older lvm-tools.

The lvm side got fixed in:

    https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3

but if people still have older lvm2 tools in use we probably need
to find a workaround for this in device mapper rather than lacking
the core block layer checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

Comments

Linus Torvalds June 2, 2023, 1:02 a.m. UTC | #1
On Thu, Jun 1, 2023 at 3:28 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Note that the last attempt to do this got reverted by Linus in commit
> a32e236eb93e ("Partially revert "block: fail op_is_write() requests to
> read-only partitions") because device mapper relyied on not enforcing
> the read-only state when used together with older lvm-tools.

I'm certainly happy to try again, but I have my doubts whether this will work.

Note that the problem case is for a device that *used* to be writable,
but became read-only later, and there's an existing writer that needs
that writability.

The lvm fix was not to stop writing, but to simply not mark it
read-only too early.

So I do think that the problem here is purely in the block layer.

The logic wrt "bdev_read_only()" is not necessarily a "right now it's
read-only", but more of a thing that should be checked purely when the
device is opened. Which is pretty much exactly what we do.

So honestly, that whole test for

+       if (op_is_write(bio_op(bio)) && bio_sectors(bio) &&
+           bdev_read_only(bdev)) {

may look "obviously correct", but it's also equally valid to view it
as "obviously garbage", simply because the test is being done at the
wrong point.

The same way you can write to a file that was opened for writing, but
has then become read-only afterwards, writing to a device with a bdev
that was writable when you *started* writing is not at all necessarily
wrong.

So please at least consider the possibility that that test - while it
looks obvious - is simply buggy.

                Linus
Christoph Hellwig June 2, 2023, 3:41 p.m. UTC | #2
[quoting your reply out of order, because I think it makes sense that
 way]

On Thu, Jun 01, 2023 at 09:02:25PM -0400, Linus Torvalds wrote:
> So honestly, that whole test for
> 
> +       if (op_is_write(bio_op(bio)) && bio_sectors(bio) &&
> +           bdev_read_only(bdev)) {
> 
> may look "obviously correct", but it's also equally valid to view it
> as "obviously garbage", simply because the test is being done at the
> wrong point.
> 
> The same way you can write to a file that was opened for writing, but
> has then become read-only afterwards, writing to a device with a bdev
> that was writable when you *started* writing is not at all necessarily
> wrong.

files, or more specifically file descriptors really are the wrong
analogy here.  A file descriptor allows you to keep writing to
a file that you were allowed to write to at open time.  And that's
fine (at least most of the time, people keep wanting a revoke and
keep implementing broken special cases of it, but I disgress).

The struct block_device is not such a handle, it's the underlying
object.  And the equivalent here is that we allow writes to inodes
that don't even implement a write method, or have the immutable
bit set.

> The logic wrt "bdev_read_only()" is not necessarily a "right now it's
> read-only", but more of a thing that should be checked purely when the
> device is opened. Which is pretty much exactly what we do.

Except the whole make a thing readonly just for fun is the corner case.
DM does it, and we have a sysfs file to allow it.  But the usual
case is that a block device has been read-only all the time, or has
been force to be read-only by the actual storage device, which
doesn't know anything about the file descriptor model, and will
not be happy.

So maybe a lazy read-only after the last writer goes away would be
nice (not that we actully track writers right now, but that whole
area is comletely fucked up and I'm looking into fixing it at the
moment).

And for extra fun blkdev_get_by_dev doesn't check for read-only
because we've historically allowed to open writable file descriptors
on read-only block devices for ioctls (in addition to the magic
(flags & O_ACCMODE) == 3 mode just ioctl).
Linus Torvalds June 2, 2023, 3:56 p.m. UTC | #3
On Fri, Jun 2, 2023 at 11:41 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Except the whole make a thing readonly just for fun is the corner case.
> DM does it, and we have a sysfs file to allow it.  But the usual
> case is that a block device has been read-only all the time, or has
> been force to be read-only by the actual storage device, which
> doesn't know anything about the file descriptor model, and will
> not be happy.

The "it's always been read-only" case is unambiguous.

So I do wonder if we should have two read-only bits: a "hard" and
"soft" bit, and make the open-time one the hard bit.

Anyway, I'm ok trying this simple thing once more, but if we end up
getting reports of breakage again, I think you may just need to accept
the fact that "somebody turned the device read-only after the fact"
may just be something the kernel will have to continue to deal with.

We might be able to squirrell the "read-only at time of open" bit away
in the file descriptor in the FMODE_CAN_WRITE bit or something like
that (although that would gives the wrong error for write(): -EINVAL
instead of -EROFS or whatever)

                 Linus
Mike Snitzer June 6, 2023, 4:13 p.m. UTC | #4
On Thu, Jun 01 2023 at  3:28P -0400,
Christoph Hellwig <hch@lst.de> wrote:

> Currently callers can happily submit writes to block devices that are
> marked read-only, including to drivers that don't even support writes
> and will crash when fed such bios.
> 
> While bio submitter should check for read-only devices, that's not a
> very robust way of dealing with this.
> 
> Note that the last attempt to do this got reverted by Linus in commit
> a32e236eb93e ("Partially revert "block: fail op_is_write() requests to
> read-only partitions") because device mapper relyied on not enforcing
> the read-only state when used together with older lvm-tools.
> 
> The lvm side got fixed in:
> 
>     https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
> 
> but if people still have older lvm2 tools in use we probably need
> to find a workaround for this in device mapper rather than lacking
> the core block layer checks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Mike Snitzer <snitzer@kernel.org>
diff mbox series

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 4ba243968e41eb..ef41816bd0eade 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -492,16 +492,6 @@  static int __init fail_make_request_debugfs(void)
 late_initcall(fail_make_request_debugfs);
 #endif /* CONFIG_FAIL_MAKE_REQUEST */
 
-static inline void bio_check_ro(struct bio *bio)
-{
-	if (op_is_write(bio_op(bio)) && bio_sectors(bio) &&
-	    bdev_read_only(bio->bi_bdev)) {
-		pr_warn("Trying to write to read-only block-device %pg\n",
-			bio->bi_bdev);
-		/* Older lvm-tools actually trigger this */
-	}
-}
-
 static noinline int should_fail_bio(struct bio *bio)
 {
 	if (should_fail_request(bdev_whole(bio->bi_bdev), bio->bi_iter.bi_size))
@@ -735,7 +725,14 @@  void submit_bio_noacct(struct bio *bio)
 
 	if (should_fail_bio(bio))
 		goto end_io;
-	bio_check_ro(bio);
+
+	if (op_is_write(bio_op(bio)) && bio_sectors(bio) &&
+	    bdev_read_only(bdev)) {
+		pr_warn("Trying to write to read-only block-device %pg\n",
+			bdev);
+		goto end_io;
+	}
+
 	if (!bio_flagged(bio, BIO_REMAPPED)) {
 		if (unlikely(bio_check_eod(bio)))
 			goto end_io;