Message ID | 20230601072829.1258286-4-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] block: remove a duplicate bdev_read_only declaration | expand |
On Thu, Jun 1, 2023 at 3:28 AM Christoph Hellwig <hch@lst.de> wrote: > > Note that the last attempt to do this got reverted by Linus in commit > a32e236eb93e ("Partially revert "block: fail op_is_write() requests to > read-only partitions") because device mapper relyied on not enforcing > the read-only state when used together with older lvm-tools. I'm certainly happy to try again, but I have my doubts whether this will work. Note that the problem case is for a device that *used* to be writable, but became read-only later, and there's an existing writer that needs that writability. The lvm fix was not to stop writing, but to simply not mark it read-only too early. So I do think that the problem here is purely in the block layer. The logic wrt "bdev_read_only()" is not necessarily a "right now it's read-only", but more of a thing that should be checked purely when the device is opened. Which is pretty much exactly what we do. So honestly, that whole test for + if (op_is_write(bio_op(bio)) && bio_sectors(bio) && + bdev_read_only(bdev)) { may look "obviously correct", but it's also equally valid to view it as "obviously garbage", simply because the test is being done at the wrong point. The same way you can write to a file that was opened for writing, but has then become read-only afterwards, writing to a device with a bdev that was writable when you *started* writing is not at all necessarily wrong. So please at least consider the possibility that that test - while it looks obvious - is simply buggy. Linus
[quoting your reply out of order, because I think it makes sense that way] On Thu, Jun 01, 2023 at 09:02:25PM -0400, Linus Torvalds wrote: > So honestly, that whole test for > > + if (op_is_write(bio_op(bio)) && bio_sectors(bio) && > + bdev_read_only(bdev)) { > > may look "obviously correct", but it's also equally valid to view it > as "obviously garbage", simply because the test is being done at the > wrong point. > > The same way you can write to a file that was opened for writing, but > has then become read-only afterwards, writing to a device with a bdev > that was writable when you *started* writing is not at all necessarily > wrong. files, or more specifically file descriptors really are the wrong analogy here. A file descriptor allows you to keep writing to a file that you were allowed to write to at open time. And that's fine (at least most of the time, people keep wanting a revoke and keep implementing broken special cases of it, but I disgress). The struct block_device is not such a handle, it's the underlying object. And the equivalent here is that we allow writes to inodes that don't even implement a write method, or have the immutable bit set. > The logic wrt "bdev_read_only()" is not necessarily a "right now it's > read-only", but more of a thing that should be checked purely when the > device is opened. Which is pretty much exactly what we do. Except the whole make a thing readonly just for fun is the corner case. DM does it, and we have a sysfs file to allow it. But the usual case is that a block device has been read-only all the time, or has been force to be read-only by the actual storage device, which doesn't know anything about the file descriptor model, and will not be happy. So maybe a lazy read-only after the last writer goes away would be nice (not that we actully track writers right now, but that whole area is comletely fucked up and I'm looking into fixing it at the moment). And for extra fun blkdev_get_by_dev doesn't check for read-only because we've historically allowed to open writable file descriptors on read-only block devices for ioctls (in addition to the magic (flags & O_ACCMODE) == 3 mode just ioctl).
On Fri, Jun 2, 2023 at 11:41 AM Christoph Hellwig <hch@lst.de> wrote: > > Except the whole make a thing readonly just for fun is the corner case. > DM does it, and we have a sysfs file to allow it. But the usual > case is that a block device has been read-only all the time, or has > been force to be read-only by the actual storage device, which > doesn't know anything about the file descriptor model, and will > not be happy. The "it's always been read-only" case is unambiguous. So I do wonder if we should have two read-only bits: a "hard" and "soft" bit, and make the open-time one the hard bit. Anyway, I'm ok trying this simple thing once more, but if we end up getting reports of breakage again, I think you may just need to accept the fact that "somebody turned the device read-only after the fact" may just be something the kernel will have to continue to deal with. We might be able to squirrell the "read-only at time of open" bit away in the file descriptor in the FMODE_CAN_WRITE bit or something like that (although that would gives the wrong error for write(): -EINVAL instead of -EROFS or whatever) Linus
On Thu, Jun 01 2023 at 3:28P -0400, Christoph Hellwig <hch@lst.de> wrote: > Currently callers can happily submit writes to block devices that are > marked read-only, including to drivers that don't even support writes > and will crash when fed such bios. > > While bio submitter should check for read-only devices, that's not a > very robust way of dealing with this. > > Note that the last attempt to do this got reverted by Linus in commit > a32e236eb93e ("Partially revert "block: fail op_is_write() requests to > read-only partitions") because device mapper relyied on not enforcing > the read-only state when used together with older lvm-tools. > > The lvm side got fixed in: > > https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3 > > but if people still have older lvm2 tools in use we probably need > to find a workaround for this in device mapper rather than lacking > the core block layer checks. > > Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
diff --git a/block/blk-core.c b/block/blk-core.c index 4ba243968e41eb..ef41816bd0eade 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -492,16 +492,6 @@ static int __init fail_make_request_debugfs(void) late_initcall(fail_make_request_debugfs); #endif /* CONFIG_FAIL_MAKE_REQUEST */ -static inline void bio_check_ro(struct bio *bio) -{ - if (op_is_write(bio_op(bio)) && bio_sectors(bio) && - bdev_read_only(bio->bi_bdev)) { - pr_warn("Trying to write to read-only block-device %pg\n", - bio->bi_bdev); - /* Older lvm-tools actually trigger this */ - } -} - static noinline int should_fail_bio(struct bio *bio) { if (should_fail_request(bdev_whole(bio->bi_bdev), bio->bi_iter.bi_size)) @@ -735,7 +725,14 @@ void submit_bio_noacct(struct bio *bio) if (should_fail_bio(bio)) goto end_io; - bio_check_ro(bio); + + if (op_is_write(bio_op(bio)) && bio_sectors(bio) && + bdev_read_only(bdev)) { + pr_warn("Trying to write to read-only block-device %pg\n", + bdev); + goto end_io; + } + if (!bio_flagged(bio, BIO_REMAPPED)) { if (unlikely(bio_check_eod(bio))) goto end_io;
Currently callers can happily submit writes to block devices that are marked read-only, including to drivers that don't even support writes and will crash when fed such bios. While bio submitter should check for read-only devices, that's not a very robust way of dealing with this. Note that the last attempt to do this got reverted by Linus in commit a32e236eb93e ("Partially revert "block: fail op_is_write() requests to read-only partitions") because device mapper relyied on not enforcing the read-only state when used together with older lvm-tools. The lvm side got fixed in: https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3 but if people still have older lvm2 tools in use we probably need to find a workaround for this in device mapper rather than lacking the core block layer checks. Signed-off-by: Christoph Hellwig <hch@lst.de> --- block/blk-core.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-)