Message ID | 20210124100241.1167849-6-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/10] brd: remove the end of device check in brd_do_bvec | expand |
On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote: > > There is no good reason to reassign ->bi_bdev when remapping the > partition-relative block number to the device wide one, as all the > information required by the drivers comes from the gendisk anyway. > > Keeping the original ->bi_bdev alive will allow to greatly simplify > the partition-away I/O accounting. This one causes boot failures for me on my laptop...
On Mon, Jan 25, 2021 at 10:53:36AM -0700, Jens Axboe wrote: > On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote: > > > > There is no good reason to reassign ->bi_bdev when remapping the > > partition-relative block number to the device wide one, as all the > > information required by the drivers comes from the gendisk anyway. > > > > Keeping the original ->bi_bdev alive will allow to greatly simplify > > the partition-away I/O accounting. > > This one causes boot failures for me on my laptop... Any output to share? Details of the config?
On 1/25/21 10:57 AM, Jens Axboe wrote: > On 1/25/21 10:55 AM, Christoph Hellwig wrote: >> On Mon, Jan 25, 2021 at 10:53:36AM -0700, Jens Axboe wrote: >>> On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote: >>>> >>>> There is no good reason to reassign ->bi_bdev when remapping the >>>> partition-relative block number to the device wide one, as all the >>>> information required by the drivers comes from the gendisk anyway. >>>> >>>> Keeping the original ->bi_bdev alive will allow to greatly simplify >>>> the partition-away I/O accounting. >>> >>> This one causes boot failures for me on my laptop... >> >> Any output to share? Details of the config? > > .config attached - hard to share any output, as it seems to load the > kernel fine, but latter parts of the boot process end up with errors > after the screen blanks. It's a laptop, so nothing fancy in terms of > logging. FWIW, it did boot on my test box. > > Let me try and do a video... Here you go, complaining about EOD access on partition 6. sudo fdisk -l /dev/nvme0n1 Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors Disk model: SAMSUNG MZVLB1T0HBLR-000L7 Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: gpt Disk identifier: 51484599-D608-465A-A3A9-D36F1D6D126E Device Start End Sectors Size Type /dev/nvme0n1p1 2048 534527 532480 260M EFI System /dev/nvme0n1p2 534528 567295 32768 16M Microsoft reserved /dev/nvme0n1p3 567296 257560575 256993280 122.6G Microsoft basic data /dev/nvme0n1p4 1998360576 2000408575 2048000 1000M Windows recovery environment /dev/nvme0n1p5 257560576 261718015 4157440 2G Linux filesystem /dev/nvme0n1p6 261718016 390625279 128907264 61.5G Linux filesystem /dev/nvme0n1p7 390625280 421875711 31250432 14.9G Linux swap /dev/nvme0n1p8 421875712 1007812607 585936896 279.4G Linux filesystem /dev/nvme0n1p9 1007812608 1398437887 390625280 186.3G Linux filesystem /dev/nvme0n1p10 1398437888 1593749503 195311616 93.1G Linux filesystem Partition table entries are not in disk order.
On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
> Partition table entries are not in disk order.
And the issue shows up with the series just up to the this patch,
without any later patches?
On 1/25/21 11:13 AM, Christoph Hellwig wrote: > On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote: >> Partition table entries are not in disk order. > > And the issue shows up with the series just up to the this patch, > without any later patches? At that patch specifically. I bisected it, and then I double checked by running the previous commit (boots fine), then apply this one, and then I run into that error. So it should be 100% reliable.
On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote: > On 1/25/21 11:13 AM, Christoph Hellwig wrote: > > On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote: > >> Partition table entries are not in disk order. > > > > And the issue shows up with the series just up to the this patch, > > without any later patches? > > At that patch specifically. I bisected it, and then I double checked > by running the previous commit (boots fine), then apply this one, and > then I run into that error. So it should be 100% reliable. Ok, I have an idea. With EOD message you mean this printk, right: pr_info_ratelimited("attempt to access beyond end of device\n" "%s: rw=%d, want=%llu, limit=%llu\n", ... right?
On 1/25/21 11:18 AM, Christoph Hellwig wrote: > On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote: >> On 1/25/21 11:13 AM, Christoph Hellwig wrote: >>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote: >>>> Partition table entries are not in disk order. >>> >>> And the issue shows up with the series just up to the this patch, >>> without any later patches? >> >> At that patch specifically. I bisected it, and then I double checked >> by running the previous commit (boots fine), then apply this one, and >> then I run into that error. So it should be 100% reliable. > > Ok, I have an idea. With EOD message you mean this printk, right: > > pr_info_ratelimited("attempt to access beyond end of device\n" > "%s: rw=%d, want=%llu, limit=%llu\n", > ... > > right? Yep
On Mon, Jan 25, 2021 at 11:19:23AM -0700, Jens Axboe wrote: > On 1/25/21 11:18 AM, Christoph Hellwig wrote: > > On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote: > >> On 1/25/21 11:13 AM, Christoph Hellwig wrote: > >>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote: > >>>> Partition table entries are not in disk order. > >>> > >>> And the issue shows up with the series just up to the this patch, > >>> without any later patches? > >> > >> At that patch specifically. I bisected it, and then I double checked > >> by running the previous commit (boots fine), then apply this one, and > >> then I run into that error. So it should be 100% reliable. > > > > Ok, I have an idea. With EOD message you mean this printk, right: > > > > pr_info_ratelimited("attempt to access beyond end of device\n" > > "%s: rw=%d, want=%llu, limit=%llu\n", > > ... > > > > right? > > Yep Can you give this untested patch a spin? This should fix the case where we check the eod for the original partition with the remapped bi_sectors. Looking into a local reproducer now. diff --git a/block/blk-core.c b/block/blk-core.c index 88f60890443264..6253a2f9a1c08f 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -813,13 +813,16 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) if (should_fail_bio(bio)) goto end_io; + if (unlikely(bio_check_ro(bio))) goto end_io; - if (unlikely(bio_check_eod(bio))) - goto end_io; - if (bio->bi_bdev->bd_partno && !bio_flagged(bio, BIO_REMAPPED) && - unlikely(blk_partition_remap(bio))) - goto end_io; + if (!bio_flagged(bio, BIO_REMAPPED)) { + if (unlikely(bio_check_eod(bio))) + goto end_io; + if (bio->bi_bdev->bd_partno && + unlikely(blk_partition_remap(bio))) + goto end_io; + } /* * Filter flush bio's early so that bio based drivers without flush
On 1/25/21 11:21 AM, Christoph Hellwig wrote: > On Mon, Jan 25, 2021 at 11:19:23AM -0700, Jens Axboe wrote: >> On 1/25/21 11:18 AM, Christoph Hellwig wrote: >>> On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote: >>>> On 1/25/21 11:13 AM, Christoph Hellwig wrote: >>>>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote: >>>>>> Partition table entries are not in disk order. >>>>> >>>>> And the issue shows up with the series just up to the this patch, >>>>> without any later patches? >>>> >>>> At that patch specifically. I bisected it, and then I double checked >>>> by running the previous commit (boots fine), then apply this one, and >>>> then I run into that error. So it should be 100% reliable. >>> >>> Ok, I have an idea. With EOD message you mean this printk, right: >>> >>> pr_info_ratelimited("attempt to access beyond end of device\n" >>> "%s: rw=%d, want=%llu, limit=%llu\n", >>> ... >>> >>> right? >> >> Yep > > Can you give this untested patch a spin? This should fix the > case where we check the eod for the original partition with the > remapped bi_sectors. Looking into a local reproducer now. Yep, with that applied on top my laptop boots again.
diff --git a/block/blk-core.c b/block/blk-core.c index 64f69022de9627..1c1b97a82caa2e 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -752,7 +752,7 @@ static int blk_partition_remap(struct bio *bio) bio->bi_iter.bi_sector - p->bd_start_sect); } - bio->bi_bdev = bdev_whole(p); + bio_set_flag(bio, BIO_REMAPPED); return 0; } @@ -817,7 +817,8 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio) goto end_io; if (unlikely(bio_check_eod(bio))) goto end_io; - if (bio->bi_bdev->bd_partno && unlikely(blk_partition_remap(bio))) + if (bio->bi_bdev->bd_partno && !bio_flagged(bio, BIO_REMAPPED) && + unlikely(blk_partition_remap(bio))) goto end_io; /* diff --git a/include/linux/bio.h b/include/linux/bio.h index 12af7aa5db3778..2f1155eabaff29 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -485,6 +485,7 @@ extern const char *bio_devname(struct bio *bio, char *buffer); #define bio_set_dev(bio, bdev) \ do { \ + bio_clear_flag(bio, BIO_REMAPPED); \ if ((bio)->bi_bdev != (bdev)) \ bio_clear_flag(bio, BIO_THROTTLED); \ (bio)->bi_bdev = (bdev); \ @@ -493,6 +494,7 @@ do { \ #define bio_copy_dev(dst, src) \ do { \ + bio_clear_flag(dst, BIO_REMAPPED); \ (dst)->bi_bdev = (src)->bi_bdev; \ bio_clone_blkg_association(dst, src); \ } while (0) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 8ebd8be3e05082..1bc6f6a01070fc 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -303,6 +303,7 @@ enum { * of this bio. */ BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ BIO_TRACKED, /* set if bio goes through the rq_qos path */ + BIO_REMAPPED, BIO_FLAG_LAST };