diff mbox series

[05/10] block: do not reassig ->bi_bdev when partition remapping

Message ID 20210124100241.1167849-6-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/10] brd: remove the end of device check in brd_do_bvec | expand

Commit Message

Christoph Hellwig Jan. 24, 2021, 10:02 a.m. UTC
There is no good reason to reassign ->bi_bdev when remapping the
partition-relative block number to the device wide one, as all the
information required by the drivers comes from the gendisk anyway.

Keeping the original ->bi_bdev alive will allow to greatly simplify
the partition-away I/O accounting.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-core.c          | 5 +++--
 include/linux/bio.h       | 2 ++
 include/linux/blk_types.h | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

Comments

Jens Axboe Jan. 25, 2021, 5:53 p.m. UTC | #1
On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote:
>
> There is no good reason to reassign ->bi_bdev when remapping the
> partition-relative block number to the device wide one, as all the
> information required by the drivers comes from the gendisk anyway.
>
> Keeping the original ->bi_bdev alive will allow to greatly simplify
> the partition-away I/O accounting.

This one causes boot failures for me on my laptop...
Christoph Hellwig Jan. 25, 2021, 5:55 p.m. UTC | #2
On Mon, Jan 25, 2021 at 10:53:36AM -0700, Jens Axboe wrote:
> On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote:
> >
> > There is no good reason to reassign ->bi_bdev when remapping the
> > partition-relative block number to the device wide one, as all the
> > information required by the drivers comes from the gendisk anyway.
> >
> > Keeping the original ->bi_bdev alive will allow to greatly simplify
> > the partition-away I/O accounting.
> 
> This one causes boot failures for me on my laptop...

Any output to share?  Details of the config?
Jens Axboe Jan. 25, 2021, 6:03 p.m. UTC | #3
On 1/25/21 10:57 AM, Jens Axboe wrote:
> On 1/25/21 10:55 AM, Christoph Hellwig wrote:
>> On Mon, Jan 25, 2021 at 10:53:36AM -0700, Jens Axboe wrote:
>>> On Sun, Jan 24, 2021 at 3:05 AM Christoph Hellwig <hch@lst.de> wrote:
>>>>
>>>> There is no good reason to reassign ->bi_bdev when remapping the
>>>> partition-relative block number to the device wide one, as all the
>>>> information required by the drivers comes from the gendisk anyway.
>>>>
>>>> Keeping the original ->bi_bdev alive will allow to greatly simplify
>>>> the partition-away I/O accounting.
>>>
>>> This one causes boot failures for me on my laptop...
>>
>> Any output to share?  Details of the config?
> 
> .config attached - hard to share any output, as it seems to load the
> kernel fine, but latter parts of the boot process end up with errors
> after the screen blanks. It's a laptop, so nothing fancy in terms of
> logging. FWIW, it did boot on my test box.
> 
> Let me try and do a video...

Here you go, complaining about EOD access on partition 6.

sudo fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 953.89 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: SAMSUNG MZVLB1T0HBLR-000L7              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 51484599-D608-465A-A3A9-D36F1D6D126E

Device               Start        End   Sectors   Size Type
/dev/nvme0n1p1        2048     534527    532480   260M EFI System
/dev/nvme0n1p2      534528     567295     32768    16M Microsoft reserved
/dev/nvme0n1p3      567296  257560575 256993280 122.6G Microsoft basic data
/dev/nvme0n1p4  1998360576 2000408575   2048000  1000M Windows recovery environment
/dev/nvme0n1p5   257560576  261718015   4157440     2G Linux filesystem
/dev/nvme0n1p6   261718016  390625279 128907264  61.5G Linux filesystem
/dev/nvme0n1p7   390625280  421875711  31250432  14.9G Linux swap
/dev/nvme0n1p8   421875712 1007812607 585936896 279.4G Linux filesystem
/dev/nvme0n1p9  1007812608 1398437887 390625280 186.3G Linux filesystem
/dev/nvme0n1p10 1398437888 1593749503 195311616  93.1G Linux filesystem

Partition table entries are not in disk order.
Christoph Hellwig Jan. 25, 2021, 6:13 p.m. UTC | #4
On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
> Partition table entries are not in disk order.

And the issue shows up with the series just up to the this patch,
without any later patches?
Jens Axboe Jan. 25, 2021, 6:15 p.m. UTC | #5
On 1/25/21 11:13 AM, Christoph Hellwig wrote:
> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
>> Partition table entries are not in disk order.
> 
> And the issue shows up with the series just up to the this patch,
> without any later patches?

At that patch specifically. I bisected it, and then I double checked
by running the previous commit (boots fine), then apply this one, and
then I run into that error. So it should be 100% reliable.
Christoph Hellwig Jan. 25, 2021, 6:18 p.m. UTC | #6
On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote:
> On 1/25/21 11:13 AM, Christoph Hellwig wrote:
> > On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
> >> Partition table entries are not in disk order.
> > 
> > And the issue shows up with the series just up to the this patch,
> > without any later patches?
> 
> At that patch specifically. I bisected it, and then I double checked
> by running the previous commit (boots fine), then apply this one, and
> then I run into that error. So it should be 100% reliable.

Ok, I have an idea.  With EOD message you mean this printk, right:

	pr_info_ratelimited("attempt to access beyond end of device\n"
                            "%s: rw=%d, want=%llu, limit=%llu\n",
			    ...

right?
Jens Axboe Jan. 25, 2021, 6:19 p.m. UTC | #7
On 1/25/21 11:18 AM, Christoph Hellwig wrote:
> On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote:
>> On 1/25/21 11:13 AM, Christoph Hellwig wrote:
>>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
>>>> Partition table entries are not in disk order.
>>>
>>> And the issue shows up with the series just up to the this patch,
>>> without any later patches?
>>
>> At that patch specifically. I bisected it, and then I double checked
>> by running the previous commit (boots fine), then apply this one, and
>> then I run into that error. So it should be 100% reliable.
> 
> Ok, I have an idea.  With EOD message you mean this printk, right:
> 
> 	pr_info_ratelimited("attempt to access beyond end of device\n"
>                             "%s: rw=%d, want=%llu, limit=%llu\n",
> 			    ...
> 
> right?

Yep
Christoph Hellwig Jan. 25, 2021, 6:21 p.m. UTC | #8
On Mon, Jan 25, 2021 at 11:19:23AM -0700, Jens Axboe wrote:
> On 1/25/21 11:18 AM, Christoph Hellwig wrote:
> > On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote:
> >> On 1/25/21 11:13 AM, Christoph Hellwig wrote:
> >>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
> >>>> Partition table entries are not in disk order.
> >>>
> >>> And the issue shows up with the series just up to the this patch,
> >>> without any later patches?
> >>
> >> At that patch specifically. I bisected it, and then I double checked
> >> by running the previous commit (boots fine), then apply this one, and
> >> then I run into that error. So it should be 100% reliable.
> > 
> > Ok, I have an idea.  With EOD message you mean this printk, right:
> > 
> > 	pr_info_ratelimited("attempt to access beyond end of device\n"
> >                             "%s: rw=%d, want=%llu, limit=%llu\n",
> > 			    ...
> > 
> > right?
> 
> Yep

Can you give this untested patch a spin?  This should fix the
case where we check the eod for the original partition with the
remapped bi_sectors.  Looking into a local reproducer now.

diff --git a/block/blk-core.c b/block/blk-core.c
index 88f60890443264..6253a2f9a1c08f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -813,13 +813,16 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 
 	if (should_fail_bio(bio))
 		goto end_io;
+
 	if (unlikely(bio_check_ro(bio)))
 		goto end_io;
-	if (unlikely(bio_check_eod(bio)))
-		goto end_io;
-	if (bio->bi_bdev->bd_partno && !bio_flagged(bio, BIO_REMAPPED) &&
-	    unlikely(blk_partition_remap(bio)))
-		goto end_io;
+	if (!bio_flagged(bio, BIO_REMAPPED)) {
+		if (unlikely(bio_check_eod(bio)))
+			goto end_io;
+		if (bio->bi_bdev->bd_partno &&
+		    unlikely(blk_partition_remap(bio)))
+			goto end_io;
+	}
 
 	/*
 	 * Filter flush bio's early so that bio based drivers without flush
Jens Axboe Jan. 25, 2021, 6:31 p.m. UTC | #9
On 1/25/21 11:21 AM, Christoph Hellwig wrote:
> On Mon, Jan 25, 2021 at 11:19:23AM -0700, Jens Axboe wrote:
>> On 1/25/21 11:18 AM, Christoph Hellwig wrote:
>>> On Mon, Jan 25, 2021 at 11:15:04AM -0700, Jens Axboe wrote:
>>>> On 1/25/21 11:13 AM, Christoph Hellwig wrote:
>>>>> On Mon, Jan 25, 2021 at 11:03:24AM -0700, Jens Axboe wrote:
>>>>>> Partition table entries are not in disk order.
>>>>>
>>>>> And the issue shows up with the series just up to the this patch,
>>>>> without any later patches?
>>>>
>>>> At that patch specifically. I bisected it, and then I double checked
>>>> by running the previous commit (boots fine), then apply this one, and
>>>> then I run into that error. So it should be 100% reliable.
>>>
>>> Ok, I have an idea.  With EOD message you mean this printk, right:
>>>
>>> 	pr_info_ratelimited("attempt to access beyond end of device\n"
>>>                             "%s: rw=%d, want=%llu, limit=%llu\n",
>>> 			    ...
>>>
>>> right?
>>
>> Yep
> 
> Can you give this untested patch a spin?  This should fix the
> case where we check the eod for the original partition with the
> remapped bi_sectors.  Looking into a local reproducer now.

Yep, with that applied on top my laptop boots again.
diff mbox series

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 64f69022de9627..1c1b97a82caa2e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -752,7 +752,7 @@  static int blk_partition_remap(struct bio *bio)
 				      bio->bi_iter.bi_sector -
 				      p->bd_start_sect);
 	}
-	bio->bi_bdev = bdev_whole(p);
+	bio_set_flag(bio, BIO_REMAPPED);
 	return 0;
 }
 
@@ -817,7 +817,8 @@  static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 		goto end_io;
 	if (unlikely(bio_check_eod(bio)))
 		goto end_io;
-	if (bio->bi_bdev->bd_partno && unlikely(blk_partition_remap(bio)))
+	if (bio->bi_bdev->bd_partno && !bio_flagged(bio, BIO_REMAPPED) &&
+	    unlikely(blk_partition_remap(bio)))
 		goto end_io;
 
 	/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 12af7aa5db3778..2f1155eabaff29 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -485,6 +485,7 @@  extern const char *bio_devname(struct bio *bio, char *buffer);
 
 #define bio_set_dev(bio, bdev) 				\
 do {							\
+	bio_clear_flag(bio, BIO_REMAPPED);		\
 	if ((bio)->bi_bdev != (bdev))			\
 		bio_clear_flag(bio, BIO_THROTTLED);	\
 	(bio)->bi_bdev = (bdev);			\
@@ -493,6 +494,7 @@  do {							\
 
 #define bio_copy_dev(dst, src)			\
 do {						\
+	bio_clear_flag(dst, BIO_REMAPPED);		\
 	(dst)->bi_bdev = (src)->bi_bdev;	\
 	bio_clone_blkg_association(dst, src);	\
 } while (0)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8ebd8be3e05082..1bc6f6a01070fc 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -303,6 +303,7 @@  enum {
 				 * of this bio. */
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
+	BIO_REMAPPED,
 	BIO_FLAG_LAST
 };