Message ID | 20240514173900.62207-4-hare@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | enable bs > ps for block devices | expand |
On 14/05/2024 11:38, Hannes Reinecke wrote: > Bvecs can be larger than a page, and the block layer handles > this just fine. So do not split by PAGE_SIZE but rather by > the max_segment_size if that happens to be larger. Can you check scsi_debug for this series? I took this series only up to this change, and got: Startin[ 1.736470] ------------[ cut here ]------------ g Load [ 1.737777] WARNING: CPU: 0 PID: 52 at block/blk-merge.c:581 __blk_rq_map_sg+0x46a/0x480 Kernel Module fu[ 1.738862] Modules linked in: se...[ 1.739370] CPU: 0 PID: 52 Comm: kworker/0:1H Not tainted 6.9.0-00002-g4eaa50af9312-dirty #2416 [ 1.740474] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [ 1.741809] Workqueue: kblockd blk_mq_run_work_fn [ 1.742379] RIP: 0010:__blk_rq_map_sg+0x46a/0x480 [ 1.742939] Code: 17 fe ff ff 44 89 58 0c 48 8b 01 e9 ec fc ff ff 43 8d 3c 06 48 8b 14 24 81 ff 00 10 00 00 0f 86 af fc ff ff e9 02 f0 [ 1.743015] systemd[1]: File System Check on Root Device was skipped because of a failed condition check (ConditionPathIsReadWrite=!/. [ 1.745122] RSP: 0018:ff37636e4032bb90 EFLAGS: 00010212 [ 1.746419] systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup fi. [ 1.746891] RAX: 000000000000001c RBX: 00000000000001b0 RCX: ff28e6d8b0950a00 [ 1.747903] systemd[1]: (This warning is only shown for the first unit using IP firewalling.) [ 1.748549] RDX: ff7662becb4ac482 RSI: 0000000000001000 RDI: 00000000fffffffd [ 1.749688] systemd[1]: Starting Journal Service... [ 1.749895] RBP: ff7662becb4abf80 R08: 0000000000000000 R09: ff28e6d880fadd40 [ 1.750965] R10: ff7662becb4ac480 R11: 0000000000000000 R12: 0000000000000000 [ 1.750966] R13: 0000000000000002 R14: 0000000000001000 R15: ff7662becb4ac480 [ 1.750970] FS: 0000000000000000(0000) GS:ff28e6da75c00000(0000) knlGS:0000000000000000 [ 1.750972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.750973] CR2: 00007f7407f19000 CR3: 0000000100f24002 CR4: 0000000000771ef0 [ 1.750974] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.750975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1.750976] PKRU: 55555554 [ 1.750977] Call Trace: [ 1.750984] <TASK> [ 1.750986] ? __warn+0x7e/0x130 [ 1.750992] ? __blk_rq_map_sg+0x46a/0x480 [ 1.750994] ? report_bug+0x18e/0x1a0 [ 1.750999] ? handle_bug+0x3d/0x70 [ 1.751003] ? exc_invalid_op+0x18/0x70 [ 1.751006] ? asm_exc_invalid_op+0x1a/0x20 [ 1.751009] ? __blk_rq_map_sg+0x46a/0x480 [ 1.751012] scsi_alloc_sgtables+0xb7/0x3f0 [ 1.751019] sd_init_command+0x177/0x9d0 [ 1.751023] scsi_queue_rq+0x7c1/0xae0 [ 1.751027] blk_mq_dispatch_rq_list+0x2bc/0x7c0 [ 1.751031] __blk_mq_sched_dispatch_requests+0x409/0x5c0 [ 1.751035] blk_mq_sched_dispatch_requests+0x2c/0x60 [ 1.751037] blk_mq_run_work_fn+0x5f/0x70 [ 1.751039] process_one_work+0x149/0x360 I suspect that you would need to also change the PAGE_SIZE check in __blk_bios_map_sg() also. However, I am not confident that the change below is ok to begin with... BTW, scsi_debug does use an insane max_segment_size of -1 > > Signed-off-by: Hannes Reinecke <hare@kernel.org> > --- > block/blk-merge.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 4e3483a16b75..570573d7a34f 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -278,6 +278,7 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim, > struct bio_vec bv, bvprv, *bvprvp = NULL; > struct bvec_iter iter; > unsigned nsegs = 0, bytes = 0; > + unsigned bv_seg_lim = max(PAGE_SIZE, lim->max_segment_size); > > bio_for_each_bvec(bv, bio, iter) { > /* > @@ -289,7 +290,7 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim, > > if (nsegs < lim->max_segments && > bytes + bv.bv_len <= max_bytes && > - bv.bv_offset + bv.bv_len <= PAGE_SIZE) { > + bv.bv_offset + bv.bv_len <= bv_seg_lim) { > nsegs++; > bytes += bv.bv_len; > } else {
On 5/15/24 02:20, John Garry wrote: > On 14/05/2024 11:38, Hannes Reinecke wrote: >> Bvecs can be larger than a page, and the block layer handles >> this just fine. So do not split by PAGE_SIZE but rather by >> the max_segment_size if that happens to be larger. > Can you check scsi_debug for this series? I took this series only up to > this change, and got: > > Startin[ 1.736470] ------------[ cut here ]------------ > g Load [ 1.737777] WARNING: CPU: 0 PID: 52 at block/blk-merge.c:581 > __blk_rq_map_sg+0x46a/0x480 > Kernel Module fu[ 1.738862] Modules linked in: > se...[ 1.739370] CPU: 0 PID: 52 Comm: kworker/0:1H Not tainted > 6.9.0-00002-g4eaa50af9312-dirty #2416 > > [ 1.740474] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 > [ 1.741809] Workqueue: kblockd blk_mq_run_work_fn > [ 1.742379] RIP: 0010:__blk_rq_map_sg+0x46a/0x480 > [ 1.742939] Code: 17 fe ff ff 44 89 58 0c 48 8b 01 e9 ec fc ff ff 43 > 8d 3c 06 48 8b 14 24 81 ff 00 10 00 00 0f 86 af fc ff ff e9 02 f0 > [ 1.743015] systemd[1]: File System Check on Root Device was skipped > because of a failed condition check (ConditionPathIsReadWrite=!/. > [ 1.745122] RSP: 0018:ff37636e4032bb90 EFLAGS: 00010212 > [ 1.746419] systemd[1]: systemd-journald.service: unit configures an > IP firewall, but the local system does not support BPF/cgroup fi. > [ 1.746891] RAX: 000000000000001c RBX: 00000000000001b0 RCX: > ff28e6d8b0950a00 > [ 1.747903] systemd[1]: (This warning is only shown for the first > unit using IP firewalling.) > [ 1.748549] RDX: ff7662becb4ac482 RSI: 0000000000001000 RDI: > 00000000fffffffd > [ 1.749688] systemd[1]: Starting Journal Service... > [ 1.749895] RBP: ff7662becb4abf80 R08: 0000000000000000 R09: > ff28e6d880fadd40 > [ 1.750965] R10: ff7662becb4ac480 R11: 0000000000000000 R12: > 0000000000000000 > [ 1.750966] R13: 0000000000000002 R14: 0000000000001000 R15: > ff7662becb4ac480 > [ 1.750970] FS: 0000000000000000(0000) GS:ff28e6da75c00000(0000) > knlGS:0000000000000000 > [ 1.750972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1.750973] CR2: 00007f7407f19000 CR3: 0000000100f24002 CR4: > 0000000000771ef0 > [ 1.750974] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 1.750975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 1.750976] PKRU: 55555554 > [ 1.750977] Call Trace: > [ 1.750984] <TASK> > [ 1.750986] ? __warn+0x7e/0x130 > [ 1.750992] ? __blk_rq_map_sg+0x46a/0x480 > [ 1.750994] ? report_bug+0x18e/0x1a0 > [ 1.750999] ? handle_bug+0x3d/0x70 > [ 1.751003] ? exc_invalid_op+0x18/0x70 > [ 1.751006] ? asm_exc_invalid_op+0x1a/0x20 > [ 1.751009] ? __blk_rq_map_sg+0x46a/0x480 > [ 1.751012] scsi_alloc_sgtables+0xb7/0x3f0 > [ 1.751019] sd_init_command+0x177/0x9d0 > [ 1.751023] scsi_queue_rq+0x7c1/0xae0 > [ 1.751027] blk_mq_dispatch_rq_list+0x2bc/0x7c0 > [ 1.751031] __blk_mq_sched_dispatch_requests+0x409/0x5c0 > [ 1.751035] blk_mq_sched_dispatch_requests+0x2c/0x60 > [ 1.751037] blk_mq_run_work_fn+0x5f/0x70 > [ 1.751039] process_one_work+0x149/0x360 > > I suspect that you would need to also change the PAGE_SIZE check in > __blk_bios_map_sg() also. However, I am not confident that the change > below is ok to begin with... > > BTW, scsi_debug does use an insane max_segment_size of -1 > Can you try with this patch? diff --git a/block/blk-merge.c b/block/blk-merge.c index 570573d7a34f..5da63180069e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -278,7 +278,10 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim, struct bio_vec bv, bvprv, *bvprvp = NULL; struct bvec_iter iter; unsigned nsegs = 0, bytes = 0; - unsigned bv_seg_lim = max(PAGE_SIZE, lim->max_segment_size); + unsigned bv_seg_lim = PAGE_SIZE; + + if (lim->max_segment_size < UINT_MAX) + bv_seg_lim = lim->max_segment_size; bio_for_each_bvec(bv, bio, iter) { /* Cheers, Hannes
On 5/15/24 14:29, Hannes Reinecke wrote: > On 5/15/24 02:20, John Garry wrote: >> On 14/05/2024 11:38, Hannes Reinecke wrote: >>> Bvecs can be larger than a page, and the block layer handles >>> this just fine. So do not split by PAGE_SIZE but rather by >>> the max_segment_size if that happens to be larger. >> Can you check scsi_debug for this series? I took this series only up >> to this change, and got: >> >> Startin[ 1.736470] ------------[ cut here ]------------ >> g Load [ 1.737777] WARNING: CPU: 0 PID: 52 at block/blk-merge.c:581 >> __blk_rq_map_sg+0x46a/0x480 >> Kernel Module fu[ 1.738862] Modules linked in: >> se...[ 1.739370] CPU: 0 PID: 52 Comm: kworker/0:1H Not tainted >> 6.9.0-00002-g4eaa50af9312-dirty #2416 >> >> [ 1.740474] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), >> BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 >> [ 1.741809] Workqueue: kblockd blk_mq_run_work_fn >> [ 1.742379] RIP: 0010:__blk_rq_map_sg+0x46a/0x480 >> [ 1.742939] Code: 17 fe ff ff 44 89 58 0c 48 8b 01 e9 ec fc ff ff >> 43 8d 3c 06 48 8b 14 24 81 ff 00 10 00 00 0f 86 af fc ff ff e9 02 f0 >> [ 1.743015] systemd[1]: File System Check on Root Device was >> skipped because of a failed condition check (ConditionPathIsReadWrite=!/. >> [ 1.745122] RSP: 0018:ff37636e4032bb90 EFLAGS: 00010212 >> [ 1.746419] systemd[1]: systemd-journald.service: unit configures >> an IP firewall, but the local system does not support BPF/cgroup fi. >> [ 1.746891] RAX: 000000000000001c RBX: 00000000000001b0 RCX: >> ff28e6d8b0950a00 >> [ 1.747903] systemd[1]: (This warning is only shown for the first >> unit using IP firewalling.) >> [ 1.748549] RDX: ff7662becb4ac482 RSI: 0000000000001000 RDI: >> 00000000fffffffd >> [ 1.749688] systemd[1]: Starting Journal Service... >> [ 1.749895] RBP: ff7662becb4abf80 R08: 0000000000000000 R09: >> ff28e6d880fadd40 >> [ 1.750965] R10: ff7662becb4ac480 R11: 0000000000000000 R12: >> 0000000000000000 >> [ 1.750966] R13: 0000000000000002 R14: 0000000000001000 R15: >> ff7662becb4ac480 >> [ 1.750970] FS: 0000000000000000(0000) GS:ff28e6da75c00000(0000) >> knlGS:0000000000000000 >> [ 1.750972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1.750973] CR2: 00007f7407f19000 CR3: 0000000100f24002 CR4: >> 0000000000771ef0 >> [ 1.750974] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >> 0000000000000000 >> [ 1.750975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >> 0000000000000400 >> [ 1.750976] PKRU: 55555554 >> [ 1.750977] Call Trace: >> [ 1.750984] <TASK> >> [ 1.750986] ? __warn+0x7e/0x130 >> [ 1.750992] ? __blk_rq_map_sg+0x46a/0x480 >> [ 1.750994] ? report_bug+0x18e/0x1a0 >> [ 1.750999] ? handle_bug+0x3d/0x70 >> [ 1.751003] ? exc_invalid_op+0x18/0x70 >> [ 1.751006] ? asm_exc_invalid_op+0x1a/0x20 >> [ 1.751009] ? __blk_rq_map_sg+0x46a/0x480 >> [ 1.751012] scsi_alloc_sgtables+0xb7/0x3f0 >> [ 1.751019] sd_init_command+0x177/0x9d0 >> [ 1.751023] scsi_queue_rq+0x7c1/0xae0 >> [ 1.751027] blk_mq_dispatch_rq_list+0x2bc/0x7c0 >> [ 1.751031] __blk_mq_sched_dispatch_requests+0x409/0x5c0 >> [ 1.751035] blk_mq_sched_dispatch_requests+0x2c/0x60 >> [ 1.751037] blk_mq_run_work_fn+0x5f/0x70 >> [ 1.751039] process_one_work+0x149/0x360 >> >> I suspect that you would need to also change the PAGE_SIZE check in >> __blk_bios_map_sg() also. However, I am not confident that the change >> below is ok to begin with... >> >> BTW, scsi_debug does use an insane max_segment_size of -1 >> > Can you try with this patch? > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 570573d7a34f..5da63180069e 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -278,7 +278,10 @@ struct bio *bio_split_rw(struct bio *bio, const > struct queue_limits *lim, > struct bio_vec bv, bvprv, *bvprvp = NULL; > struct bvec_iter iter; > unsigned nsegs = 0, bytes = 0; > - unsigned bv_seg_lim = max(PAGE_SIZE, lim->max_segment_size); > + unsigned bv_seg_lim = PAGE_SIZE; > + > + if (lim->max_segment_size < UINT_MAX) > + bv_seg_lim = lim->max_segment_size; > > bio_for_each_bvec(bv, bio, iter) { > /* > Hmm. No, forget it. Working on another fix. Cheers, Hannes
On 15/05/2024 06:29, Hannes Reinecke wrote: >> >> I suspect that you would need to also change the PAGE_SIZE check in >> __blk_bios_map_sg() also. However, I am not confident that the change >> below is ok to begin with... >> >> BTW, scsi_debug does use an insane max_segment_size of -1 >> > Can you try with this patch? It's scsi_debug, anyone can try it. As for Luis' original issue, I did not see a proper explanation why the crash occurred. The splitting code should consider max segment size already, AFAICS. We seem to be slicing off less than LBS, which means bytes = 0 after the rounddown, which crashes. why? I think that all request_queue limits should really be double-checked for this LBS on NVMe. The virtual_boundary_mask is still 4K, which should be ok.
diff --git a/block/blk-merge.c b/block/blk-merge.c index 4e3483a16b75..570573d7a34f 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -278,6 +278,7 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim, struct bio_vec bv, bvprv, *bvprvp = NULL; struct bvec_iter iter; unsigned nsegs = 0, bytes = 0; + unsigned bv_seg_lim = max(PAGE_SIZE, lim->max_segment_size); bio_for_each_bvec(bv, bio, iter) { /* @@ -289,7 +290,7 @@ struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim, if (nsegs < lim->max_segments && bytes + bv.bv_len <= max_bytes && - bv.bv_offset + bv.bv_len <= PAGE_SIZE) { + bv.bv_offset + bv.bv_len <= bv_seg_lim) { nsegs++; bytes += bv.bv_len; } else {
Bvecs can be larger than a page, and the block layer handles this just fine. So do not split by PAGE_SIZE but rather by the max_segment_size if that happens to be larger. Signed-off-by: Hannes Reinecke <hare@kernel.org> --- block/blk-merge.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)